[darcs-users] My current work on darcs...
Ben Franksen
ben.franksen at online.de
Fri Sep 8 11:58:06 UTC 2017
I am currently working on an number of experimental and potentially
far-reaching changes.
The starting point was a refactor of Darcs.Patch.Match to avoid the
repetitive low-level recursion on the PatchSet type. I extracted the
recursion mechanics into a few general functions that I moved to
Darcs.Patch.Set, with the result that Darcs.Patch.Match no longer needs
to import the data constructors of PatchSet. Then I went through the
rest of the code base and finally reduced direct access to the PatchSet
data constructors to a few modules, mostly D.P.Set and D.P.Depends and
those where PatchSets are read and written (D.P.Bundle, D.R.Hashed).
This led me to the next thing I am working on, which is rebase. This may
seem unrelated at first, but you will see that it is not.
The rebase patch (that contains the suspended patches and the fixups) is
currently mixed in with normal patches, so it can and will be commuted
with them. This leads to a number of problems:
* Normal patches and the rebase patch need to be made compatible by
wrapping both in the WrappedNamed type. This is ugly and complicates the
type hierarchy for patches.
* We have to make sure the rebase patch always gets commuted back to the
end before we write changes to disk. This affects all commands that can
modify the recorded state.
* The standard patch commutation interfaces aren't aware of the fact
that there should only ever be a single rebase patch, so implementing
them leads to "impossible" cases.
There is also some fairly elaborated type machinery at work to
distinguish repos with and without a rebase in progress. I am not quite
sure this is necessary, though I do understand the motivation; more on
that below.
I had the idea that a better place to store the rebase patch is to make
it an optional member of the PatchSet data type. This has two
consequences: the rebase patch is no longer mixed with normal patches in
patch sequences; so WrappedNamed can be removed and PatchInfoAnd reverts
back to (hopefully) containing a Named patch. I experimented a while
with various representations and finally came up with
data PatchSet rt p wStart wY where
PatchSet :: RL (Tagged rt p) wStart wX
-> RL (PatchInfoAnd rt p) wX wY
-> MaybeRebase p wY
-> PatchSet rt p wStart wY
data MaybeRebase p wX where
NoRebase :: MaybeRebase p wX
YesRebase :: (PrimPatchBase p, FromPrim p, Effect p, Commute p)
=> FL (RebaseItem p) wX wY
-> MaybeRebase p wX
It would be possible to integrate this with the existing repo type
distinction by adding an rt parameter to MaybeRebase i.e.
data MaybeRebase rt p wX where
NoRebase :: MaybeRebase ('RepoType 'NoRebase) p wX
YesRebase :: (PrimPatchBase p, FromPrim p, Effect p, Commute p)
=> FL (RebaseItem p) wX wY
-> MaybeRebase ('RepoType 'IsRebase) p wX
But I would rather get rid of RepoType and the rt type parameter because
I think that this kind of type distinction is not useful enough in
practice to justify the cost in complexity.
MaybeRebase is rather similar in structure to WrappedNamed: It hides the
right witness behind an existential, expressing that it "dangles off the
main repo"; and it also packs the constraints with the rebase patch that
are needed to adapt the rebase patch when (normal) patches are removed
from or added to the the end of the patch set (using slightly modified
versions of addFixupsToSuspended and removeFixupsFromSuspended).
Since access to PatchSet internals is no longer spread all over the
place (see the beginning of this message), it is now feasible to adapt
the remaining functions that do access them so as to add/remove the
necessary fixups to the rebase patch. This work led me consider how
patches are writen and read to/from disk. So I studied
Darcs.Repository.Hashed. It exports a very large number of functions
that operate on the disk representation of repositories, mostly
inventories (storage of the patches themselves is handled by the Cache
subsystem; there are also the pending patch and the unrevert bundle). I
looked at the implementation in detail and discovered the following
interesting fact:
The functions that remove patches from a repo/inventory are in no way
more efficient than writing out a complete (modified) PatchSet as a new
tentative state, but in fact rather less so. They all end up calling
removeFromTentativeInventory, which is, more or less,
readTentativeRepo >>= removeFromPatchSet to_remove >>=
writeTentativeInventory
And removeFromPatchSet will re-do all the commutations that the caller
has already done (in order to provide the to_remove FL of patches). Same
goes for functions that replace patches since they first remove and then
add them. It makes sense therefore to re-write commands to no longer use
the remove and replace functionality and instead directly calculate new
PatchSets and then write them to the tentative inventory. The latter is
already optimized to avoid writing inventories (which correspond to
Tagged sections of the PatchSet) that were not touched during
commutation, since these still have their hash attached. This will
considerably cut down on the number functions exported by D.R.Hashed. It
will also reduce exposure to infelicities in the patch theory for
RepoPatchV2 (whether commutation of Duplicate patches fails or succeeds
may depend on the order in which the commutation is performed). This can
and should be done before adding separate read and write operations for
the rebase patch.
(BTW, adding patches (at the end) really just appends them to an
existing inventory, so we should keep these functions.)
Back to rebase. Scrapping WrappedNamed will make things a lot simpler. I
would also like to get rid of the 'rt' (for RepoType) type parameter
together with all the type level distinctions between repos with and
without a rebase in progress. This is certainly debatable and Ganesh may
have a different opinion. My reasoning is as follows: assuming the
rebase patch is no longer mixed in with normal patches, and instead
reading and writing of inventories will handle the rebase patch
separately (like e.g. the pending patch is handled separately now), the
extra wrapping of repo jobs in order to move the rebase patch to the end
is also no longer necessary. The latter was, I think, the main
motivation for the repo type distinction. Without it, the benefits
(extra type safety) are not great enough to justify the cost (extra
complexity). Similar levels of safety can be had with more traditional
methods: modularity and pattern matching e.g. on MaybeRebase.
Apropos modularity. I think this is one of the major weaknesses of the
current code base of darcs. We have almost no type abstraction: the
modules usually export all data constructors and we use pattern matching
and low-level recursion all over the place. That makes it almost
impossible to guarantee non-trivial invariants, unless they are
expressed directly in the types. This in turn motivates the introduction
of more and more advanced type shenanigans. I guess one reason darcs
took this direction is that in the past it allowed quick and dirty
addition of new features without much thought about coherent (internal)
API design. (I am /not/ referring to rebase here, the design of which I
like pretty much, except for the way it is integrated with the rest of
darcs).
And a last point about typing: I have come to the conclusion that it
doesn't make much sense to distinguish between recorded and tentative
state at the type level (the wR and wT type parameters for the
Repository type). Here is a short description of the transaction
protocol: when changing the repository state, first copy the recorded
state to tentative; make changes only there; if all goes well and the
user doesn't abort, finalize by moving tentative back to recorded,
otherwise delete tentative and restore the original recorded state. As
far as I know, all commands except rebase run in one single such
transaction. The RepoJob machinery takes care of starting the
transaction, so the commands only ever see the tentative state, so how
could they ever accidentally mix recorded and tentative? The exception
for rebase is when patches are suspended in a rebase pull or rebase
apply. Here we commit before suspending patches for reasons that aren't
quite clear to me (a comment in the code suggests that it is because
readUnrecorded lies about the state it reads). I am pretty sure that
with a bit of refactoring this hole can be closed at which point I would
like get rid of the wT type parameter and instead move finalization
(commit or abort) to the RepoJob, using the standard Haskell tools
(finally, bracket, etc) to make sure commands never touch the recorded
state except through a transaction.
Cheers
Ben
More information about the darcs-users
mailing list