[darcs-users] My current work on darcs...
Guillaume Hoffmann
guillaumh at gmail.com
Sun Sep 10 17:58:09 UTC 2017
Hi Ben,
thanks for walking us through another (set of) aspect of Darcs'
codebase. I am so glad someone is taking the sufficient amount of time
to look into this in a global way!
Among the things you have listed, almost everything sound sensible to
me. For instance I knew about removeFromPatchSet but never took the
time to try and solve it. I am less aware of the implementation
details of rebase. But indeed it seems weird that rebase be stored as
a normal patch, as opposed to a special patch like pending or
unrevert, since they are all relevant only for the local user of a
repository.
Now, I recall that Ganesh had a branch that implemented a stash
command, that modified the implementation of rebase too, but he never
ended up sharing it. I recall that it had something to do with the rt
type parameter you mentioned. It would be nice to have his opinion on
your proposal indeed :)
Maybe you could also prepare a first patch for the refactor of
Darcs.Patch.Match you mention at the beginning of you mail, even if
parts of it would later be changed again if we decide to refactor
rebase as you propose.
Also, let us not forget your current patch bundles, especially
http://bugs.darcs.net/patch1593 (separating display and storage of
patches)!
I am telling this because, we should start preparing a new release of
Darcs, since GHC 8.2 has been out for a while now. Maybe by next month
we can have the codebase in an releasable state.
Guillaume
2017-09-08 8:58 GMT-03:00 Ben Franksen <ben.franksen at online.de>:
> I am currently working on an number of experimental and potentially
> far-reaching changes.
>
> The starting point was a refactor of Darcs.Patch.Match to avoid the
> repetitive low-level recursion on the PatchSet type. I extracted the
> recursion mechanics into a few general functions that I moved to
> Darcs.Patch.Set, with the result that Darcs.Patch.Match no longer needs
> to import the data constructors of PatchSet. Then I went through the
> rest of the code base and finally reduced direct access to the PatchSet
> data constructors to a few modules, mostly D.P.Set and D.P.Depends and
> those where PatchSets are read and written (D.P.Bundle, D.R.Hashed).
>
> This led me to the next thing I am working on, which is rebase. This may
> seem unrelated at first, but you will see that it is not.
>
> The rebase patch (that contains the suspended patches and the fixups) is
> currently mixed in with normal patches, so it can and will be commuted
> with them. This leads to a number of problems:
>
> * Normal patches and the rebase patch need to be made compatible by
> wrapping both in the WrappedNamed type. This is ugly and complicates the
> type hierarchy for patches.
>
> * We have to make sure the rebase patch always gets commuted back to the
> end before we write changes to disk. This affects all commands that can
> modify the recorded state.
>
> * The standard patch commutation interfaces aren't aware of the fact
> that there should only ever be a single rebase patch, so implementing
> them leads to "impossible" cases.
>
> There is also some fairly elaborated type machinery at work to
> distinguish repos with and without a rebase in progress. I am not quite
> sure this is necessary, though I do understand the motivation; more on
> that below.
>
> I had the idea that a better place to store the rebase patch is to make
> it an optional member of the PatchSet data type. This has two
> consequences: the rebase patch is no longer mixed with normal patches in
> patch sequences; so WrappedNamed can be removed and PatchInfoAnd reverts
> back to (hopefully) containing a Named patch. I experimented a while
> with various representations and finally came up with
>
> data PatchSet rt p wStart wY where
> PatchSet :: RL (Tagged rt p) wStart wX
> -> RL (PatchInfoAnd rt p) wX wY
> -> MaybeRebase p wY
> -> PatchSet rt p wStart wY
>
> data MaybeRebase p wX where
> NoRebase :: MaybeRebase p wX
> YesRebase :: (PrimPatchBase p, FromPrim p, Effect p, Commute p)
> => FL (RebaseItem p) wX wY
> -> MaybeRebase p wX
>
> It would be possible to integrate this with the existing repo type
> distinction by adding an rt parameter to MaybeRebase i.e.
>
> data MaybeRebase rt p wX where
> NoRebase :: MaybeRebase ('RepoType 'NoRebase) p wX
> YesRebase :: (PrimPatchBase p, FromPrim p, Effect p, Commute p)
> => FL (RebaseItem p) wX wY
> -> MaybeRebase ('RepoType 'IsRebase) p wX
>
> But I would rather get rid of RepoType and the rt type parameter because
> I think that this kind of type distinction is not useful enough in
> practice to justify the cost in complexity.
>
> MaybeRebase is rather similar in structure to WrappedNamed: It hides the
> right witness behind an existential, expressing that it "dangles off the
> main repo"; and it also packs the constraints with the rebase patch that
> are needed to adapt the rebase patch when (normal) patches are removed
> from or added to the the end of the patch set (using slightly modified
> versions of addFixupsToSuspended and removeFixupsFromSuspended).
>
> Since access to PatchSet internals is no longer spread all over the
> place (see the beginning of this message), it is now feasible to adapt
> the remaining functions that do access them so as to add/remove the
> necessary fixups to the rebase patch. This work led me consider how
> patches are writen and read to/from disk. So I studied
> Darcs.Repository.Hashed. It exports a very large number of functions
> that operate on the disk representation of repositories, mostly
> inventories (storage of the patches themselves is handled by the Cache
> subsystem; there are also the pending patch and the unrevert bundle). I
> looked at the implementation in detail and discovered the following
> interesting fact:
>
> The functions that remove patches from a repo/inventory are in no way
> more efficient than writing out a complete (modified) PatchSet as a new
> tentative state, but in fact rather less so. They all end up calling
> removeFromTentativeInventory, which is, more or less,
>
> readTentativeRepo >>= removeFromPatchSet to_remove >>=
> writeTentativeInventory
>
> And removeFromPatchSet will re-do all the commutations that the caller
> has already done (in order to provide the to_remove FL of patches). Same
> goes for functions that replace patches since they first remove and then
> add them. It makes sense therefore to re-write commands to no longer use
> the remove and replace functionality and instead directly calculate new
> PatchSets and then write them to the tentative inventory. The latter is
> already optimized to avoid writing inventories (which correspond to
> Tagged sections of the PatchSet) that were not touched during
> commutation, since these still have their hash attached. This will
> considerably cut down on the number functions exported by D.R.Hashed. It
> will also reduce exposure to infelicities in the patch theory for
> RepoPatchV2 (whether commutation of Duplicate patches fails or succeeds
> may depend on the order in which the commutation is performed). This can
> and should be done before adding separate read and write operations for
> the rebase patch.
>
> (BTW, adding patches (at the end) really just appends them to an
> existing inventory, so we should keep these functions.)
>
> Back to rebase. Scrapping WrappedNamed will make things a lot simpler. I
> would also like to get rid of the 'rt' (for RepoType) type parameter
> together with all the type level distinctions between repos with and
> without a rebase in progress. This is certainly debatable and Ganesh may
> have a different opinion. My reasoning is as follows: assuming the
> rebase patch is no longer mixed in with normal patches, and instead
> reading and writing of inventories will handle the rebase patch
> separately (like e.g. the pending patch is handled separately now), the
> extra wrapping of repo jobs in order to move the rebase patch to the end
> is also no longer necessary. The latter was, I think, the main
> motivation for the repo type distinction. Without it, the benefits
> (extra type safety) are not great enough to justify the cost (extra
> complexity). Similar levels of safety can be had with more traditional
> methods: modularity and pattern matching e.g. on MaybeRebase.
>
> Apropos modularity. I think this is one of the major weaknesses of the
> current code base of darcs. We have almost no type abstraction: the
> modules usually export all data constructors and we use pattern matching
> and low-level recursion all over the place. That makes it almost
> impossible to guarantee non-trivial invariants, unless they are
> expressed directly in the types. This in turn motivates the introduction
> of more and more advanced type shenanigans. I guess one reason darcs
> took this direction is that in the past it allowed quick and dirty
> addition of new features without much thought about coherent (internal)
> API design. (I am /not/ referring to rebase here, the design of which I
> like pretty much, except for the way it is integrated with the rest of
> darcs).
>
> And a last point about typing: I have come to the conclusion that it
> doesn't make much sense to distinguish between recorded and tentative
> state at the type level (the wR and wT type parameters for the
> Repository type). Here is a short description of the transaction
> protocol: when changing the repository state, first copy the recorded
> state to tentative; make changes only there; if all goes well and the
> user doesn't abort, finalize by moving tentative back to recorded,
> otherwise delete tentative and restore the original recorded state. As
> far as I know, all commands except rebase run in one single such
> transaction. The RepoJob machinery takes care of starting the
> transaction, so the commands only ever see the tentative state, so how
> could they ever accidentally mix recorded and tentative? The exception
> for rebase is when patches are suspended in a rebase pull or rebase
> apply. Here we commit before suspending patches for reasons that aren't
> quite clear to me (a comment in the code suggests that it is because
> readUnrecorded lies about the state it reads). I am pretty sure that
> with a bit of refactoring this hole can be closed at which point I would
> like get rid of the wT type parameter and instead move finalization
> (commit or abort) to the RepoJob, using the standard Haskell tools
> (finally, bracket, etc) to make sure commands never touch the recorded
> state except through a transaction.
>
> Cheers
> Ben
>
> _______________________________________________
> darcs-users mailing list
> darcs-users at osuosl.org
> https://lists.osuosl.org/mailman/listinfo/darcs-users
More information about the darcs-users
mailing list