[darcs-users] Request for more details
Dalon Work
dwwork at gmail.com
Wed Nov 11 17:09:58 UTC 2020
Mr. Franksen,
Thank you for the detailed reply! I think I see better how this works now.
Patch theory details the laws that a given implementation (primitive patch
theory) must follow. Your current implementation has not been formally
proven, but is checked with random testing. (The QuickCheck that you
mention.) Is this correct?
I can see that I have perhaps been a bit too cavalier with my use of
"equality" as well, so I appreciate the more nuanced explanations. I don't
know haskell, but I will try to figure out enough to work through the code
that you've pointed me to. I also found (again) the explanation of how to
do a merge in the "Understanding Darcs" Wikibook, so that combined with
your answer helped me understand that part of it.
As far as my scalability questions, I'm glad to know (as I suspected) that
you don't have to start over to reconstruct the repository. My second
question about "batches" wasn't phrased very well, and may be just
ridiculous, so please let me know if it is. The idea is to create
"mega-patches" that are composites of several patches. Would it be
theoretically possible (i.e. correct) to operate on mega-patches, and only
when that fails, break them down and operate on single patches? From your
comments, it seems this kind of optimization may not be necessary.
As a followup question about merging patches from different repositories, I
assume that internally, the patches have been applied in some sequence. As
the patches are modified or added, the patches are allowed to move around
in that sequence, following the appropriate rules. When you pull patches
from another repository, how do you determine where in the sequence to
place the new patches? The implicit assumption in the commutation diagram
is that you have a common base context to work from. So how do you find
that common state? Or perhaps, you don't have to. Perhaps you just start
applying the new patches and seeing if they work? I'm a little fuzzy on
this point.
Again, thank you for the explanation. It's pretty interesting!
Dalon
On Wed, Nov 11, 2020 at 6:10 AM Ben Franksen <ben.franksen at online.de> wrote:
> Hi Dalon
>
> Am 10.11.20 um 14:51 schrieb Dalon Work:
> > I recently discovered Darcs, and probably like lots of people, I'm very
> > intrigued by the "Patch Theory" behind it. I've read everything I could
> > find, and to be fair, much of it was over my head. I think I understand
> the
> > basic ideas, but I still have a lot of questions. I would appreciate it
> if
> > anyone could point me at some materials I missed or misunderstood, or
> could
> > explain in greater detail how it all works.
>
> Unfortunately there is no ultimate reference for patch theory that
> covers every aspect and every variant implemented by darcs.
>
> > So, starting with a set of patches with a linear history: AB, and if you
> > want to "undo" A, you have to reorder A & B so you can undo A.
>
> Correct. However, note that there are two ways to undo a patch A: you
> can record a new patch that inverts all changes made by A. Or you can
> just drop it from the end of the sequence. The first is 'darcs rollback
> + darcs record', the second is 'darcs obliterate'.
>
> > Take AB,
> > commute them to get B'A', and then post-multiply by the inverse of A':
> >
> > B'A'.inv(A) = B'.
>
> These sequences are not equal, though they have the same effect. In
> general we consider two sequences of patches equal if they differ only
> by commutation; writing A^ for the inverse of A and [] for the empty
> sequence, AA^ and [] are not the same. In /some/ contexts we can treat
> them as equivalent, for instance when applying them to a tree, but not
> in general.
>
> > I understand the basic diagram that gives the equation 0 -> AB = B'A' ->
> 1,
> > where we start with context 0, and want to get to context 1. But I don't
> > understand how we actually solve for B' and A'? It seems we have 2
> > unknowns, but only 1 equation. This is the part I can't seem to find.
>
> The underlying theory of (so called) primitive patches must have come
> already equipped with such a commute operation. How it is implemented
> depends on exactly what the primitive patches are (e.g. their
> representation) and on what the designer of the theory wants to achieve.
> As an extreme example, a theory in which no two primitive patches can be
> commuted is valid.
>
> > Is
> > this a manual step? I can see how if the two patches have nothing to do
> > with each other, then B' = B, and A' = A. But what if they do influence
> > each other, but don't actually conflict?
>
> If you are able to read Haskell, you could look at
>
> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Commute.hs
> to find out how it is done in darcs. This is not covered in any text on
> patch theory: these texts always assume a primitive patch theory as
> given (subject to certain laws).
>
> > Then, in the case of a merge of two parallel patches, how do we know what
> > the "unique" merge result is?
>
> Uniqueness of (clean i.e. unconflicted) merge follows from uniqueness of
> commute, given the usual patch laws, since the unconflicted merge is
> defined as "invert one of the two patches, commute, then invert result
> back".
>
> > (Assuming no conflicts). The equation is now
> > 0 -> AU = BV -> 1, but now we don't even know what the ending context 1
> > looks like! This to me looks like we have 3 unknowns, and only 1
> equation.
> > What am I missing here?
>
> Again, the key is to assume you already have a function commute that
> takes a sequentional pair of patches and gives you back another
> sequential pair.
>
> > My next question is about scalability. In a system with hundreds of
> > thousands of patches, it seems like a scalability nightmare, as the naive
> > description of how the system works makes it seems that to reconstruct
> the
> > state of the repository would require building the working tree from the
> > empty context every time.
>
> Correct. Without additional efforts, a "naked" patch theory would be
> extremely slow!
>
> > There has to be a way to "snapshot" a set of
> > patches so you have a new ground-zero to start from.
>
> And there is (in darcs). We call it the "pristine tree", which is very
> much like a "tree object" in git (also stored in a hashed format,
> similar to git). It corresponds to the current state of a repo i.e. the
> sequence of recorded patches. You could, in principle, store
> intermediate snapshots, too, e.g. for each tag, but this is not
> currently done in darcs.
>
> Darcs has other "snapshot" like optimisations. For instance, the
> sequence of patches in a repo is broken up into so called inventories,
> which are sequences of patches terminated by a tag that depends on all
> previous patches. This means that most operations only have to work with
> the "current inventory" which contains (only) the patches since the
> latest tag.
>
> > And then, importing
> > patches from an outside repository, is it possible to "batch" them, and
> > work on a sets of patches inside of comparing patches 1 at a time?
> (ignore
> > conflicts for now).
>
> Not sure I understand the question correctly. Ultimately you need to
> merge patches one by one and also apply them one by one.
>
> > It seems to me
> > that you would have to take all the incoming patches, and 1 at a time,
> > modify the context for each one of them to apply them on top of the
> > importing repository. It sounds painfully slow.
>
> Starting with the current state (the pristine tree) and applying patches
> (or unapplying them) to it to get a new state is pretty much optimised
> in darcs but dpenending on the number and size of patches can still take
> some time. An extreme case is doing 'darcs check' on the current darcs
> repo with about 13000 patches; this takes about 15 seconds on my laptop.
>
> But normal operations seldom consider more than a hand full of patches,
> perhaps a few hundred in bad cases.
>
> > How do you determine which patches are dependent on which patches? I know
> > the user can manually specify dependencies, but there has to be some
> > automated way to figure out the important ones, the ones that would break
> > things if not maintained correctly.
>
> Simple: B depends on A if A;B does not commute.
>
> > Naively thinking, would you have to try
> > and commute the incoming patch with every other stored patch? Or can you
> do
> > that until you find a match, and attach it to the "dependency tree", or
> are
> > there encoded manual rules?
>
> There is no explicit representation of the dependency graph anywhere in
> darcs. It wouldn't help, since patches can change representation when
> commuted, so even if you new beforehand that A and B will successfully
> commute that still doesn't give you the result of the commute.
>
> To understand how darcs merges patches it is necessary to consider patch
> "names". The name of a patch is a hash of all its meta-data (name, log
> message, timestamp, author, and some random noise added when you
> record). This is what identifies a patch semantically. Darcs assumes
> that patches with the same "name" really are commuted versions of the
> same original patch. This is insecure, but allows to quickly compare
> large sets of patches for equality.
>
> When we merge patches from another repo, we first find the set of all
> patches common to both the local and the remote repo. What we actually
> have to merge are the remaining sequences of patches i.e. those that are
> only in our repo and those that are only on the remote one.
>
> Finding the common set of patches is further optimised using inventories
> but I guess explaining that would lead us too far away.
>
> > What is the internal patch format for darcs? Like, could I implement a
> > crude patch-based system using unix diff/patch utils?
>
> No, that is not possible. The basic patch operations MUST strictly
> adhere to the laws of patch theory.
>
> > Or do you have to
> > store extra metadata to get the mechanics to work correctly? So far, all
> > I've found is the high-level "a patch is a transformation function".
> Great.
> > What does that function actually look like in real-life?
>
> As I mention above, this is not covered by the existing texts on patch
> theory. The point is that this function must be reified as data!
>
> I will give you an example. In Darcs, the most important primitive patch
> type is called "hunk". A hunk consists of the following data:
>
> * file name
> * position (line number)
> * lines to remove
> * lines to add
>
> In short, we have
>
> Hunk filename pos [ContentLine] [ContentLine]
>
> where ContentLine is how you represent a line of text.
>
> To commute two Hunks very roughly goes as follows:
>
> If the filenames differ, they commute trivially (i.e. without change).
> If what the first patch adds overlaps with what the second one removes,
> then they fail to commute. Otherwise, we can commute them by adapting
> the line numbers. The details are a bit tricky (fiddling with line
> numbers, what exactly means "overlap") but not hard to reproduce.
>
> For details, see
>
> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Commute.hs#214
>
> The definition of the primitive patch data type is
> here:
> https://hub.darcs.net/darcs/darcs-screened/browse/src/Darcs/Patch/Prim/V1/Core.hs#40
>
> It is not difficult to define your own primitive patch theory, but
> making sure it conforms to all the required patch laws is another
> matter! We never formally proved that for darcs, but instead rely on
> random property testing using QuickCheck.
>
> Cheers
> Ben
>
> _______________________________________________
> darcs-users mailing list
> darcs-users at osuosl.org
> https://lists.osuosl.org/mailman/listinfo/darcs-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osuosl.org/pipermail/darcs-users/attachments/20201111/8602b94c/attachment-0001.html>
More information about the darcs-users
mailing list