[darcs-users] questions on patch theory

David Roundy droundy at abridgegame.org
Fri Sep 12 12:35:48 UTC 2003


On Thu, Sep 11, 2003 at 02:33:58PM -0400, Mirian Crzig Lennox wrote:
> I am a relatively new DARCS user; I was drawn to it by what I see as a
> quite elegant theory of patches.  My background is in maths and
> computer science rather than quantum mechanics, so I'm attempting to
> understand the theory in terms of concepts familiar to me.  I'm hoping
> that the list won't mind looking over my shoulder and perhaps tossing
> me some clues as I tackle this.

No problem! And perhaps I should mention that I think the appendix on the
theory of patches requires a bit of an overhaul.  It's still correct, but
due to a few changes I made, I think it requires a pedagogical
reorganization...

> First of all I'm trying to figure out a way to formalise the context
> of a patch, and the relationship of a patch to a tree.
> 
> The introduction to the Theory of Patches defines the context of a
> patch both as a tree, and as "the set of patches that precede it".  It
> also defines a tree as "the result of a series of patches applied to
> the empty tree".

This sort of dual definition of a tree originates from the order of
development of the theory.  At first one need define the tree as an
abstract object that is modified by patches.  Once the theory is complete,
however, it is seen that any tree is uniquely defined by a *set* of patches
(order independent).

This brings up one other problem, which is that I have two different
meanings for the term patch.  One is the logical change defined by the
patch.  The other is the "representation" of the patch.  So in a commute

AB <-> B'A'

A' is the same patch as A (in the first sense), but its representation is
different, because its representation is with respect to a context that is
lacking B.  This second issue is more serious than the first, since in the
first case the two definitions are equivalent, but certainly the
representation of a patch mustn't be mistaken for the patch itself, and I
suspect that at places I refer to the representation of a patch as a
"patch."

> >From this, I deduce the following:
> 
>     * The existence of a precedence operator.  For any two patches, P1
>       and P2, exactly one of the following three statements is true:
> 
>         P1 <p P2  (P1 precedes P2)
>         P2 <p P1  (P2 precedes P1)
>         P1 ~p P2  (P1 and P2 have no precedence relationship)
> 
>       Additionally,
>         P1 >p P2 iff P2 <p P1
>         P2 ~p P1 iff P1 ~p P2
>         P1 =p P2 iff P1 and P2 are the same patch
>         P1 !p p2 iff P1 <p P2 or P1 >p P2
>         if P1 <p P2 and P2 <p P3 then P1 <p P3
>         if P1 >p P2 and P2 >p P3 then P1 >p P3
> 
>     * There is a set P of all patches which is partially ordered
>       on precedence.

I think what you are calling precedence I would call dependency (assuming
it is a property of patches themselves rather than the representations of
patches), and I think to define it properly you need to mention involve
commutation, more specifically, its failure.

So if AB don't commute then B <p A, but if AB <-> B'A' then B ~d A.  I
believe your transitivity properties still hold.  While calculating all the
dependencies in the repo would be overly timeconsuming, the concept of
dependencies is definitely interesting (and used).

> I started thinking about the above because I was looking for a way to
> define the empty tree such that it is clear what effect any given
> patch would have on it, especially since there are obviously some
> patches which can operate on an empty tree and some which cannot.

I would probably put this question in a sort of reverse order (it is an
interesting question):  I would simply define the empty tree as the result
of not applying any patches (maybe this is a bit too circular for you...),
and then ask what dependencies certain patches must have.

(The reason being that the theory of patches can be applied regardless of
what changes the patches represent, and it seems wise to maintain that
abstraction as long as possible.  For example, it might be useful to have a
database version controlled in this manner, in which case one might want
the empty tree to be a database with no tables in it.  One thing I'd like
to do when I reorganize the Theory of Patches is to make a clearer
distinction between theory and implementation.)

For example (skipping the parenthesized sidenote, back to the question of
"required" dependencies), the patch

hunk ./funfile 1
+test

depends on a patch

addfile ./funfile

In general, every hunk patch depends on file add.  The patch

addfile ./dir/file

depends on the patch

adddir ./dir

And so on.  For every patch type we could figure out the minimum required
dependencies.  In many cases the actual dependencies will differ from the
required dependencies, but they would at least allow us to easily see which
patches could be applied to the empty tree.

(I just noticed that you say something similar to this in your last
paragraph...)

>     * From the implementation, it looks as though there are only two
>       kind of patches which can operate on an empty tree: "file add"
>       and "directory add".

There is also a (perhaps undocumented) setpref patch that can operate on
the empty tree.  It operates on the repository preferences metadata, which
actually isn't part of the "tree", so this is really just an advisory
patch.  If conflicts happen, it is just assumed that you set the
preferences to your tree as you wanted them.  Because of its status as a
patch which doesn't affect any tree data, no setpref patch has any
dependencies (i.e. it commutes with everything).

>     * If we posit that an empty tree contains no directories at all
>       (not even the "top level" directory), then the *only* patch
>       which can operate on the empty tree is a "directory add" patch
>       for that top level directory.  (This is only for theoretical
>       purposes of course; it doesn't need to be explicit in the
>       implementation).

This would seem a bit silly to me (even from the theoretical sense).  You
have something of a chicken and egg problem.  The patch

adddir SUPERDIR/SUBDIR

always depends on prior the existence of SUPERDIR.  So the initial
directory add of the top level directory would have to be a special
directory add, since there is no SUPERDIR existing.  It seems cleaner to
have a unique directory "." which always exists than to have a unique
directory add which has no superdir and must have the dirname ".".

I guess maybe it's six of one, half a dozen of the other.  One could either
think of the empty tree as consisting of a single patch which adds the
directory ".", or as a world in which only the directory "." exists.

> This way we have an implied precedence for lots of different kinds of
> patches; a file or directory add depends on the "directory add" patch
> of its parent, except for the top-level directory which has no parent.
> A file modification patch depends on the that file's "file add" patch.
> A file or directory move or delete patch also depends the "add" patch
> for that file or directory, and so on.

Agreed.  This information is all included in the commutation operation (or
function, if you like), except that you can't ues the commutation operation
unless you have a valid sequential pair of patches.

> Establishing this partial ordering goes a long way towards being able
> to prove the theorem that merges always succeed.

Actually, merges always succeeding isn't a theorem, but rather a postulate
or definition.  This is one of the least clear sections, but I force every
merge to succeed by using a "merger" patch if there is no other way.  And a
merger patch is simply defined such that the merge is correct.  i.e.

merger(A,B)A <-> merger(B,A)B

> So... does any of this make sense?  Feedback is very much welcome.

Yes!  :)
-- 
David Roundy
http://www.abridgegame.org




More information about the darcs-users mailing list