[darcs-users] darcs conflicts/dependencies -- is patch theory the place to start?
AntC
anthony_clayden at clear.net.nz
Thu Sep 13 00:59:39 UTC 2012
AntC <anthony_clayden <at> clear.net.nz> writes:
>
> > Kevin Quick <quick <at> sparq.org> writes:
> >
> > On a more abstract level, Patch Theory ...
> > ... does not require, nor implement a DAG of patches, and in any
> > particular repository, all patches are active ...
Hmm, that claim about the unimportance of DAGs seems rather hollow:
- All the complexity of commutation is exactly to do with sequencing.
- I've now learnt there's extra complexity to do with (attempting to) pull
duplicate patches. (And how duplicates interact with conflicts.)
If a repo were really just a set of patches,
managing duplicates would be easy(?).
- Possibly there's accidental complexity [per Fred Brooks]
in having to commute patches,
and perhaps there's accidental conflicts?
(I've seen comments that for some conflicts
the resolution seems 'obvious'.)
> > You might also be interested in the following paper:
(Loh/Swierstra/Leijen Principled Approach -- henceforth L/S/L)
>
> The problem with that approach is that a practical VCS only gets to see the
> repo intermittenly (at record points). A lot can have changed (especially
> w.r.t. contents of files). How does the VCS know which files and which lines
> got copied/changed vs. deleted/inserted? How can it keep track of all the
> internal file id's and line id's?
>
This bites darcs with file moves (renames):
you can't detect a move vs. remove-then-add-then-paste-content.
So the key features of L/S/L are:
- Represent patches in a way that is independent of sequence.
- Then we can see the repo as a *set* (or *bag*) of patches,
rather than a Directed Acyclic Graph.
- Identify a file in the repo using a persistent unique identifier
(perhaps a GUID) rather than its dir/name.
So that we can distinguish same-named files in two different repo's
as coincidence vs. deliberate.
In darcs:
- To think of a patch is the *same* even though it morphs through commutation,
can we be more precise on what about the patch is the same?
(apart from its id)?
Something about the sameness of effect from a patch?
So, how to look at primitive patches?
(as identified by a ppid, which might be a GUID):
- Aim to represent the effect of the patch in a context-independent format.
- Also need to represent pre-conditions (restrictions on context).
==> Then the semantics of a patch is:
- In any context meeting the pre-conditions,
- 'applying' the patch will always yield the right 'moral effect'.
(That is: the intended change to the content of the repo.)
- (By the way, since the patch representation is context-independent,
perhaps there can be a better job of giving it a digital signature
-- as L/S/L point out.)
Pre-conditions will typically make reference to other ppid's. For example:
addfile requires the directory to be present.
But identify the directory by the ppid that created it, _not_ by its path.
(Similarly if two files with same path/name exist in two repos,
they are only the same if they have the same ppid.
Which implies one is an addfile, and
the other was pulled from that patch into the target repo.)
Pre-conditions are also to constrain the context in other ways:
For example: can't pull a patch that creates file "dir/Makefile"
into a repo that already has "dir/Makefile".
(That is, you must first move/rename the existing Makefile.)
There is no sequencing of patches as such, and therefore no commuting.
(So this approach is no longer like darcs's patch theory.)
But there is something equivalent to sequencing, and to conflicts, as a result
of the pre-conditions.
Example: putting a hunk into a file has a pre-condition that
the file exists, by reference to its ppid.
So I can't pull the hunk into a repo that doesn't have the file.
(And it's not good enough to have a file with the same dir/name.)
If I pull both the create and the hunk,
there's only one sequence that's valid.
In general, if I'm pulling a bunch of patches,
they should stay in the same sequence as in the source repo.
This is so far from darcs, I'm wondering if we even need patches to be
invertible?
(Since we're not commuting,
we don't need to be able to go either way round the diamond?
L/S/L do include inversion, but its only role
is to support treating a repo as a multiset of patches:
If a patch appears n times,
its inversion must appear either (n - 1) or n times.)
If we do support inversion, then we need to represent the post-conditions,
so that they can be 'flipped' to pre-conditions on inversion.
Then we're building a Hoare Logic kind-of-a-thing.
So far, this is very similar to the Loh/Swierstra/Leijen paper, with an
internal file id got from the ppid.
Note that each file can be tracked to a ppid; and the file's current name can
be tracked to a (possibly different) ppid; and likewise with each directory
and its name (path).
So let's apply the same thinking to the content of a text file:
- all content of a text file can be described as a concatenation of hunks.
- each line can be tracked to the hunk operation ppid that inserted it
(possibly overlaid with token replace ppid's).
Here's the novel bit: unlike L/S/L's idea of a line id, we'll do the same
ppid 'trick' with hunks:
- Instead of 'locating' a hunk operation by its target file and line number,
- Locate it by target _hunk_ and line number
-- that is, offset within the text inserted by the hunk.
- (And of course identify the target hunk by the ppid where it started life.)
AntC
More information about the darcs-users
mailing list