[darcs-users] If AB and B' are repos, does that imply A and B commute?

Ben Franksen ben.franksen at online.de
Sat Dec 12 21:09:23 UTC 2020


Am 12.12.20 um 18:54 schrieb James Cook:
>>> As far as I can tell, the example is consistent with the patch theory
>>> axioms in [0] and [1]: we can simply say that in our patch theory,
>>> nothing ever commutes, and then the axioms don't have much to say.
>>>
>>> The Camp theory pdf [2] makes a claim that seems to rule out the
>>> example: in section 8.1, "Merge preparation", it's asserted that as a
>>> first step toward merging two repos, we can move the common patches to
>>> the beginning. In this case, that would imply B can be moved to the
>>> beginning of the repo AB. However, I'm not sure where that's proved,
>>> and I also can't see why the above situation is inconsistent with the
>>> axioms in that write-up. (Admittedly, I haven't read it completely ---
>>> if someone can point me to the right parts, I'd appreciate it.)
>>
>> The property you need to rule this out is a global one: you must assume
>> that each newly recorded patch has a universally unique name. The patch
>> laws are all local properties, so they cannot give you that.
>>
>> Darcs tries to ensure this by adding a random number (taken from system
>> entropy i.e. cryptographically secure, as far as that is possible) to
>> the patch name when a patch is recorded (or amended etc). But it is
>> clear that this is no guarantee, since it cannot prevent someone from
>> manually creating a patch that has the same identity as a completely
>> unrelated patch somewhere else. If you pull such a patch, you repo
>> invariants are broken and you may get crashes (or worse: inconsistent
>> behavior).
>>
>> This is a great, perhaps the greatest, weakness of patch theories a la
>> Darcs. We have discussed this at great length during the past years on
>> darcs-devel in varying contexts. It crops up again and again.
> 
> If we could easily commute a patch to its minimal context, would that
> solve the problem?

Yes.

> (E.g: the name of the patch is a hash of all the
> names in its minimal context together with the content of the patch
> when commuted to that minimal context.)

Yes, exactly.

> I remember you said earlier that there is no known way to efficiently
> compute that minimal context. Also, even if it were known, commuting
> the patch to that context could be expensive. Are those the reasons
> that scheme doesn't work?

Yes.

> Here's a sketch of an idea to try to overcome the computational
> concern, inspired by pijul. I don't know if it works because the
> details are lacking. (Also I wouldn't be surprised if something like
> this has already been discussed; this doesn't feel very original.)

I don't think an idea like this has been floated before in the context
of Darcs. I think one reason for this may be that even if we could
compute minimal contexts for prim patches, we'd still have to extend
that to conflictors and then to the named "changeset" patches. But for
your tree patch model it may be enough to consider prim patches.

Regarding Pijul, I think the decisive difference is that their patches
are such that they don't change at all when they are commuted. For
instance, a hunk that adds some lines does so by referring only to the
UUIDs of the lines before and after. And since the UUIDs are immutable a
patch never has to be adapted. Commutation is then merely a question of
determining /if/ commutation is allowed, and that information is of
course cached in the database. And since the patches never change,
minimal contexts aren't needed, since we can simply hash the (unique)
patch content and be done with it.

> For simplicity, assume prim patches are all hunks and the repo is one
> text file.
> 
> For every primitive patch in the repo, keep track of an interval of
> lines "touched" by the patch. When a patch p is first applied, its
> interval is the lines it added (and maybe one extra line before and
> after). As hunks are added, p's interval is adjusted in some
> appropriate way. If lines from the beginning or end of p's hunk are
> deleted or replaced by a later patch, p's interval shrinks. Eventually,
> p might be reduced to the empty interval, but we still keep track of
> the interval even then --- we think of p as living "between" two lines
> of the file. This is important because any later patch that touches
> that region depends on p.
> 
> Updating every interval every time a patch is added could be expensive.
> Maybe this could be mitigated by assigning unique identifiers to the
> lines of the file, and recording the intervals in terms of those
> identifiers.
> 
> Could these "touched" intervals be used to compute a minimal context
> for a new patch? The patch's name could be a hash of four things: that
> minimal context; the lines added; the unique ID of the line before the
> hunk is added; and metadata like the user-supplied description.

Let me reformulate the idea, as I understand it, to simplify and
generalise it.

It seems what you are after is similar to having a cache that, for every
patch p in the repo, stores what p looks like if commuted to its
*latest* possible position in the repo, i.e. only followed by patches
that depend on it. In other words, the patch in its /maximal/ context
(relative to the repo).

And the questions are:

(1) Can we use that to reduce the complexity of computing the minimal
context for a newly added patch? More precisely, what we aim to compute
is not the minimal context per se (as a sequence of patches) but only
the names it consists of.

(2) How much does it cost to update the cache after adding a patch?

Cheers
Ben



More information about the darcs-users mailing list