[darcs-users] Detecting hunk moves [was: Automatic detection of file renames]

AntC anthony_clayden at clear.net.nz
Wed Aug 28 03:51:52 UTC 2013


> Ganesh Sittampalam <ganesh <at> earth.li> writes:

> > On 26/08/2013 05:24, AntC wrote:
> > ...
> > I'm interested how this "similar content" does or could work.
> 
> The general idea of detecting patches like renames or hunk moves is to
> record better patches that the user might have recorded for themselves.
> Note that "hunk move" is still hypothetical as a darcs patch type.
> 

Thanks Ganesh, (Yes I know that "hunk move" is a 'future'.)

> Once a patch is inferred based on whatever heuristics and recorded, it
> would be treated just like the user had recorded it by hand in future
> merges. ...

Hmm? I guess that the heuristics have to record the patch in such a way 
that when it is merged/commuted it not only follows "darcs rules", but 
also has the same 'moral effect' (as darcs calls it). Could we reasonably 
expect a user to understand how that works out in all possible contexts?

So patches based on "similar content" are tricky: are they recorded against 
the file/line number location, or against the content/context? -- wherever 
that goes in the target repo. And what if that content is not pulled into 
the target repo?

> > From 
> > previous discussions on darcs, I believe that git concentrates more on 
> > matching up content than matching source line numbers(?)
> 
> git will apply heuristics at merge time, whereas darcs has to apply the
> heuristics at record time if ever.

Yes, and that's why darcs' patch algebra appeals to me. With git, I worry 
that its merge algorithm will produce different effects in different 
contexts, such that it might silently 'lose' some of my patches. (In 
practice, that probably happens rarely; but git does for example produce 
an 'empty commit' if it thinks that the repo already has my change. Since 
the commit ID is a hash of the content, this loses the history as well. 
And git rebase seems to completely mangle history. I'd rather that patches 
persist, so that I could 'audit' the history to see that the same patch 
got pulled twice.)

> 
> > Suppose we have this sequence in Repo A:
> > + create file F
> > + add hunk text H1 to F
> > + insert hunk text H2 into F (into the middle of H1)
> > 
> > The author knows (but darcs can't) that the content of H2 'links to' H1.
> > (For example, H2 is program code that refs names declared in H1.)
> > (I'm not sure if H2 is dependent on H1 in a darcs sense, because you 
> > can't commute the two hunk operations -- you'd have to split 
> >  H1 into two 'hunklets' of text.)
> > 
> > There's then this sequence in Repo B.
> > + pull create file F
> > + pull add hunk text H1
> > + create file G
> > + hunk move text H1 to file G (-cut+paste)
> > - this leaves file F empty
> > ? pull hunk text H2
> > 
> > I'm guessing that darcs will put text H2 into file G, as the only 
> > content -- then compile would fail(?) Would git do the same?
> 
> Yes, that would be what I would expect to happen. If H2 depends on H1
> then this seems like exactly the right thing to do and I don't see why
> compilation would fail.
> 
> Given what you say below you might have meant to say "file F" here;

Aargh! and big apologies for confusing you. You are quite right that I 
meant to say "file F". I guessed that patch H2 would go into the same file 
name as it came from (in the absence of file renames).

So thank you for correcting my guess. I'm interested in understanding how 
(and why) darcs knows to target the content (where H1 has gone), rather 
than the address (file name/line number).

> but I don't think H2 would ever end up in F automatically.
> Either the merge would fail with a conflict or it would end up in G.
> 
> Nonetheless, there are certainly more complex scenarios where darcs
> would do the wrong thing, as there are without hunk move. For example if
> H2 depended on other content in F.

Yeah. What if there was an H3 inserted within H1 at exactly the point H2 
wants to go? What is the 'right thing'? Insert before H3, or after H3, or 
report a conflict? What if H3's text is the same as (or similar to) H2's?
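
A toy sketch of that H2/H3 question, under heavy assumptions: two insertions are recorded in parallel against the same base file, and the merge refuses to guess an order when both target the same line. The names Insert, Conflict and merge_inserts are made up for illustration; this is not darcs's merge code.

```python
from dataclasses import dataclass


@dataclass
class Insert:
    line: int          # 0-based index in the shared base file
    text: list         # lines to insert at that index


class Conflict(Exception):
    """Raised when two parallel insertions want the same position."""


def merge_inserts(base, a, b):
    """Merge two insertions that were both recorded against `base`."""
    if a.line == b.line:
        # No principled way to choose "before" or "after": report it.
        raise Conflict(f"both insertions target line {a.line}")
    first, second = sorted((a, b), key=lambda ins: ins.line)
    return (base[:first.line] + first.text
            + base[first.line:second.line] + second.text
            + base[second.line:])


h1 = ["h1-top", "h1-bottom"]
h2 = Insert(1, ["h2"])            # into the middle of H1
h3_elsewhere = Insert(2, ["h3"])  # after H1: merges cleanly
h3_same_spot = Insert(1, ["h3"])  # same point as H2: conflict

merged = merge_inserts(h1, h2, h3_elsewhere)
try:
    merge_inserts(h1, h2, h3_same_spot)
    clashed = False
except Conflict:
    clashed = True
```

The sketch compares positions only. It says nothing about the case where H3's text equals H2's, which is a separate policy decision.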

> 
> If I understand your terminology correctly, then Darcs always follows
> content/context. I don't think following file names and line numbering
> would work well in general; merges would usually produce bad results -
> e.g. merging a file change to A and a rename of A to B should result in
> a file change to B, not to A.
> 
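
A toy sketch of that rename case, not darcs code: an edit recorded against file A is rewritten so that it still applies once A has been renamed to B. The types Move and Edit and the helper edit_after_move are hypothetical, and a repo is assumed to be just a dict from path to list of lines.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Move:
    src: str
    dst: str


@dataclass(frozen=True)
class Edit:
    path: str
    line: int   # 0-based line index
    old: str    # content the edit expects to find
    new: str    # content it writes instead


def apply_move(repo, m):
    out = dict(repo)
    out[m.dst] = out.pop(m.src)
    return out


def apply_edit(repo, e):
    lines = list(repo[e.path])
    if lines[e.line] != e.old:
        raise ValueError("content mismatch")
    lines[e.line] = e.new
    return {**repo, e.path: lines}


def edit_after_move(e, m):
    """Rewrite an edit recorded before the move so it applies after it."""
    return replace(e, path=m.dst) if e.path == m.src else e


repo = {"A": ["hello", "world"]}
edit = Edit("A", 1, "world", "darcs")
move = Move("A", "B")

edit_then_move = apply_move(apply_edit(repo, edit), move)
move_then_edit = apply_edit(apply_move(repo, move), edit_after_move(edit, move))
```

Both orders end with the change in B, because the edit is carried along with the content rather than pinned to the old file name.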

Thank you Ganesh. In looking at the darcs theory stuff, most of its 
examples for commuting show 'shuffling' of line numbers from hunk 
deletes/inserts, as if darcs is following addresses rather than content.
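
A toy model of that line-number shuffling, assuming a much-simplified hunk. The names Hunk, apply and commute are made up here, and this is not darcs's implementation; in particular, hunks that merely touch are simply treated as non-commuting.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Hunk:
    line: int        # 0-based position where the hunk applies
    old: list[str]   # lines the hunk expects to find there
    new: list[str]   # lines it puts in their place


def apply(lines: list[str], h: Hunk) -> list[str]:
    end = h.line + len(h.old)
    if lines[h.line:end] != h.old:
        raise ValueError("hunk does not match file content")
    return lines[:h.line] + h.new + lines[end:]


def commute(p: Hunk, q: Hunk) -> Optional[tuple[Hunk, Hunk]]:
    """Given 'p then q', return (q2, p2) so that 'q2 then p2' has the same effect."""
    if q.line > p.line + len(p.new):
        # q lies strictly below p's result: undo p's shift on q's line number.
        return Hunk(q.line - len(p.new) + len(p.old), q.old, q.new), p
    if q.line + len(q.old) < p.line:
        # q lies strictly above p: p must now account for q's shift.
        return q, Hunk(p.line + len(q.new) - len(q.old), p.old, p.new)
    return None  # the hunks touch or overlap; treat them as dependent


base = ["a", "b", "c", "d", "e"]
p = Hunk(0, [], ["x", "y"])   # insert two lines at the top
q = Hunk(5, ["d"], ["D"])     # recorded after p, so "d" sits at index 5
q2, p2 = commute(p, q)        # q2.line is 3: where "d" sits without p
```

Only the line number of q changes when it is commuted past p; the old text it expects ("d") is untouched. Read that way, the shuffled address is bookkeeping that keeps the hunk pointing at the same content.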

AntC


