[darcs-users] darcs conflicts/dependencies -- is patch theory the place to start?
AntC
anthony_clayden at clear.net.nz
Fri Sep 21 06:01:24 UTC 2012
Stephen J. Turnbull <stephen <at> xemacs.org> writes:
>
> AntC writes:
>
> > This is exactly the sort of example I'm trying to work through. So
> > my approach is (trying to) separate out what applies for the
> > container (file) vs the contents (lines).
>
> Well, I think you should try to sort this in conceptual terms first,
Thanks, Stephen. You may not be reading this in the context of my "very
speculative approach"
http://lists.osuosl.org/pipermail/darcs-users/2012-September/026698.html
Conceptually, this is trying to achieve the same thing as L/S/L's approach for
a line-id, but piggy-backing on darcs' current approach for precisely
observing hunk changes. (From what you're saying, it would be even better to
piggy-back on git's ability to spot hunk moves. Is there some reason darcs
doesn't/can't look for those?)
> ... most importantly, what are the use cases where the programmer might
> care about the difference between a file's container (which might have
> different names either sequentially (renames) or concurrently (links)),
> and the file's contents?
Essentially I am agreeing with git that tracking the contents is far more
important than tracking the container. But in complex code bases (with
programs and scripting and install routines, etc in a variety of languages)
there are semantic connections between content in different files, and
connections in both directions between directory/file names and content.
>
> ... git's find-copy/move-harder features
> define content movement by equality of lines as strings; it is 100%
> reliable at finding those moves.
Given that there are typically many lines in a repo with exactly the same
content (blanks, a single opening brace or single closing brace, single open
comment or single closing comment, horizontal line separators between
sections, standard program initiation sequences/shutdown sequences/error
handlers/template calls), how can it be so sure? Is git looking only at the
content the programmer can see, or does it look 'under the covers' at disk
addresses, etc.?
Suppose the programmer moves some text, then edits it before recording? (Yes,
bad practice I know -- record early! record often!)
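To make the concern concrete, here is a deliberately naive Python sketch of matching moved lines by string equality alone. It is an illustrative assumption, not git's actual move detection (which works on blocks of lines rather than single lines); the function name `candidate_moves` and the sample lines are hypothetical.

```python
from collections import defaultdict

def candidate_moves(removed, added):
    """Map each removed line's index to the indices of added lines with identical text.

    Deliberately naive: single lines are matched by string equality only,
    with no grouping into blocks and no minimum-length threshold.
    """
    by_text = defaultdict(list)
    for j, text in enumerate(added):
        by_text[text].append(j)
    return {i: by_text.get(text, []) for i, text in enumerate(removed)}

# A function moved elsewhere in the file, with its parameter renamed before recording.
removed = ["int f(int x) {", "    return x + 1;", "}", ""]
added = ["", "}", "int f(int y) {", "    return y + 1;", "}"]

for i, targets in candidate_moves(removed, added).items():
    print(repr(removed[i]), "->", targets)
```

The two lines that carry the function's meaning find no match at all, because they were edited after the move, while the lone closing brace has two equally plausible targets and the blank line matches a blank line that has nothing to do with the move.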
>
> > I'm envisaging a move-file command (as per darcs), and a move-lines
> > command, so that the programmer can be explicit about their intent:
> > - are these two completely new files?
> > - or one with continuing identity, one new?
> > - (whether or not one of the files has the same name as before
> > is an orthogonal issue)
> > - for each file, is this completely new content?
> > - or continuing content (from where)?
> > - or (more likely) a mix of new and continuing?
>
> This is a rather large burden to place on the programmer. Will they
> really bother to learn to do this correctly?
With move-file we expect the programmer to instruct darcs, so that it can both
record the patch and make the move. For move-lines, I agree this is less
convenient. Perhaps the best of both worlds is:
- at record points use git-like methods to detect moved lines
- confirm with the programmer that this is a move (rather than new text)
- make sure it's capturing all and only the moved lines
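The three steps above might be prototyped along these lines. This is a hypothetical Python sketch, not darcs or git code: `detect_moved_blocks`, `confirm` and the `min_chars` threshold of 20 are invented here. It proposes only contiguous runs of identical lines, ignores runs made of trivial lines such as lone braces and blank lines, uses each added line at most once (so no line is claimed by two moves), and leaves the final decision to an `ask` callback standing in for an interactive prompt.

```python
def detect_moved_blocks(removed, added, min_chars=20):
    """Propose moves as (removed_start, added_start, length) runs of identical lines.

    Greedy: for each position in `removed`, take the longest run that also appears
    contiguously in `added`, using each added line at most once. Runs with fewer
    than `min_chars` characters (after stripping each line) are not proposed.
    """
    used = set()
    proposals = []
    i = 0
    while i < len(removed):
        best_len, best_j = 0, None
        for j in range(len(added)):
            k = 0
            while (i + k < len(removed) and j + k < len(added)
                   and j + k not in used and removed[i + k] == added[j + k]):
                k += 1
            if k > best_len:
                best_len, best_j = k, j
        block = removed[i:i + best_len]
        if best_len and sum(len(s.strip()) for s in block) >= min_chars:
            proposals.append((i, best_j, best_len))
            used.update(range(best_j, best_j + best_len))
            i += best_len
        else:
            i += 1
    return proposals

def confirm(proposals, removed, ask):
    """Keep only the proposals the programmer accepts; `ask` stands in for a prompt."""
    return [(i, j, n) for (i, j, n) in proposals if ask(removed[i:i + n])]

# A three-line function moved to the top of the file; stray braces also shuffled.
removed = ["}", "", "static int parse(char *s) {", "    return atoi(s);", "}", "}"]
added = ["static int parse(char *s) {", "    return atoi(s);", "}", "", "int main(void) {", "}"]
print(detect_moved_blocks(removed, added))  # [(2, 0, 3)]
```

Being greedy, the sketch can still pick a poor match when many lines repeat, which is one reason to keep the programmer's confirmation in the loop.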
>
> > The critical issue is determining how to apply patches pulled from
> > other repos where the file splitting hasn't occurred (perhaps a
> > bugfix on the pre-refactored code).
>
> Simple. You apply it to the same lines. ...
Exactly what I'm aiming for. Where "same" means same line-id, as tracked
through move-lines, to wherever the lines are now. (The target lines might be
in a different file in this repo compared to where we've pulled the patch
from.)
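A minimal sketch of "same lines" meaning "same line-id", assuming (hypothetically) that every line carries a stable id which move-lines preserves. `Line`, `move_lines`, `locate` and `apply_line_edit` are illustrative names, not darcs APIs, and a real patch would carry whole hunks rather than a single-line edit.

```python
from dataclasses import dataclass

@dataclass
class Line:
    line_id: int  # hypothetical stable identity, kept by move-lines and by edits
    text: str

def move_lines(repo, src, start, end, dst, at):
    """Move repo[src][start:end] into repo[dst] at index `at`, keeping line ids."""
    moved = repo[src][start:end]
    del repo[src][start:end]
    repo[dst][at:at] = moved

def locate(repo, line_id):
    """Return (file name, index) of the line with this id, wherever it now lives."""
    for name, lines in repo.items():
        for idx, line in enumerate(lines):
            if line.line_id == line_id:
                return name, idx
    raise KeyError(f"line {line_id} is not present in this repo")

def apply_line_edit(repo, line_id, new_text):
    """Apply a one-line change addressed by line id, not by file and position."""
    name, idx = locate(repo, line_id)
    repo[name][idx].text = new_text
    return name

# This repo has split util.c: its last two lines now live in io.c.
repo = {
    "util.c": [Line(1, "init();"), Line(2, "parse();"),
               Line(3, "read(fd);"), Line(4, "close(fd);")],
    "io.c": [],
}
move_lines(repo, "util.c", 2, 4, "io.c", 0)

# A bugfix pulled from a repo without the split still targets line id 3.
print(apply_line_edit(repo, 3, "read(fd, buf, len);"))  # io.c
```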
> ... In git, if you've changed
> the content of the lines (eg, variable rename) you won't be able to
> find them, but you won't be able to apply the patch anyway because git
> doesn't know how to commute patches.
So I'm aiming to cope with variable renames. I'm representing patches in a
context-independent way, so that you can apply patches in a different sequence
(or omit some patches), but I'm not using a commute-like mechanism.
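One possible reading of "context-independent", sketched in Python under assumptions that are the editor's rather than the post's: each patch names the line ids it touches instead of line numbers, and deleted lines are kept as tombstones so a later patch can still anchor on them. Patches touching different ids then give the same result in any order, with no commute step. The op types and `apply_op` are hypothetical, and concurrent inserts at the same anchor would still need a tie-break rule that this sketch omits.

```python
import copy
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    new_id: str
    after_id: str  # id of the anchor line; "START" means the top of the file
    text: str

@dataclass(frozen=True)
class Delete:
    line_id: str

@dataclass(frozen=True)
class Edit:
    line_id: str
    text: str

def find(doc, line_id):
    return next(i for i, entry in enumerate(doc) if entry[0] == line_id)

def apply_op(doc, op):
    """Apply one op to doc, a list of [line_id, text, alive] entries.

    Deleted lines stay as tombstones (alive=False), so an Insert can still
    anchor on them whatever order the patches arrive in.
    """
    if isinstance(op, Insert):
        pos = 0 if op.after_id == "START" else find(doc, op.after_id) + 1
        doc.insert(pos, [op.new_id, op.text, True])
    elif isinstance(op, Delete):
        doc[find(doc, op.line_id)][2] = False
    elif isinstance(op, Edit):
        doc[find(doc, op.line_id)][1] = op.text

def visible(doc):
    return [text for _, text, alive in doc if alive]

base = [["a", "open();", True], ["b", "read();", True], ["c", "log();", True]]
patches = [Edit("b", "read_all();"), Delete("c"), Insert("d", "c", "close();")]

outcomes = set()
for order in itertools.permutations(patches):
    doc = copy.deepcopy(base)
    for op in order:
        apply_op(doc, op)
    outcomes.add(tuple(visible(doc)))
print(outcomes)  # one outcome for all six orders
```

Omitting a patch also works as long as no remaining patch anchors on a line that only the omitted patch created.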
>
> >
> > I'd prefer to handle that as a move-lines for one (or both) of the
> > ranges.
>
> Theoretically, yes. But will users properly discriminate between
> those commands?
>
Don't worry: the VCS will validate that the move-lines exactly captures the
change in content. The risk is that the programmer will move lines
(through edit/copy/paste) and then 'shuffle' the sequence and then change some
stuff before they remember to record. So now it's too difficult to trace the
movements by algorithm, and the programmer's forgotten exactly what they did.
So at worst that ends up being unconnected hunk deletes and hunk inserts, and
the VCS has lost track of the line identities. But is that any worse than
darcs or git?
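The validation step and its fallback could look roughly like this. It is a hypothetical Python sketch over plain lists of line texts; `validate_move` and `record_move` are illustrative names. The check is intentionally strict: any edit to the moved lines before recording makes the declaration fail, leaving only an unconnected delete and insert, which is the worst case described above.

```python
def validate_move(old_src, new_src, new_dst, start, end, at):
    """Check that a declared move of old_src[start:end] to new_dst[at:] is exact.

    Accepted only if deleting exactly those lines from the old source yields
    the new source, and the same lines appear verbatim at index `at`.
    """
    moved = old_src[start:end]
    return (old_src[:start] + old_src[end:] == new_src
            and new_dst[at:at + len(moved)] == moved)

def record_move(old_src, new_src, new_dst, start, end, at):
    """Record an identity-preserving move if it validates, else plain hunks."""
    if validate_move(old_src, new_src, new_dst, start, end, at):
        return ("move", start, end, at)
    # Line identities are lost: a delete in the source and an unrelated
    # insert in the destination.
    return ("delete+insert", old_src[start:end], new_dst[at:at + (end - start)])

old_src = ["a();", "b();", "c();", "d();"]
new_src = ["a();", "d();"]
# Moved verbatim: recorded as a move.
print(record_move(old_src, new_src, ["x();", "b();", "c();"], 1, 3, 1))
# Edited after moving: falls back to unconnected hunks.
print(record_move(old_src, new_src, ["x();", "b();", "C();"], 1, 3, 1))
```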
AntC