[darcs-users] so long and thanks for all the darcs

Stephen J. Turnbull stephen at xemacs.org
Tue Mar 20 09:33:36 UTC 2018


Ben Franksen writes:
 > On 19.03.2018 at 09:12, Stephen J. Turnbull wrote:

 > > I don't think this is possible with raw git on a remote repository.  I
 > > believe you need to fetch all the remote refs, and query locally.
 > 
 > In Darcs we have to query the remote repo anyway. You don't want to
 > transfer patches that are already present at the other end. I am sure
 > git has a way to avoid sending commits that the remote already has.

Of course it does.  What I meant is that there's no way for the user
to do this.

 > But git chooses to not clone all the refs by default and there is a
 > reason for that because it would have to pull all the referenced
 > commits, too,

As far as I know, git clones the whole object database and sets up one
tracking branch (which one depends on the checked-out branch in the
source).  Optionally you can restrict to a single branch by explicitly
specifying the --single-branch option.  It also copies all branch refs
to $GIT_DIR/refs/remotes/origin, so in fact you have the whole state of
the cloned repo at the time of cloning.

 > and that is costly. Not so in Darcs.

I don't see a big difference.  There's a bit of CPU involved, but a
git pack (used both for disk storage and for network transfers) is
basically a diff-compressed archive, just as it would be in Darcs.  In
practice, few branches in git get very long, or involve a lot of
difference.  Unless you mean that Darcs doesn't really support
multiple branches in a repository so it's no problem?

 > > So the solution the git developers came up with was providing
 > > namespaces (called "remotes" in git documentation) so that one
 > > name could refer to several heads at the same time.
 > 
 > This is what I mean. Even following mentally what you wrote here gives
 > me headaches. Not because of its complexity per se, but because of
 > *unnecessary* complexity.

You've already admitted that it's necessary because of name
collisions.  You just don't like it. :-)

 > Yeah, discourage the feature, but first add it because it's oh so
 > cool and cheap, making everyone's life difficult because they now
 > all have to cope with the resulting complexity.

I think you misunderstand how git works.  These are just refs, tiny
files that contain one SHA1 terminated with a newline character, no
more and no less.  People used to do things like name their tracking
branches <remote>-<branch>, but that had two disadvantages.  One, many
people did what you seem to find natural, and omit the "<remote>-" part.
After all, it's not my branch, so I won't work on it, right?  Turns
out that for various reasons people *do* unintentionally commit to those
branches.  Second, whatever the name, you don't want to commit to
those branches, so having them in the default namespace is pollution.

The solution is elegant, IMO: move refs to tracking branches to a
standard (and quite arbitrary) place, and teach the functions that
have need to know (config, fetch, and push) where that is.
Furthermore, since they're not in refs/heads anymore, git refuses to
treat them as branches (checking out a remote ref puts you in
"detached HEAD" mode).  Of course, since they're just files (and not
R/O), you can edit them.  But the only way git will change them is on a
successful fetch or push, to synchronize to the remote.
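
To make the "just files" point concrete, here is a toy sketch in Python.
It is not git's code: write_ref, read_ref and checkout_mode are names made
up for this illustration, the temporary directory stands in for a real
$GIT_DIR, and it models only loose refs (real git can also keep refs in a
packed-refs file).

```python
import tempfile
from pathlib import Path


def write_ref(git_dir: Path, ref: str, sha: str) -> None:
    """Store a ref as a tiny file: one hex SHA1 followed by a newline."""
    path = git_dir / ref
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(sha + "\n")


def read_ref(git_dir: Path, ref: str) -> str:
    """Read the SHA1 a ref points at."""
    return (git_dir / ref).read_text().strip()


def checkout_mode(ref: str) -> str:
    """Only refs under refs/heads/ are treated as branches (toy rule)."""
    return "on branch" if ref.startswith("refs/heads/") else "detached HEAD"


git_dir = Path(tempfile.mkdtemp())  # stand-in for a real .git directory
sha = "a" * 40                      # placeholder, not a real object id

write_ref(git_dir, "refs/heads/master", sha)           # my own branch
write_ref(git_dir, "refs/remotes/origin/master", sha)  # tracking ref

print(checkout_mode("refs/heads/master"))
print(checkout_mode("refs/remotes/origin/master"))
```

Same file format in both places; only the directory decides whether the
name is something you can commit to.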

 > > But to get that message you need to explicitly checkout a commit that
 > > is not the target of a branch ref. 
 > 
 > A tag, for instance.

Yes.  I guess that a lot of people would prefer that git add a bunch
of implementation complexity so that it would warn only when you
switch branches away from a detached HEAD that has new commits on it,
or when you try to push from one.

 > Copy & paste? It's 2018, not the 1970s.

I frequently drop characters at the beginning or end of a selection
when using touchpads or handhelds, and occasionally with a mouse.
Unfortunately git accepts prefixes....  SHA1s are also less than
mnemonic (at least if your name is not Ramanujan!) -- how do you know
you've got the right one when you can't query the remote for SHA1
equivalents to branch refs?  Without the refs, all you would have is a
bag of dangling heads among all your dangling heads from rebases etc.
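
Here is a small Python sketch of why prefix matching and sloppy selections
mix badly.  The two object ids are invented for the example, and
resolve_prefix is a hypothetical helper, not git's actual lookup code.

```python
def resolve_prefix(prefix, known_shas):
    """Return the unique id starting with prefix, in the spirit of git's
    abbreviated object names (toy version)."""
    matches = [s for s in known_shas if s.startswith(prefix)]
    if not matches:
        raise KeyError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"{prefix!r} is ambiguous")
    return matches[0]


# Two made-up 40-character ids.
wanted = "c0ffee12" + "0" * 32
other = "0ffee12" + "9" * 33
objects = [wanted, other]

# A clean selection of the abbreviation finds the commit I meant.
print(resolve_prefix("c0ffee12", objects) == wanted)

# Dropping the first character still resolves, but to a different object,
# and nothing in the result tells me so.
print(resolve_prefix("0ffee12", objects) == other)
```

Losing characters at the end is harmless as long as the prefix stays
unique; losing them at the beginning can silently select something else.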

 > > The nagging [about multiple heads] matters, though. ;-)
 > 
 > In practice it does, yes. I meant that the "kernel" has no problems
 > with it.

No, but users do.  It's frequently not obvious to me which head is
tip, and at least before my habits formed I found myself "twisting"
the mainline, which annoyed some coworkers (mostly those who wished they
were using Bazaar).

 > > As I understand it, the patch knows about its dependencies, right?
 > 
 > Only the explicit ones. The implicit dependencies are, well,
 > implicit.

Ouch, see below.

 > > So if you've been diligent about recording semantic dependencies,
 > > you should be able to reconstruct the feature the patch helps
 > > implement.
 > > 
 > > Do I have that right?
 > 
 > Yes, I think so.

But apparently requires a bit more luck than I estimated. :-)

 > > I suspect you'd have some work to do to even get logging right,
 > > since that presumably is based on inventories, not on the
 > > dependency poset.
 > 
 > Doesn't matter. The dependencies (if done in this rather un-Darcsy way)
 > enforce a single linear sequence per branch/repo and the inventory
 > merely reflects that one fixed order...

Yes, I understand that.  My point is that you need to do some
implementation, just as git would have to to emulate patches.

I've changed the order of discussion of filter-branch and submodules.

 > > [Offhand, I can mention] git's filter-branch capabilities
 > 
 > I don't know anything about that feature.

Sort of rebase on steroids.  It allows you to walk all parents and
automatically do DAG surgery based on arbitrary conditions (presented
as Bourne shell scripts).  For example, you can split out a
subdirectory as a separate project without ever checking out a file.
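
A toy model of the subdirectory split, in Python.  This is only a sketch
under strong assumptions: history is a linear list of (message, snapshot)
pairs rather than a DAG, snapshots are plain dicts from path to content,
and split_subdirectory is a name invented here.  Real filter-branch
rewrites git's own commit and tree objects.

```python
def split_subdirectory(history, subdir):
    """Rewrite a linear history so that only subdir survives, as the root.

    history: list of (message, {path: content}) snapshots, oldest first.
    Commits that do not change anything under subdir are dropped.
    """
    prefix = subdir.rstrip("/") + "/"
    rewritten = []
    previous = {}  # subtree of the last commit we kept
    for message, tree in history:
        subtree = {
            path[len(prefix):]: content
            for path, content in tree.items()
            if path.startswith(prefix)
        }
        if subtree == previous:
            continue  # this commit never touched the subdirectory
        rewritten.append((message, subtree))
        previous = subtree
    return rewritten


history = [
    ("init", {"README": "hi"}),
    ("add lib", {"README": "hi", "lib/a.py": "1"}),
    ("edit readme", {"README": "hello", "lib/a.py": "1"}),
    ("fix lib", {"README": "hello", "lib/a.py": "2"}),
]

for message, tree in split_subdirectory(history, "lib"):
    print(message, tree)
```

Note that nothing here touches a working tree; the whole operation is a
walk over stored snapshots.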

This would require a lot of implementation to do in terms of posets of
patches, and it would be like doing surgery with a chainsaw.

OTOH, Darcs might not care since history is not the central thing, as
long as the patches are right and in some "reasonable" order.
(Mostly, just respecting dependencies.)  A lot of the things that git
would do with filter-branch might be implemented in Darcs in terms of
some similar feature that works directly on patches.  E.g., in the
subdirectory-splitting example, you'd move all the file adds and hunks
referring to the subdirectory to the new project, and duplicate
token-replace patches.  So probably you *can* do a Darcs-y
filter-branch pretty efficiently as long as you don't worry too much
about those parts of history that Darcs treats loosely anyway.

Interesting...

 > > [and] submodules (ie, attaching a separate repo
 > > instead of a tree to represent a directory),
 > 
 > Yes that's something we do not support yet. Though I'd say the existing
 > support in git is of the shallow sort.

I agree that the implementation is trivial in the data structures and
quite manual in the operations, but I'm not sure what "deep" support
would be, given the requirements that led to their implementation.

 > IIUC it's more or less a file with some associations between
 > subdirectories and subrepos (plus some information about their
 > remotes) and the normal git commands ignore submodules
 > completely. Correct me if I am wrong.

Almost correct (for a submodule, its subdirectory is represented in
the DAG by a commit object rather than a tree object), but there is
method behind this madness.  Specifically, the point of submodules is
to create a sort of firewall between the VCS metadata of the main
project and those of its prerequisites.  Commands normally do not
recurse into submodules because in most cases those are going to be
stable, not tracking upstream's bleeding edge, and likely not modified
in this project, either.

 > BTW, a project I occasionally work on but very often work with (at
 > work) recently decided to split into several submodules. This led
 > to general and widespread confusion about how to handle these
 > submodules and lots of criticism from users and contributors.

Yeah, submodules *are* complex.  They're also complex in Mercurial
(where they're called subrepositories).

Most projects can (and do ;-) avoid them, but when you need them, they
really make a difference.  They're basically a device to manage
inter-project, not intra-project, communication.

 > > I'm not sure what submodules would mean in the context
 > > of Darcs, which doesn't have the concept of tree as far as I know. 
 > 
 > Huh? Of course it has. If you mean tree as in "tree of files and dirs
 > that make up a version". But the question is still a good one and I don't
 > have an answer ready.

I mean a single object in the database that describes a tree of files
and subdirectories.  As I understand it, in Darcs you have patches and
inventories, and the tree represented by a Darcs repo is implicit in
the sequential application of patches.

 > How does git cope with a conflict between a module and a submodule? 
 > Say I have a submodule in a directory x and I add a file to the
 > parent module with name x/y.

You can't do that.  There is no x subtree in the parent, that is, no
tree object to add files or subtrees to; rather there is a commit that
only the submodule command knows how to handle -- it's opaque to the
mutation commands.  IOW, x *is* a submodule, an independent project
whose working tree resides at x.  From the point of view of developers
(and text editors :-), the working tree is just a subtree of the
parent, but the VCS metadata are completely separate.  You *can* add a
file x/y to the working tree of the parent, giving a modified x
submodule.
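
As a rough picture of why the parent can't take x/y, here is a Python
sketch of a tree whose entries carry a mode, using the mode strings that
git ls-tree prints (040000 for a tree, 100644 for a regular file, 160000
for a submodule's commit entry).  The nested-dict layout, add_file and
SubmoduleBoundary are all invented for this illustration; git's real
index and tree code look nothing like this.

```python
TREE, BLOB, GITLINK = "040000", "100644", "160000"


class SubmoduleBoundary(Exception):
    """Raised when a path would have to pass through a submodule entry."""


def add_file(tree, path, blob_id):
    """Add path to a nested {name: (mode, payload)} tree (toy model)."""
    *dirs, leaf = path.split("/")
    node = tree
    for name in dirs:
        mode, child = node.setdefault(name, (TREE, {}))
        if mode == GITLINK:
            # The payload is a commit id, not a tree: nothing to add to.
            raise SubmoduleBoundary(f"{name} is a submodule, not a tree")
        if mode != TREE:
            raise NotADirectoryError(name)
        node = child
    node[leaf] = (BLOB, blob_id)


parent = {
    "src": (TREE, {}),
    "x": (GITLINK, "b" * 40),  # placeholder commit id of the submodule
}

add_file(parent, "src/main.c", "c" * 40)  # ordinary subtree: fine

try:
    add_file(parent, "x/y", "d" * 40)     # x is opaque to the parent
except SubmoduleBoundary as err:
    print("refused:", err)
```

The file x/y can still exist on disk; it just belongs to the project
whose working tree lives at x.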

You can have a subtree and replace it with a submodule or vice-versa.
Diffs and things like that will do the right thing.

The complexity comes in if you change anything in a submodule.  Then
you have to make a decision about whether and to which submodule
branch you want to commit that change, and whether to propagate that
commit to the parent (remember, the state of the submodule in the
parent is represented by that commit), and to which branch or branches
of the parent.  There doesn't seem to be a typical case, so at least
for now it's entirely up to the user to figure it out, and the UI is
multistep and therefore error-prone (aka "complex").

 > > Which reminds me: I've long thought it might be an interesting
 > > experiment to use git's object database as a backing store for Darcs.

 > Commuting hunks in Darcs is already fast. We do optimize and handle the
 > case of hunks in different files quickly (there is no need to change the
 > patch rep in this case; we say they commute "trivially"). When they are
 > in the same file, commutation more or less consists of a handful of
 > comparisons, additions, and subtractions with machine integers. The
 > actual content of what is removed and added is not needed, only the size
 > (number of lines).

OK, so that's basically the same "big O", but the representation I
suggested would involve more overhead, I'm pretty sure.
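
For the same-file case Ben describes, the arithmetic can be sketched in
Python like this.  It is a simplified model and not Darcs's actual code:
Hunk, commute and apply are names chosen for the example, adjacent hunks
are conservatively treated as non-commuting, and the added lines are kept
only so the result can be checked by applying the patches.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hunk:
    line: int             # 1-based line where the hunk starts
    n_old: int            # number of lines removed
    new: Tuple[str, ...]  # lines added (only the length matters below)


def commute(p: Hunk, q: Hunk) -> Optional[Tuple[Hunk, Hunk]]:
    """Given p followed by q, return (q2, p2) with the same effect when
    applied as q2 followed by p2, or None if the hunks overlap or touch."""
    if p.line + len(p.new) < q.line:      # q lies strictly below p
        return Hunk(q.line - len(p.new) + p.n_old, q.n_old, q.new), p
    if q.line + q.n_old < p.line:         # q lies strictly above p
        return q, Hunk(p.line + len(q.new) - q.n_old, p.n_old, p.new)
    return None


def apply(lines: List[str], h: Hunk) -> List[str]:
    """Apply a hunk to a list of lines (used only to check the result)."""
    i = h.line - 1
    return lines[:i] + list(h.new) + lines[i + h.n_old:]


text = [f"l{i}" for i in range(1, 11)]
p = Hunk(2, 1, ("a", "b"))   # replace line 2 with two lines
q = Hunk(6, 1, ("X",))       # then replace line 6 of the result

q2, p2 = commute(p, q)
print(q2.line, p2.line)
print(apply(apply(text, p), q) == apply(apply(text, q2), p2))
```

Only line numbers and lengths enter commute, which is the point: the
content of the hunks is never inspected.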

 > It is not quite clear to me what the motivation behind this whole idea is.

Partial git compatibility and faster checkouts and other operations on
arbitrary known versions.

Steve

