[darcs-users] so long and thanks for all the darcs

Stephen J. Turnbull turnbull at sk.tsukuba.ac.jp
Mon Mar 19 08:12:44 UTC 2018


Ben Franksen writes:
 > Hi Stephen
 > 
 > Am 08.03.2018 um 09:52 schrieb Stephen J. Turnbull:
 > > Another long one.  But we're converging!
 > 
 > Indeed. I think we agree on almost every point,

I think so, at this point.  You added some stuff that I don't disagree
with, but I think I can shed some additional light on it, or provide
an alternative view.

 > > I don't consider this an attitude, but rather a fact.  
 > 
 > Just to note, I meant "attitude" in a neutral sense, not with an
 > implied connotation of "youthful" or "uneducated". There are people
 > who just want

I understood the neutrality.  My point was simply that even if you're
personally disposed to the wild branching style that Tom Lord
encouraged in Arch and you still see in some GitHub repos, it's a fact
that your life is often simplified if others have to move straight
ahead.

 > Certainly. But I think we should distinguish here between public
 > (shared) branches and unpublished (local, one developer)
 > branches. The latter are unavoidable in practice and I would not
 > use anything that did not allow me to have several local branches
 > for the different things I am working on (in parallel or
 > intermittently). The former can be problematic but (at least with
 > Darcs, as it currently is) the main problem IME is not merge
 > conflicts but discoverability!

Sure.  You can't get into a conflict with a branch you never tried to
merge because you didn't know about it.  I think what you're seeing
here is a form of selection bias.  It will be interesting to see if
patch theory really is powerful enough to keep conflicts manageable
when you're looking at something like git.kernel.org with 25 core
developers maintaining an average of 2.5 branches each, and fans
feverishly trying all possible merges. :-)  My guess is "no", but it
would be very cool if I were proved wrong!

 > > Well, there's also functional programming.  C, for all its faults, is
 > > an imperative language, and people seem to find that easier to grasp.
 > 
 > I don't buy that. The Haskell community is large enough nowadays. It's
 > just that nobody is interested because 10 years ago they (almost) all
 > moved to git.

It doesn't matter how big the community of capable programmers is,
unless it's so small it constrains how big the intersection with your
community of users can get.  It's the latter that matters.  I agree
that GHC moving to git was a huge blow to Darcs in this way.

 > Besides, large parts of the existing Darcs code aren't very
 > "functional" to start with.

Sure, but the idioms needed to program in imperative style in Haskell
don't look imperative to relatively naive users.  Remember, to add a
new feature to git you only need a language that can call the
"plumbing" commands and capture their stdout; name the program with
the "git-" prefix and put it somewhere on PATH.  That Linus guy is
crazy like a fox!
Then to make it efficient, translate to C.  Most languages look enough
like C (or support imperative idioms) to make this straightforward.
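
To make that concrete, here is roughly all such an extension amounts
to.  This is a throwaway Haskell sketch, not anybody's real command:
the subcommand name "git-ahead-count" and the particular plumbing call
are just for illustration.

    -- Save as git-ahead-count, compile, put the binary on PATH, and
    -- "git ahead-count" works like any other subcommand.
    module Main (main) where

    import System.Process (readProcess)

    main :: IO ()
    main = do
      -- Call plumbing, catch stdout, print something useful.
      n <- readProcess "git"
             ["rev-list", "--count", "@{upstream}..HEAD"] ""
      putStrLn ("commits ahead of upstream: " ++ filter (/= '\n') n)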

 > FWIW, I think that pulling from a repo with more than one branch should
 > fail if no branch is given explicitly and no default has been specified
 > locally. With an error message that says "Sorry, I have no idea which
 > branch you want to pull from"

That is what happens with git.

 > followed by a list of available branches or a hint to which command
 > to use to list them.

I don't think this is possible with raw git on a remote repository.  I
believe you need to fetch all the remote refs, and query locally.

 > Perhaps, what drives the complexity of the branch handling over the
 > edge in git is that they chose to give local names to remote
 > branches.

Well, git simply doesn't "do" remote branches the way that Mercurial
and especially Bazaar do.  Yes, you could enhance git to have a
distributed DAG quite easily, but what users manipulate as "branches"
are nothing more than local variables pointing to head commits.  So
the solution the git developers came up with was providing namespaces
(called "remotes" in git documentation) so that one name could refer
to several heads at the same time.  (In practice, there's no way to
change the default namespace, so to refer to a name in a non-default
namespace you need to spell out the remote, e.g., "origin/test"
vs. "test" in the example you gave.)

What you described as "linking at clone time" is exactly what git
does: it automatically copies the specified branch ref (default
"master") from the "origin" namespace to the default (unnamed)
namespace.  It is strongly discouraged, though not impossible, to
change refs in a remote's namespace locally.
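
A sketch of what that copying buys you (it assumes a fresh clone whose
default branch is "master"; adjust the names to taste):

    -- Immediately after "git clone", the copied ref and its source in
    -- the origin namespace resolve to the same commit.
    module Main (main) where

    import System.Process (readProcess)

    main :: IO ()
    main = do
      shas <- readProcess "git"
                ["rev-parse", "refs/heads/master",
                 "refs/remotes/origin/master"] ""
      case lines shas of
        [local, remote] -> putStrLn (if local == remote
                                     then "same commit"
                                     else "already diverged")
        _               -> putStrLn "unexpected rev-parse output"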

 > Git complains, too, IIRC, and as usual with its own rather cryptic
 > language (the "you are in detached HEAD..." sermon).

But to get that message you need to explicitly checkout a commit that
is not the target of a branch ref.  This occasionally saves typing,
but is never necessary, and never happens via pull or push.

In Mercurial, however, concurrent development always results in
multiple heads competing to be "tip".

I believe the difference in philosophy is that git expects your object
database manipulations to be entirely local for speed reasons.  The
only remote operations are fetch and push (pull = fetch + merge).
This requires that you have a "handle" for the fetched commits, which
may as well be a ref.  SHA1s are awkward and typo-prone, so git copies
the refs you fetch from the remote repo into the local repo's
corresponding remote namespace.

Mercurial on the other hand does implement some ability to query the
remote, and tries to keep the *content* of two repos that the user
thinks of as representing the same branch consistent.

 > > My point is that in git, there is no conflict until you explicitly ask
 > > for a merge of the branches.  They can coexist indefinitely.
 > 
 > This is the same in Mercurial, technically speaking, even though it
 > might permanently nag you about it ;)

The nagging matters, though. ;-)

 > > You don't need to delete: rebase by definition *moves* the ref,
 > > leaving no ref to the old commit.  That's "the rebase problem" in a
 > > nutshell.
 > 
 > That could be easily corrected by adding a new ref to the old
 > commit in some extra place reserved for that. Perhaps this new
 > "reflog" thing is just that.

Exactly.  It doesn't "correct" very much, though, in the sense of
making it easier to do what you wanted to do that condemned you to
rebase hell in the first place.  It simply makes recovery to a known,
good state a lot more convenient.  It's great for those of us who love
our mothers and don't want them to worry, but also like playing with
matches and sharp blades. :-)
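
For concreteness, here is what that recovery looks like, as a hedged
sketch (Haskell again; ORIG_HEAD is set by rebase to the pre-rebase
head, and the branch name "rescue-me" is of course made up):

    -- Re-attach a branch ref to the head you had before the rebase,
    -- so nothing can garbage-collect it and you can inspect it at
    -- leisure.
    module Main (main) where

    import System.Process (callProcess, readProcess)

    main :: IO ()
    main = do
      oldHead <- filter (/= '\n') <$>
                   readProcess "git" ["rev-parse", "ORIG_HEAD"] ""
      putStrLn ("pre-rebase head was " ++ oldHead)
      callProcess "git" ["branch", "rescue-me", oldHead]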

 > > - What happens to the old patch when you do "darcs amend"?
 > 
 > It continues to swim in the large pool of patches (more precisely:
 > patch representations) consisting of your repo, related repos, and
 > the cache.

As I understand it, the patch knows about its dependencies, right?  So
if you've been diligent about recording semantic dependencies, you
should be able to reconstruct the feature the patch helps implement.

Of course humans can't do that consistently or accurately, so in
practice you need some luck if you don't have an inventory containing
that patch to establish a "close to correct" context.

Do I have that right?

 > > I made a lot of mistakes, though, and it would have been useful to be
 > > able to "rewind" to older versions of the patches, and break them
 > > apart differently.
 > 
 > With Darcs I usually work around this by making a local clone

Sure.  The problem was that I had a 35000-line patch that was going to
break up into more than one hundred "coherent changesets".  I would
have needed to make such clones every few minutes, and managing them
would have been, uh, "tedious" (as you recognized).  With git, it would be
on a branch which I could reset instantaneously, and some of the
management burden would be alleviated by "lightweight branching" and
by the reflog.

 > (As an aside, this means you could emulate version-based VCSes by
 > always explicitly depending on all existing patches when you
 > record. This proves that Darcs is strictly more powerful than git
 > or Mercurial, since you can't emulate Darcs with them.)

That's very cute!

Without contesting that basic fact, let me think about the
implementation a little...

I'll grant that you can recreate the DAG in this way, but that's a far
cry from emulating git or Mercurial.  You'd need to add branch refs,
but that's surely trivial.  I suspect you'd have some work to do to
even get logging right, since that presumably is based on inventories,
not on the dependency poset.  I don't know what "DAG" means if Darcs
is going to go around commuting patches, so I guess you have in mind
some sort of restricted mode -- I'm not sure it's fair to call that
"Darcs". :-)

Offhand, among other git features you would need to cover, I can
mention submodules (i.e., attaching a separate repo instead of a tree
to represent a directory) and git's filter-branch
capabilities.  I'm not sure what submodules would mean in the context
of Darcs, which doesn't have the concept of tree as far as I know.  I
guess that filter-branch, though possible to implement, would be
prohibitively inefficient as well as restricted from doing some things
that Darcs has no "vocabulary" to describe (yet).

Which reminds me: I've long thought it might be an interesting
experiment to use git's object database as a backing store for Darcs.
The idea is to add a patch object type, which would contain a
dependency list and a representation of the patch.  You'd need at
least two subtypes.  diff-style patches would be represented by a pair
of tree IDs, and other types of patches would contain a script, which
would allow a crude token-replace implemented with sed or awk as well
as rename and copy operations.  If you're willing to forego
token-replace, it's possible to represent rename and copy (and
creation and removal of empty directories) with a tree pair.  I guess
you'd also want an inventory object type.
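
In made-up types (none of these names exist in git or Darcs; this is
just how I picture the layout):

    -- Sketch of a patch object living in git's object database.
    newtype ObjectId = ObjectId String   -- an ordinary git SHA1
    newtype TreeId   = TreeId String     -- ID of a git tree object

    data PatchBody
      = DiffPatch TreeId TreeId  -- before/after trees; also covers
                                 -- rename, copy, and empty-directory
                                 -- creation/removal
      | ScriptPatch String       -- e.g. a sed/awk token-replace script

    data PatchObject = PatchObject
      { dependencies :: [ObjectId]  -- dependency list, like commit
                                    -- parents
      , body         :: PatchBody
      }

    -- Inventories get their own object type: an ordered list of
    -- patch object IDs.
    newtype Inventory = Inventory [ObjectId]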

Most diff vs. diff commutes would be extremely fast, since the
intersection of changed objects in two patches would be null, and
you'd never need to look at the content of patches.  This would be
offset by the overhead of reading several objects, I suppose, and by
the need to actually do diffs in the case of collisions, of course.
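
The fast path, roughly (invented names again; the assumption is that
the set of touched paths can be read off a patch's tree pair without
opening any blobs):

    import qualified Data.Set as Set

    -- Paths a patch touches, derivable from its pair of tree IDs.
    type Touched = Set.Set FilePath

    -- Disjoint patches commute trivially; only on a collision do we
    -- fall back to real diffing of patch contents.
    commutesTrivially :: Touched -> Touched -> Bool
    commutesTrivially a b = Set.null (a `Set.intersection` b)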

I should say that Tom Lord actually implemented this kind of thing in
his ill-fated "revc" successor to Arch/tla.  Of course that family was
designed as a sort of Ginsu knife to make manual manipulation of patch
sequences convenient.  There was no patch theory involved.  I never
used it (not even to test, it was that ill-fated :-( ), but Tom seemed
quite pleased with the speed and flexibility.

Regards,
Steve


-- 
Associate Professor              Division of Policy and Planning Science
http://turnbull.sk.tsukuba.ac.jp/     Faculty of Systems and Information
Email: turnbull at sk.tsukuba.ac.jp                   University of Tsukuba
Tel: 029-853-5175                 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN

