[darcs-users] so long and thanks for all the darcs
Ben Franksen
ben.franksen at online.de
Sun Mar 18 16:31:27 UTC 2018
Hi Stephen
On 08.03.2018 at 09:52, Stephen J. Turnbull wrote:
> Another long one. But we're converging!
Indeed. I think we agree on almost every point; I just didn't understand
everything you wrote at first.
> > I must say I don't really understand what you are saying. What does
> > "people like to reify branches" mean, exactly?
>
> It means that they think of them as separate entities, backed by some
> data structure in the implementation. For branch-per-repo models, the
> repo is that data structure. In addition, Bazaar has a whole internal
> API that knows all about manipulating branches, as opposed to the
> history they contain. Mercurial has this odd branch name property on
> commits.
>
> By contrast, in git there's the history DAG and its components
> (commits, trees, and blobs) and that's it. The DAG may have multiple
> heads, and what people think of as "the branch I'm working on" may not
> even be entirely contained in the repository. There is no data
> structure that surely contains everything you need to know about a
> branch, except the whole repository plus any external object databases
> configured in the repository's metadata.
Yes, I think I have understood this difference now.
Indeed, the named branches in Mercurial are a strange thing, and the
equivalent of git branches is in fact Mercurial bookmarks, as I found
out by asking a few colleagues.
> > > It's very hard to get Darcs' "set of patches" model wrong. I think
> > > that's part of the appeal of Darcs.
> >
> > The "set of patches" model can be misleading.
>
> You're right. That was very sloppy of me. What I meant was a partial
> order (with patches being related if one is dependent on the other).
> But that's not very useful to people who haven't gone down that
> mathematical rabbit hole. Everybody knows what a sequence is.
Yes. In addition, there are many options such as the --from-xxx and
--to-xxx options, and also --last, which make sense only if you take the
(current) sequence of patches into account.
(I think some of these options are a misfeature and we would be better
off without them. I *never* use --from-patch or --to-patch. OTOH,
--from-tag is occasionally useful and --last is pretty common.)
> > > Another part is the appeal to fans of the so-called "user-friendly"
> > > VCSes that there's very little you can do except move straight ahead,
> > > but that also means everybody else has to move straight ahead, too.
> > > This simplifies life dramatically most of the time.
> >
> > Yes, I see this attitude a lot with (some of) my co-workers.
>
> I don't consider this an attitude, but rather a fact.
Just to note, I meant "attitude" in a neutral sense, not with an implied
connotation of "youthful" or "uneducated". There are people who just
want to "move ahead" and can't be bothered to spend more than a single
thought about all this "administration" of changes. They tend to be the
productive, get-things-done sort of people (but if unchecked have a
tendency to produce an unmaintainable mess in the long run).
> Specifically,
> there are many costs to working on branches: redundant work
> accomplishing the same task, arguing which implementation is the
> redundant one, merge conflicts, etc. I think the benefits exceed the
> costs by far, but I admit there have been times when I've been
> massively frustrated by a colleague pulling the rug out from under my
> feature branch. I'm comfortable with people who have fairly extreme
> views in that direction; maybe they had even worse experiences than I
> have had!
Certainly. But I think we should distinguish here between public
(shared) branches and unpublished (local, one developer) branches. The
latter are unavoidable in practice and I would not use anything that did
not allow me to have several local branches for the different things I
am working on (in parallel or intermittently). The former can be
problematic but (at least with Darcs, as it currently is) the main
problem IME is not merge conflicts but discoverability!
Which is one reason why I want to add in-repo branches. We need tool
support in order to keep public branches in sync with each other, as far
as that makes sense; doing this manually is tedious, error-prone, and
most of all easy to forget. And with the way I imagine branches to work
in a future Darcs, the maintenance burden with public branches could be
reduced to the absolute minimum. For instance, I think we can and should
add a command (or option) that pushes all patches in the current branch
to all other local branches, except those (and only those) that
conflict. (BTW, even without branches, it would be useful to have an
option to push/pull all patches that don't conflict or depend on ones
that do.)
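A minimal sketch of the selection rule described above (push everything except patches that conflict or depend on ones that do). This is a toy model in Python, not Darcs code: the patch names, the explicit `deps` map, and the `conflicts` predicate are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set


def pushable(
    order: List[str],
    deps: Dict[str, Set[str]],
    conflicts: Callable[[str], bool],
) -> List[str]:
    """Return the patches that can be pushed without touching a conflict.

    `order` lists patch names in repo order, `deps` maps a patch to the
    patches it depends on, and `conflicts` says whether a patch conflicts
    with the target branch. All three are hypothetical inputs.
    """
    skipped: Set[str] = set()
    selected: List[str] = []
    for name in order:
        # Skip a patch if it conflicts itself or depends on a skipped patch.
        if conflicts(name) or deps.get(name, set()) & skipped:
            skipped.add(name)
        else:
            selected.append(name)
    return selected


example = pushable(
    ["a", "b", "c", "d"],
    {"c": {"b"}, "d": {"a"}},
    lambda name: name == "b",
)
```

Here "b" conflicts, "c" depends on "b", so only "a" and "d" would be pushed.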
> > Yes. Efficiency and extensibility are the two major weak points of
> > Darcs. This, apart from a large user base and the usual network
> > effects, is part of why git attracts developers and Darcs does
> > not.
>
> Well, there's also functional programming. C, for all its faults, is
> an imperative language, and people seem to find that easier to grasp.
I don't buy that. The Haskell community is large enough nowadays. It's
just that nobody is interested because 10 years ago they (almost) all
moved to git. I think the breaking point was when we lost the ghc team
and I do understand why they dropped Darcs back then. Then github
started to take off and that was more or less the death of anything that
Besides, large parts of the existing Darcs code aren't very "functional"
to start with. Some are in the IO monad, even if they wouldn't have to
be; others are programmed in an imperative style even if not in IO (e.g.
locally naming the result of a function application, rather than the
function).
> > We don't have a "kernel" with a stable API,
>
> git doesn't have a stable API either. I don't know about the current
> developers, but Linus basically promised that the *UI* would be
> backward compatible. Any scripts you've written will continue to
> work. The internal APIs are subject to change, though, and that's why
> there's never been a successful refactoring into a "libgit" plus CLI
> module.
Good point. I guess they manage by sheer number of developers, nowadays.
> > Yes, there is a difference when it comes to branches. I am not sure I
> > know what you mean with "in Mercurial [...] there's only one per repo in
> > practice".
>
> Mercurial now has "bookmarks", which are equivalent to git "branch
> refs" as far as I know.
Yes.
> So you could have multiple branches in a
> single repo. However, the projects I know well that use or did use
> Mercurial don't take advantage of that.
I know of a few who do.
> > > The difference is that in Mercurial "branch" is implemented as a
> > > data structure, namely, the repo itself.
> >
> > And I think this is what makes Mercurial easier to use in practice.
>
> I don't have a compelling argument, but I don't think that's so.
I agree with you, now that I have taken some time to understand the
"real" branches in Mercurial. There are valid reasons against using them
for exactly the arguments you have been making.
> For
> the record, I think the reputation for ease of use comes from (1) a UI
> that can be improved over time (Linus basically promised that he
> wouldn't break any scripts, so they can add to the UI, but not change
> or delete warts),
Okay, sounds convincing.
> and (2) the difficulty of importing "rebase culture"
> to a Mercurial or Bazaar project. By "rebase culture" I mean the way
> many git projects ask contributors to commit early and often, then use
> git to squash "uninteresting" commits and then rebase for a linear
> history.
I do like this way to manage a project (I do it like that in Darcs) and
I think the same workflow can be applied with Mercurial, in principle. I
agree that Mercurial doesn't encourage it.
> > > In Mercurial, if you use the named branch misfeature, then you *do*
> > > know which branches B and C are on. It's a misfeature because in
> > > Mercurial's design A and D can't be on both branches, which leads to
> > > hard to diagnose weirdness in bisection and other forensic operations
> > > when restricted to a named branch.
> >
> > I would like to know more about these problems.
>
> It's nothing irremediable, but if you restrict bisection to a named
> branch and the problem is created by the merge of that branch into
> default, it will tell you "it's all good" (since all commits in the
> named branch are older than the merge commit, which is not part of the
> named branch). This makes named branches vastly less useful. I've
> also had WTF moments when I hg log'd the branch and the merge commit I
> thought I had done didn't appear. Maybe it's just me? ;-)
I don't think so. I have asked people who explained to me that they
think this is all perfect and how it should be but their arguments seem
flawed to me.
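The bisection problem described above can be illustrated with a toy history. This is only a sketch: it uses a linear scan instead of a real bisection, and the `Commit` type and commit names are made up, not Mercurial's data model. The one assumption taken from the discussion is that the merge commit belongs to default, not to the named branch.

```python
from typing import List, NamedTuple, Optional


class Commit(NamedTuple):
    name: str
    branch: str  # the named branch recorded on the commit
    good: bool   # whether the test passes at this commit


def first_bad(history: List[Commit], branch: Optional[str] = None) -> Optional[str]:
    """Return the first failing commit, optionally restricted to one branch."""
    for commit in history:
        if branch is not None and commit.branch != branch:
            continue
        if not commit.good:
            return commit.name
    return None


history = [
    Commit("A", "default", True),
    Commit("B", "feature", True),
    Commit("C", "feature", True),
    Commit("M", "default", False),  # merge of feature into default breaks things
]
```

Searching the whole history finds "M", while restricting the search to "feature" finds nothing, which is the "it's all good" answer mentioned above.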
> > I certainly find it hard to understand.
>
> Sure, but I bet you'd find it hard to understand using the various
> extensions to Mercurial and Bazaar that enable git-style rebasing.
Absolutely. I hate that I have to use the "mq" extension just to amend a
patch, and that using it requires so many boring routine operations, all
as separate command invocations, just to get a single simple thing done.
> No, git is perfectly happy to leave as many active heads as you like.
> Most of the time you deal only with HEAD, and that implicitly.
Mercurial is also "perfectly happy" with multiple heads (again,
technically speaking). It's just that they chose not to produce them by
default when pushing; you have to say --force, because they think that is
"safer".
FWIW, I think that pulling from a repo with more than one branch should
fail if no branch is given explicitly and no default has been specified
locally. With an error message that says "Sorry, I have no idea which
branch you want to pull from" followed by a list of available branches
or a hint to which command to use to list them.
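A rough sketch of the rule proposed above, in Python. Nothing here exists in Darcs; the function name, its parameters, and the exception type are hypothetical and only meant to pin down the intended behaviour.

```python
from typing import List, Optional


class AmbiguousBranchError(Exception):
    """Raised when pull cannot decide which remote branch is meant."""


def choose_pull_branch(
    remote_branches: List[str],
    requested: Optional[str] = None,
    default: Optional[str] = None,
) -> str:
    """Pick the remote branch to pull from, or fail with a helpful message."""
    if requested is not None:
        return requested  # an explicit choice always wins
    if default is not None:
        return default    # locally configured default
    if len(remote_branches) == 1:
        return remote_branches[0]  # nothing to be confused about
    raise AmbiguousBranchError(
        "Sorry, I have no idea which branch you want to pull from. "
        "Available branches: " + ", ".join(sorted(remote_branches))
    )
```

With a single remote branch the pull just works; with several, it fails unless a branch is named or a default is set.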
Hmm. Following that thought through to the end we arrive at something
which is pretty much what git does. So we'll have only loose coupling
between local and remote branches, like in git, and also like we have
today in Darcs between repos (with _darcs/prefs/defaultrepo). Should we
associate local and remote branches by default if they have the same
name? I suggested doing so in another message, but I no longer think
this is a good idea, except as an initial setting when cloning a repo.
Because what if I create a branch named "test" and in the remote repo
someone else does the same? It would be a mistake to associate these
branches just because they happen to have the same name. I guess the git
and the Mercurial designers both thought about that and wanted to avoid
it, they just came to different conclusions.
Perhaps, what drives the complexity of the branch handling over the edge
in git is that they chose to give local names to remote branches. This
is something I am strictly against, the more so since nowadays powerful
command line completion is available even for poor bash users.
> > How does that make a difference in practice? Do you mean that they
> > do not allow this because of conflicting branch names? Why not make
> > this conflict resolvable like any other conflict?
>
> No, Bazaar and original Mercurial make it annoying to work with
> multiple heads. In Bazaar "pull" means "mirror", while all of
> "fetch", "pull", and "merge" are implemented with the single command
> "merge". Mercurial complains incessantly if you don't attach a named
> branch or bookmark to a loose head.
Git complains, too, IIRC, and as usual with its own rather cryptic
language (the "you are in detached HEAD..." sermon).
I agree that Mercurial can be a pain with its insistence on a particular
workflow that discourages history editing.
> In both VCSes, loose heads
> normally only arise in the case of a conflict, which you resolve by
> picking a version and committing it (this is the merge commit).
I am pretty sure that in Mercurial you get a loose head as soon as you
pull changes that branch off in your (local) past, and similarly if you
'hg push --force' from a clone that is not a descendant of the remote
tip. This has nothing to do with conflicts, which you get when you
merge, which is an explicit operation.
> My point is that in git, there is no conflict until you explicitly ask
> for a merge of the branches. They can coexist indefinitely.
This is the same in Mercurial, technically speaking, even though it
might permanently nag you about it ;)
> > It's just 'darcs amend' nowadays. And no, Darcs does not suffer
> > from any problem in this regard that git doesn't also have.
> [...]
> > Removing patches is exactly analogous to rebase and then delete the
> > ref to the old commit in git.
>
> You don't need to delete: rebase by definition *moves* the ref,
> leaving no ref to the old commit. That's "the rebase problem" in a
> nutshell.
That could be easily corrected by adding a new ref to the old commit in
some extra place reserved for that. Perhaps this new "reflog" thing is
just that. BTW, I always read this as "re-flog", as if git hasn't caused
me enough pain already...
> Two questions:
>
> - What happens to the old patch when you do "darcs amend"?
It continues to swim in the large pool of patches (more precisely: patch
representations) consisting of your repo, related repos, and the cache.
But there is no longer any reference to it in your current repo, so
restoring the old version is usually impossible unless you have another
repo that still references it. So this is similar to git rebase.
Here is a recent war story from my work. I had removed (as in rm -rf,
though it was done by a tool) some repositories I thought were no longer
needed. That was a mistake: there was one patch I had not pushed to our
central repo store. I tried to restore the repo from our backups, but it
turned out that due to some misconfiguration this machine hadn't been
backed up during the last two months (you always find out about these
problems exactly when you need the backup).
I was a bit depressed because I thought I had to redo all that work.
Then I remembered the Darcs cache. I hacked together a long line of
shell script involving find, zcat, and grep and indeed found the missing
patch after running it for several minutes (the cache had grown to a
sizeable 7 GiB over the years). The next part was to copy it to the
_darcs/patches of another clone of the repo, then open
_darcs/hashed_inventory in an editor and append the meta data and the
hash (the file name of the patch) and then run 'darcs repair' and cross
fingers.
This actually worked because this particular representation of the patch
applied cleanly to the top of the repo. It can fail (or worse, produce
wrong results) because the patches don't know themselves about the
context in which they should apply; this knowledge is fully contained in
the inventories, see below. If the patch had been in a file where I had
made more recent changes, then I would have to first find the right
context (obliterate offending patches, try again, etc).
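The search step of this story (find, zcat, grep over the cache) can be sketched in Python as follows. This is not the script that was actually used; the function name is made up, and the cache directory and the search string are parameters because their real values are not given here. The only assumption carried over is that the cached patch files are gzip-compressed.

```python
import gzip
import os
import zlib
from typing import Iterator


def find_patches(cache_dir: str, needle: bytes) -> Iterator[str]:
    """Yield paths of gzip-compressed files under cache_dir containing needle."""
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with gzip.open(path, "rb") as handle:
                    data = handle.read()  # whole file in memory; fine for a sketch
            except (OSError, EOFError, zlib.error):
                continue  # not a gzip file, unreadable, or truncated
            if needle in data:
                yield path
```

Something like `list(find_patches(cache_dir, b"some text from the patch"))` would then give the candidate files to copy into _darcs/patches.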
> - How does the inventory differ from a tag?
The notions are related but it is important to not conflate them.
Inventories are an internal concept in Darcs. There is no command
referring to inventories explicitly (except a few seldom used
maintenance commands such as optimize or convert). They are
simultaneously used to
(1) reference patches, and
(2) determine their order.
A simplified view is that in each repo you have one very long inventory
with references to all patches in the order in which they should be
applied. In practice there are some optimizations, see below.
Tags are a user visible concept. A tag is a patch which makes no change
but /explicitly/ depends on all patches that were present when the tag
was created. Darcs patches can have these extra dependencies, in
addition to those implied by the commutation rules, and you can add them
to normal patches, too, with the --ask-deps option. (As an aside, this
means you could emulate version-based VCSes by always explicitly
depending on all existing patches when you record. This proves that
Darcs is strictly more powerful than git or Mercurial, since you can't
emulate Darcs with them.)
The connection between tags and inventories is this: when a tag is
created it is "clean"; this means that it depends on all patches before
it (in the current repo order). In this situation Darcs automatically
saves the current head inventory (with the clean tag on top) and starts
a fresh inventory with a "parent" reference to the old one. This is an
important optimization. For instance, the most commonly used commands
only need access to the head inventory, which is normally much smaller
than the whole sequence of patches.
(The head inventory also contains a pointer to the current "pristine
tree", which is an internal representation of the working tree without
any unrecorded changes. And along with each patch reference it also
contains the patch meta data, which means that for selection and viewing
we don't normally have to read the patch files. In principle we could
read all patch data lazily, but for some obscure reasons we don't seem
to do so consistently.)
So tags and inventories are related but not necessarily so. Each
inventory except the current one has a clean tag on top, but (as far as
I know) not every clean tag must be at the head of an inventory. It is
also possible to have "unclean" tags due to commutation, e.g. if you
pull a tag on top of unrelated changes, then the tag will become unclean
(unless you pass --reorder-patches).
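The relationship between clean tags and inventories described above can be modelled roughly like this. It is a toy model that follows the description in this message, not Darcs' actual types or on-disk format; the class and method names are invented, and patches are plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Inventory:
    parent: Optional["Inventory"] = None            # older, closed inventory
    patches: List[str] = field(default_factory=list)  # references, in order


@dataclass
class Repo:
    head: Inventory = field(default_factory=Inventory)

    def record(self, name: str) -> None:
        """Append a patch reference to the head inventory."""
        self.head.patches.append(name)

    def tag(self, name: str) -> None:
        """Record a clean tag, then start a fresh head inventory."""
        self.head.patches.append("TAG " + name)
        self.head = Inventory(parent=self.head)

    def all_patches(self) -> List[str]:
        """Walk the parent chain to recover the full patch sequence."""
        chain = []
        inventory: Optional[Inventory] = self.head
        while inventory is not None:
            chain.append(inventory.patches)
            inventory = inventory.parent
        return [p for patches in reversed(chain) for p in patches]


repo = Repo()
repo.record("a")
repo.record("b")
repo.tag("1.0")
repo.record("c")
```

After the tag, the head inventory only holds "c", which is why commands that need just the head inventory stay fast; the full sequence is still available by following the parent references.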
> > It would not be hard to make Darcs follow (modern) git more closely
> > here, so that by default we keep a reference to patches that aren't
> > referred to any longer by the "mainline" (the "current"
> > branch).
>
> Great minds think alike!
>
> I do have a use case for this that shouldn't offend even the most
> ardent fan of repo-per-branch. The most work I ever did with Darcs
> was like 10 years ago. Darcs has always had great facilities for
> patch editing. XEmacs had this one contributor who was in the habit
> of going dark and resurfacing with a megapatch (literally: 35,000
> lines and 2MB or so). So I imported the current version into Darcs
> and then applied the patch. Then I started editing into reviewable
> feature patches. (Yes, I did this because the reviewers refused the
> patch and I desperately wanted about 5,000 lines of it. ;-)
I love these war stories... ;-)
> I made a lot of mistakes, though, and it would have been useful to be
> able to "rewind" to older versions of the patches, and break them
> apart differently.
With Darcs I usually work around this by making a local clone before
doing something I might later regret:
  darcs clone --lazy . ../saved-before-dangerous-operation-xyz
This operation takes about half a second with the Darcs repo (on my
aging laptop, the repo has about 12000 patches). But it is easy to mix
up these one-off clones, they tend to clutter up your directory tree,
and it can be unclear when they can or should be removed.
Having the references automatically saved inside the same repo (in the
form of alternative head inventories a.k.a. branches) would be a big
improvement. Again, a matter of safety, discoverability, and
convenience, less one of ability in principle.
Cheers
Ben
--
"I tend to avoid fiction about dysfunctional urban middle-class people
written in the present tense." -- Ursula K. Le Guin