[darcs-users] repository state identifier(s) in "darcs show repo"

Guillaume Hoffmann guillaumh at gmail.com
Mon Dec 14 16:48:31 UTC 2015


Hi everyone,

As David Leuschner said in a previous mail, one of the shortcomings
of Darcs is:

"it's not as easy to refer to a specific state of the repository using a hash".

As a developer I know that Darcs internally uses the pristine hash,
which is a hash of the recorded working copy. However, two repositories
can have the same pristine hash and different histories (e.g., one being
a superset of the other with patches and their corresponding
rollbacks, or one having tags that the other lacks). Still, it can be
good enough for some purposes (lazy cloning).

Should "darcs show repo" show that hash?

Now, most importantly, we need a hash that would identify a set of
patches independently of reordering, since that is what Darcs considers
the history of a repository. Doing it right, e.g., building and hashing
the dependency graph of all patches, is costly. Moreover, we do not
have any infrastructure to retrieve a set of patches from such a hash.
(That's the scenario in http://darcs.net/Ideas/ShortSecureId ).

So can we just have one that would enable us to quickly check that two
repos have the same patches, ignoring reordering?

I propose a simple checksum: XOR all patch metadata hashes!
The probability of collision should be low enough, since patch metadata
hashes are good hashes. Calculating the XOR costs as much as reading the
inventories of the current repo (which can be lazy), plus the overhead
of generating and XOR'ing the hashes.
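
A minimal sketch of the idea in Python, not Darcs code. It assumes each
patch is represented by a fixed-width hexadecimal hash; metadata_hash
below is a hypothetical stand-in (SHA-1 of a made-up metadata string)
and not the way Darcs actually derives its patch metadata hashes.

```python
import hashlib
from functools import reduce


def metadata_hash(metadata):
    # Hypothetical stand-in for a patch metadata hash: SHA-1 of a string.
    return hashlib.sha1(metadata.encode("utf-8")).hexdigest()


def xor_checksum(patch_hashes, width=40):
    # XOR is commutative and associative, so patch order does not matter.
    acc = reduce(lambda a, b: a ^ b, (int(h, 16) for h in patch_hashes), 0)
    # Zero-pad so the result has the same width as the input hashes.
    return format(acc, "0{}x".format(width))


# Made-up patch metadata, for illustration only.
repo_a = [metadata_hash(m) for m in ("add README", "fix typo", "tag 1.0")]
repo_b = list(reversed(repo_a))  # same patches, different order
repo_c = repo_a[:2]              # one patch missing

same_patches = xor_checksum(repo_a) == xor_checksum(repo_b)
missing_patch = xor_checksum(repo_a) == xor_checksum(repo_c)
```

Here same_patches compares two orderings of the same patch set, and
missing_patch compares a repo against one that lacks a patch.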

It seems Darcs itself would not need to store this XOR. On the other
hand, third-party tools could find many uses for it.
Darcsden could show it for comparison purposes. A development team
could maintain an XOR-to-patchset map to identify repository states
encountered by its members. Let the tools emerge later!
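
As a rough illustration of such a map, here is a Python sketch of what
a third-party tool might do. The function names and the choice of
storing patch hashes as 40-character hex strings are assumptions made
for the example, not an existing Darcs or Darcsden interface.

```python
from functools import reduce


def xor_checksum(patch_hashes, width=40):
    # Order-independent checksum: XOR of all patch hashes (hex strings).
    acc = reduce(lambda a, b: a ^ b, (int(h, 16) for h in patch_hashes), 0)
    return format(acc, "0{}x".format(width))


def register_state(state_map, patch_hashes):
    # Record a repository state under its XOR checksum and return the key.
    key = xor_checksum(patch_hashes)
    state_map[key] = frozenset(patch_hashes)
    return key


def lookup_state(state_map, key):
    # Return the known patch set for a checksum, or None if never seen.
    return state_map.get(key)


# Made-up 40-digit hashes, for illustration only.
p1, p2, p3 = "a" * 40, "0123456789" * 4, "f" * 40

states = {}
key = register_state(states, [p1, p2, p3])

# A teammate with the same patches in another order gets the same key.
teammate_key = xor_checksum([p3, p1, p2])
known = lookup_state(states, teammate_key)
```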

So, should "darcs show repo" show that XOR?

Opinions?

Guillaume

