[darcs-users] Re: Whitespace in filenames

John Meacham john at repetae.net
Fri Aug 8 21:42:33 UTC 2003


On Fri, Aug 08, 2003 at 01:45:36PM -0400, David Roundy wrote:
> This actually sounds similar to what darcs actually does most of the time.
> The catch is that it means that you have to hold the modified lines all in
> memory while doing this.  For a few patches this is all right.  But if all
> the files in the repo are modified, it means holding the entire repo in
> memory, which is not usually a good plan.  For this reason, when you do a
> darcs get, it applies each patch to disk in sequence.  If you did a pull,
> it would get all the patches and compose them in memory (as you describe
> pretty much, except using lists of lines rather than arrays, so it's a bit
> slower), and then write to disk.
I was thinking the line data themselves would just be written to a
temporary file as they occur, and in memory you would keep only an array
of file offsets into that file. Then in your creation phase, you just do
random access on the temporary file, copying lines to their final
destination. If you wanted to be tricky you could even use sendfile(2)
when it is available and really speed things up. This would also reduce
memory requirements a lot, since the line data need never be represented
in Haskell (although for a perfect implementation, line lengths would
need to be available in the patches). It could even be faster than
storing the lines as Haskell strings, as Haskell memory
allocation/deallocation can eat up a lot of resources.
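
A minimal sketch of the spool-file idea, in Python rather than Haskell
just to keep it short. The names (LineStore, add, copy_to) are made up
for illustration and are not anything in darcs; it uses plain
seek/read/write rather than sendfile(2), and it records each line's
length alongside its offset, which is an assumption about what the
patches could supply.

```python
import io


class LineStore:
    """Append-only spool of line data.

    Only (offset, length) pairs are kept in memory; the bytes of each
    line live in the spool file. Hypothetical helper, not darcs code.
    """

    def __init__(self, spool):
        self.spool = spool   # any seekable binary file object
        self.index = []      # index[i] = (offset, length) of stored line i

    def add(self, line: bytes) -> int:
        """Write one line to the spool and return its line id."""
        self.spool.seek(0, io.SEEK_END)
        offset = self.spool.tell()
        self.spool.write(line)
        self.index.append((offset, len(line)))
        return len(self.index) - 1

    def copy_to(self, line_ids, out):
        """Creation phase: copy the given lines, in order, to `out`."""
        for i in line_ids:
            offset, length = self.index[i]
            self.spool.seek(offset)
            out.write(self.spool.read(length))
```

In real use the spool would be something like tempfile.TemporaryFile()
and `out` the destination file; any seekable binary file object works.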

Now that I think about it, a tree annotated with the number of lines
(not tree nodes) under each node would be much more efficient than lists
for the file indirection lists; there is probably something suitable in
Okasaki's book. The data stored in the tree would be (index, length)
pairs, representing indexes into the array of file offsets and the
number of subsequent lines. The nice thing about this structure (other
than the logarithmic lookup) is that it grows with the number of hunks
processed, rather than the number of lines.
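
A rough sketch of such a tree, again in Python with made-up names (Run,
Branch, split, concat, lookup, apply_hunk). Leaves hold (start, count)
runs into the offset array and branches cache the number of lines
beneath them. It does no rebalancing, so the logarithmic bound only
holds if a balanced variant is substituted; treat it as an illustration
of the shape of the structure, not a finished design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    start: int   # index of the run's first line in the offset array
    count: int   # number of consecutive lines in the run


@dataclass(frozen=True)
class Branch:
    left: object
    right: object
    count: int   # total lines (not tree nodes) beneath this node


def size(t):
    """Number of lines under t; None is the empty tree."""
    return 0 if t is None else t.count


def concat(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return Branch(a, b, a.count + b.count)


def split(t, n):
    """Return (tree of the first n lines, tree of the rest)."""
    if n <= 0:
        return None, t
    if n >= size(t):
        return t, None
    if isinstance(t, Run):
        return Run(t.start, n), Run(t.start + n, t.count - n)
    if n < size(t.left):
        l, r = split(t.left, n)
        return l, concat(r, t.right)
    l, r = split(t.right, n - size(t.left))
    return concat(t.left, l), r


def lookup(t, i):
    """Offset-array index of line i (0-based) of the current file."""
    if not 0 <= i < size(t):
        raise IndexError(i)
    while isinstance(t, Branch):
        if i < size(t.left):
            t = t.left
        else:
            i -= size(t.left)
            t = t.right
    return t.start + i


def apply_hunk(t, pos, removed, new_start, added):
    """Replace `removed` lines at `pos` with `added` lines whose
    offset-array indexes begin at `new_start`."""
    before, rest = split(t, pos)
    _, after = split(rest, removed)
    new = Run(new_start, added) if added else None
    return concat(concat(before, new), after)
```

Each hunk adds only a constant number of runs however many lines it
touches, which is why the tree grows with the number of hunks rather
than the number of lines.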

> So yes, I could write a variant of get (chosen by a command-line flag) that
> wouldn't write intermediate states to disk, so that people could deal with
> repos that have states in the past that can't be expressed on their
> filesystem.  Or such people could do
> 
> darcs inittree && darcs pull -a badrepo
> 
> which would have the same effect.
> 
> There would still be a problem when running darcs check, but that's as it
> should be, since you do have a corrupt repo (for at least one definition of
> corruption).

I don't understand why there need be multiple mechanisms at all. For
darcs check, just do the above algorithm, then instead of writing the
lines in order to their output files, verify that the files currently in
the repository have the same lines by reading along instead of writing
out.
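
The "read along instead of write out" step could look roughly like this
(Python; verify_lines is a hypothetical name, and the expected lines are
assumed to arrive in final order as byte strings):

```python
def verify_lines(expected_lines, actual):
    """Return True if the binary file object `actual` consists of
    exactly the byte strings in `expected_lines`, in order.

    Mirrors the writing pass: where that pass would write each line,
    this one reads the same number of bytes and compares.
    """
    for expected in expected_lines:
        if actual.read(len(expected)) != expected:
            return False
    # Anything left over means the file has extra content.
    return actual.read(1) == b""
```

It stops at the first mismatch and never holds more than one line of
the file in memory.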

-- 
---------------------------------------------------------------------------
John Meacham - California Institute of Technology, Alum. - john at foo.net
---------------------------------------------------------------------------



