[darcs-users] Re: Whitespace in filenames

David Roundy droundy at jdj5.mit.edu
Mon Aug 4 10:44:49 UTC 2003


On Sun, Aug 03, 2003 at 02:18:52PM -0700, Zack Brown wrote:
> I can see why you'd want to do the checking before actually modifying the
> filesystem, and I agree about the corruption dangers. But just to follow
> the train of thought a little farther:
> 
> Would you be able to solve the above corruption by keeping a metafile
> containing lists of files that still need to be deleted? Then, assuming
> the check phase occurs after writing, darcs would be able to recognize
> and recover from any system crash.

Something like that would be possible, except that it would need to include
not only files to be deleted, but also files to be renamed, since we'd need
to perform renames while doing the checks.  This might not be so hard to
implement, as we already have a format for lists of changes to be made to a
filesystem.  We could simply write it in patch format, and then use the
apply_patch method to undo it...
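A minimal sketch of that idea follows. It assumes a toy in-memory model of the working directory (a dict from path to contents) and a JSON journal rather than darcs's real patch format. `Rename`, `apply_rename`, `write_journal` and `recover` are hypothetical names. The pending renames are recorded before any of them is performed, so after a crash the ones that did happen can be undone, newest first:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Rename:
    """One journaled rename (hypothetical; not darcs's actual patch format)."""
    src: str
    dst: str

    def inverse(self) -> "Rename":
        return Rename(self.dst, self.src)


def apply_rename(fs: dict, op: Rename) -> None:
    """Apply a rename to an in-memory model of the working directory."""
    if op.dst in fs:
        raise FileExistsError(op.dst)
    fs[op.dst] = fs.pop(op.src)


def write_journal(ops: list) -> str:
    """Serialize the pending renames *before* any of them is performed."""
    return json.dumps([[op.src, op.dst] for op in ops])


def recover(fs: dict, journal: str) -> None:
    """After a crash, undo whichever journaled renames were applied, newest first."""
    ops = [Rename(src, dst) for src, dst in json.loads(journal)]
    for op in reversed(ops):
        # Treat a rename as applied if its target exists and its source does not.
        if op.dst in fs and op.src not in fs:
            apply_rename(fs, op.inverse())
```

The "target exists, source missing" test lets recovery cope with a crash at any point in a simple chain such as a to b to c. It would be ambiguous for swaps, so a real implementation would probably also journal how far it got. Files still to be deleted could be listed in the same journal and simply deleted again on recovery, since deletion can safely be repeated.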

For the moment, I figure a good enough solution is just trusting that
people won't do something stupid with their filenames, and trying to make
it hard for them to accidentally do so.  Catching all errors properly would
definitely be nice, but would also be a lot of work.

> The potential justification for this is that it's an easy way to deal with
> many different filesystems. The alternative is to have darcs understand
> the limitations of all filesystems. That seems like a big can o' worms,
> so if there's a way to just identify when a violation has occurred, while
> protecting the repo from corruption, it might be a good thing, at least until
> some insane hacker dude decides to add darcs support for tons of filesystems.

Well, you've left out another alternative, which is just not to worry
overly much about it (and hope for the best); that is what darcs already
does.  But I agree that optimistic programming isn't generally a good
thing, and your idea as to how to check for filename problems sounds like
the way to go.
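One way to detect a violation rather than predict it is to create the file and then check whether the directory listing returns exactly the same name. The sketch below assumes a scratch directory can be created on the same filesystem as the repository. `name_round_trips` and its `where` parameter are hypothetical, not part of darcs:

```python
import os
import tempfile
from typing import Optional


def name_round_trips(name: str, where: Optional[str] = None) -> bool:
    """Hypothetical check: create `name` in a fresh scratch directory (inside
    `where`, ideally on the same filesystem as the repository) and report
    whether the filesystem lists it back exactly as given."""
    if not name or os.sep in name or (os.altsep and os.altsep in name):
        return False  # not a single path component
    with tempfile.TemporaryDirectory(dir=where) as scratch:
        try:
            # Any outright rejection (bad character, NUL byte, ...) is a failure.
            with open(os.path.join(scratch, name), "w"):
                pass
        except (OSError, ValueError):
            return False
        # Catches silent changes such as truncation, case folding or normalization.
        return os.listdir(scratch) == [name]
```

The check only trusts what the filesystem reports back. That way darcs would not need to encode each filesystem's rules itself.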

> > > OK, let's say that files that existed once-upon-a-time in the
> > > repository are called old files. Do old files still have an impact on
> > > the actual files in a current repository? i.e. will an old file cause
> > > problems because of files that actually exist in a current
> > > repository, or because of problems that would only exist if someone
> > > tried to recreate the earlier version of the repository that
> > > contained the old file?
> > 
> > It would mean that darcs get couldn't retrieve that repository on any
> > platform that has a problem with an old filename.  This could be
> > worked around by writing a version of get that doesn't require that the
> > repository be consistent.
> 
> But aside from that workaround, *why* would the problem occur? I'm
> confused. If the old file only existed as part of the history of the
> project's development, but not as any file actually on disk, would it
> still be impossible to retrieve the repository on that platform? And if
> so, why?

Because darcs only reads patches (and the inventory, which is a list of
patches) remotely, never files that are in the repo.  So a get involves
fetching all the patches and then applying them all to an empty
repository.  To read the content files would require that I know how to
access them, which would involve understanding the escaping of spaces and
weird characters in URLs, plus that would be a pain because I'd have to
know if I'm looking at a URL or a file when creating repo strings to be
fetched.  So I only fetch patches (which I'm sure you've noticed have
pre-escaped filenames, for precisely this reason).
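To make that reasoning concrete, here is a toy model of get. It assumes each patch is a tuple such as `("addfile", name)`, `("rmfile", name)` or `("move", old, new)`, labels chosen for illustration. It also assumes a hypothetical `allowed` predicate standing in for whatever names the local platform accepts. Because every patch is replayed onto an empty tree, a file that only ever existed in history still has to be created along the way:

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical, simplified patch representation (real darcs patches carry more).
Patch = Tuple[str, ...]


def replay(patches: Iterable[Patch], allowed: Callable[[str], bool]) -> Set[str]:
    """Rebuild a working tree by applying every patch, in order, to an empty tree.

    Raises OSError as soon as a patch needs a filename the platform rejects,
    even if a later patch would have renamed or removed that file.
    """
    tree: Set[str] = set()
    for kind, *names in patches:
        if kind == "rmfile":
            tree.remove(names[0])
            continue
        if kind == "addfile":
            new = names[0]
        elif kind == "move":
            old, new = names
            tree.remove(old)
        else:
            raise ValueError(f"unknown patch kind: {kind!r}")
        if not allowed(new):
            raise OSError(f"cannot create {new!r} on this platform")
        tree.add(new)
    return tree
```

For example, a history that adds `notes:draft` and later moves it to `notes-draft` ends with a portable tree. Replaying it with `allowed=lambda name: ":" not in name` still fails at the very first patch, because colons are not permitted in Windows filenames.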
-- 
David Roundy
http://www.abridgegame.org



