[darcs-users] Re: Whitespace in filenames

David Roundy droundy at abridgegame.org
Sun Aug 3 19:10:43 UTC 2003


On Sun, Aug 03, 2003 at 11:40:08AM -0700, Zack Brown wrote:
> On Sat, Aug 02, 2003 at 06:12:59AM -0400, David Roundy wrote:
> > Well, there's a problem in that I don't know that it's possible or
> > practical to check against the conventions of the filesystem you're pulling
> > to.  Among other things, the repository might span more than one
> > filesystem, each of which has different filename restrictions.  And on the
> > principle of letting users do whatever they want, that should be ok.  So I
> > don't see myself trying to figure out what the filename restrictions of a
> > given filesystem are to check on pull.  Patches to do this would be
> > welcome, but it doesn't interest me, and I don't see how it can guarantee
> > that the patch will apply properly (that is, that it will write to the
> > desired file).
> 
> How about this: try to create the file, then check to see if it was
> properly created. If not, you know there's an incompatibility, and you
> can punt to the user. That way you don't need to know exactly which
> filesystem you're dealing with, but you still catch all violations.

The problem is that you can only do that by actually modifying the
directory, which is well after I'd like to do my checking.  I like being
able to do my check phase before touching the repo, so if there is a
problem there is no chance of the repo being corrupted.  If you create
files in the repo while still checking, then if darcs crashes (the power
dies, or whatever) before you get a chance to delete the files, you've got
a corrupt repo.  It would be nice to keep the window of potential
corruption as small as possible.  Also, you'd have to create all the
potential test files before deleting any of them, since two files in the
same patch may conflict with one another.
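
As a rough illustration of a check that runs before the repo is touched, here is a minimal Python sketch (not darcs code). It looks only at the names a patch wants to create and reports those that would collide on a case-insensitive filesystem. The function name find_case_collisions and the choice of case folding as the only rule are assumptions made for this example; a real filesystem may impose other restrictions that a name-only check like this cannot see.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths that differ only by case.

    Pure function: it inspects the names only and never touches the disk,
    so it can run during a check phase before anything is written.
    """
    groups = defaultdict(list)
    for path in paths:
        # casefold() is an assumed stand-in for "how a case-insensitive
        # filesystem compares names"; real filesystems differ in detail.
        groups[path.casefold()].append(path)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Names a hypothetical patch wants to add, checked all at once because
# two files in the same patch may conflict with one another.
patch_files = ["diff.lhs", "Diff.lhs", "Makefile"]
collisions = find_case_collisions(patch_files)
```

Because all names in the patch are compared together, a conflict between two new files is caught without creating either of them.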

> > The other problem is that for a repository to be portable, no file can
> > *ever* have had a filename portability problem.  Normally when a project
> > has a filename policy, I would think that they would enforce it by renaming
> > any files that violate it, but this wouldn't be enough to achieve
> > portability.  The patches would need to be unrecorded, which, for example,
> > the darcs-patcher won't do--mostly to avoid race conditions with people
> > pulling while you unrecord, but also because I think unrecording publicly
> > available patches is bad policy.  Anyhow, the point is recording a
> > "non-portable" patch is a very serious issue--if it goes unnoticed, and
> > later the repo is intended to be used on another platform, there are only
> > two options: either start a new repository or manually excise all mention of
> > the guilty file from the repository history.  If it's a directory, it's
> > even worse.  I had to do this myself with darcs, when I had two files named
> > diff.lhs and Diff.lhs (you can still see, like fossils, mention of diff.lhs
> > in Makefiles of old patches).
> 
> OK, let's say that files that existed once-upon-a-time in the repository
> are called old files. Do old files still have an impact on the actual
> files in a current repository? i.e. will an old file cause problems
> because of files that actually exist in a current repository, or because
> of problems that would only exist if someone tried to recreate the
> earlier version of the repository that contained the old file?

It would mean that darcs get couldn't retrieve that repository on any
platform that has the problem with an old filename.  This could be worked
around by writing a version of get that doesn't require that the repository
be consistent.

One way to do this would be to implement a sort of checkpointing-like
scheme in which we would store (optionally and perhaps only occasionally)
"snapshots" of the repository at tags.  This would allow doing a darcs get
without downloading the entire repository history, which would be nice,
since I don't care for the idea that get is an O(n) process where n is the
age of the repository.  Actually, I think that it is likely that eventually
I'll get around to implementing this idea, which could be used to
effectively throw away old repository history in a nice controlled manner.
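
The snapshot idea can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not how darcs stores or retrieves anything: the history is a plain list of patch names (oldest first), snapshot_at is a hypothetical set of positions at which a snapshot of the repository state exists, and plan_get is an invented helper that decides where a get would start.

```python
def plan_get(history, snapshot_at):
    """Decide where a 'get' starts and which patches it must still apply.

    history:     patch names, oldest first (toy model).
    snapshot_at: set of indices i for which a snapshot of the state
                 after applying history[:i] is assumed to be stored.
    Returns (start_index, patches_to_apply).
    """
    # Start from the most recent snapshot, or from the empty repository
    # (index 0) when no snapshot exists.
    start = max(snapshot_at, default=0)
    return start, history[start:]


history = [
    "add diff.lhs",
    "rename diff.lhs DiffOld.lhs",
    "tag 0.9",
    "fix bug",
]

# With a snapshot stored at the tag, only one patch is left to apply,
# and the patches mentioning the old filename are never replayed.
start, todo = plan_get(history, {3})
```

In this model the work done by get depends on the number of patches since the last snapshot rather than on the full age of the repository.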

I had originally been thinking that I'd want to implement this by storing a
snapshot tarball, but after my recent testing with large repos I think I
can store the snapshot as one big patch, which is much nicer, since it
means that it can store any data that darcs is made to support (and no
more--which is as it should be).  I've been looking into using zlib to
compress patches, which would make the storing and transferring of large
snapshot patches somewhat less painful...
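
For a feel of what zlib compression does to a large, repetitive patch, here is a small Python sketch using the standard zlib module. The patch text below is a made-up placeholder, not the actual darcs patch format, and the compression level is an arbitrary choice for the example.

```python
import zlib

# Placeholder "snapshot patch": a large, repetitive block of text.
# The content is invented for illustration only.
patch_text = (
    "addfile ./src/Example.lhs\n"
    + "hunk ./src/Example.lhs 1\n+some line of source text\n" * 200
).encode("utf-8")

# Compress at the highest level, then restore to confirm the round trip.
compressed = zlib.compress(patch_text, 9)
restored = zlib.decompress(compressed)

ratio = len(compressed) / len(patch_text)
```

The round trip is lossless, so the stored or transferred form can be the compressed bytes while the patch itself is unchanged once decompressed.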

But for the moment, the painful situation is that a repository with an
invalid filename anywhere in its history cannot effectively be used.  You
could hack around this, for example by copying the repository manually,
but then you wouldn't be able to use check to see if you had done so
correctly (since you'd still have a corrupt repo).
-- 
David Roundy
http://www.abridgegame.org



