[darcs-users] Re: Whitespace in filenames
Zack Brown
zbrown at tumblerings.org
Mon Aug 4 14:41:22 UTC 2003
On Mon, Aug 04, 2003 at 06:44:49AM -0400, David Roundy wrote:
> On Sun, Aug 03, 2003 at 02:18:52PM -0700, Zack Brown wrote:
>
> For the moment, I figure a good enough solution is just trusting that
> people won't do something stupid with their filenames, and trying to make
> it hard for them to accidentally do so. Catching all errors properly would
> definitely be nice, but would also be a lot of work.
>
> > The potential justification for this is that it's an easy way to deal with
> > many different filesystems. The alternative is to have darcs understand
> > the limitations of all filesystems. That seems like a big can o' worms,
> > so if there's a way to just identify when a violation has occurred, while
> > protecting the repo from corruption, it might be a good thing, at least until
> > some insane hacker dude decides to add darcs support for tons of filesystems.
>
> Well, you've left out another alternative, which is just to not worry
> overly much about it (and hope for the best), which is what darcs does
> already. But I agree that optimistic programming isn't generally a good
> thing, and your idea as to how to check for filename problems sounds like
> the way to go.
OK, cool, we're just laying the groundwork for future masochists. ;-)
> > If the old file only existed as part of the history of the
> > project's development, but not as any file actually on disk, would it
> > still be impossible to retrieve the repository on that platform? And if
> > so, why?
>
> Because darcs only reads patches (and the inventory, which is a list of
> patches) remotely, never files that are in the repo. So a get involves
> fetching all the patches and then applying them all to an empty
> repository.
Ah ha! That's what I missed. I knew darcs behaved that way, but I just
didn't put 2 and 2 together.
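Here is a toy Python model of David's point. The patch types ("addfile", "setcontent", "rmfile", "move") and the `name_ok` check are made up for illustration and are not darcs' real patch format. The model replays a history from an empty tree, so a file that only ever existed in the past still has to be created on the local filesystem at some point during the replay.

```python
# Toy model of "darcs get": only patches are fetched, then replayed in
# order against an empty tree. The patch format here is hypothetical.
from typing import Callable, Dict, List, Tuple

Patch = Tuple[str, ...]  # e.g. ("addfile", "a b.txt") or ("move", old, new)


def _create(tree: Dict[str, str], path: str, content: str,
            name_ok: Callable[[str], bool]) -> None:
    # Every name that appears anywhere in history must be creatable
    # locally, even if a later patch removes or renames the file.
    if not name_ok(path):
        raise OSError(f"cannot create {path!r} on this filesystem")
    tree[path] = content


def apply_patch(tree: Dict[str, str], patch: Patch,
                name_ok: Callable[[str], bool]) -> None:
    kind = patch[0]
    if kind == "addfile":
        _create(tree, patch[1], "", name_ok)
    elif kind == "setcontent":
        tree[patch[1]] = patch[2]
    elif kind == "rmfile":
        del tree[patch[1]]
    elif kind == "move":
        _create(tree, patch[2], tree.pop(patch[1]), name_ok)
    else:
        raise ValueError(f"unknown patch type {kind!r}")


def get(patches: List[Patch], name_ok: Callable[[str], bool]) -> Dict[str, str]:
    tree: Dict[str, str] = {}  # start from an empty repository
    for patch in patches:
        apply_patch(tree, patch, name_ok)
    return tree


history: List[Patch] = [
    ("addfile", "old name.txt"),
    ("setcontent", "old name.txt", "hi"),
    ("move", "old name.txt", "new_name.txt"),
]
print(get(history, lambda name: True))  # {'new_name.txt': 'hi'}
```

With a `name_ok` that rejects spaces, `get(history, ...)` raises `OSError` on the very first patch, even though the final tree contains no spaces at all.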
> To read the content files would require that I know how to
> access them, which would involve understanding the escaping of spaces and
> weird characters in URLs, plus that would be a pain because I'd have to
> know if I'm looking at a URL or a file when creating repo strings to be
> fetched. So I only fetch patches (which I'm sure you've noticed have
> pre-escaped filenames, for precisely this reason).
I'll just take a shot in the dark (ouch ;-) and suggest this:
Each file could have its "real" name, which darcs would be aware of in the
same way it is now. But the filename created on disk would be derived by
applying rules to the real name. These rules would be based on the filesystem
involved. Adding support for a new filesystem would simply involve adding
another rule-set to convert real filenames into filesystem-specific filenames
(and back again for 'record' and 'push'). Users would specify their filesystem
as a command-line argument where necessary.
I think this would allow anyone using any filesystem to participate in any
project regardless of filenames. Someone on a FAT system would be able to
pull a repository and work on it, even if the repository had tons of long
filenames and spaces. When they did a 'record' or a 'push', darcs would do a
reverse translation back to the original filename. And if they did an 'add'
or a 'mv', they could simply specify a target "real" filename (i.e. with no
naming restrictions), and darcs would do its translation to produce a file on
their disk that they could actually use. Other people on 'better' filesystems
would 'pull' that person's changes and see a file with the "real" filename
(or with darcs' translation of it, according to their own filesystem's
restrictions). Meanwhile,
darcs' rule-sets would be sophisticated enough to avoid collisions wherever
possible (see below).
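Here is a rough sketch of the rule-set idea. It uses a hypothetical `RuleSet` class, which is not part of darcs. It replaces forbidden characters, truncates to a length limit, and adds a `~N` suffix on collisions. It also keeps a two-way table, so 'record' and 'push' could recover the exact "real" name even though the translation itself is lossy.

```python
# Hypothetical per-filesystem rule set; not an existing darcs feature.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RuleSet:
    forbidden: str            # characters this filesystem rejects
    max_len: int              # longest name it accepts
    case_insensitive: bool = False
    to_disk: Dict[str, str] = field(default_factory=dict)  # real -> disk
    to_real: Dict[str, str] = field(default_factory=dict)  # disk -> real

    def _key(self, disk: str) -> str:
        return disk.lower() if self.case_insensitive else disk

    def disk_name(self, real: str) -> str:
        """Translate a "real" name into one this filesystem can hold."""
        if real in self.to_disk:
            return self.to_disk[real]
        base = "".join("_" if c in self.forbidden else c for c in real)
        base = base[: self.max_len]
        taken = {self._key(d) for d in self.to_real}
        candidate, n = base, 1
        while self._key(candidate) in taken:  # avoid collisions
            suffix = f"~{n}"
            if len(suffix) >= self.max_len:
                raise ValueError(f"no valid name left for {real!r}")
            candidate = base[: self.max_len - len(suffix)] + suffix
            n += 1
        self.to_disk[real] = candidate
        self.to_real[candidate] = real
        return candidate

    def real_name(self, disk: str) -> str:
        """Reverse translation, e.g. for 'record' and 'push'."""
        return self.to_real.get(disk, disk)


toy = RuleSet(forbidden=" :", max_len=8, case_insensitive=True)
print(toy.disk_name("my notes.txt"))  # my_notes
print(toy.disk_name("my:notes.txt"))  # my_not~1
print(toy.real_name("my_not~1"))      # my:notes.txt
```

Keeping the table, rather than trying to make each rule reversible on its own, is what lets a very restrictive filesystem still round-trip arbitrary "real" names.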
Even ext2 has filename restrictions, and could have its own rule set, while
darcs' "real" names could be even less restrictive than that.
Coupled with the write-then-test idea, this could work even better. When
the user specifies which filesystem rule-set to use, there is no need to
test file creation for errors, because the rule-set would guarantee a proper
filename. But if no rule-set is specified, darcs would do write-then-test,
and therefore *still* not gum anything up. It could even ask the user to
provide her own "temporary" filename, as if the user herself were the rule-set.
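The write-then-test fallback could look roughly like this. `write_then_test` and `place_file` are hypothetical helpers, and `ask_user` stands in for however darcs would prompt for a temporary name. After creating the file, the sketch lists the directory to check that the name survived unchanged. Some filesystems may silently mangle a name rather than report an error. `name` is assumed to be a single path component.

```python
# Hypothetical "write-then-test" helpers.
import os
import tempfile
from typing import Callable, Optional


def write_then_test(directory: str, name: str, data: bytes) -> bool:
    """Create `name`, then check the filesystem kept it exactly."""
    path = os.path.join(directory, name)
    try:
        with open(path, "xb") as f:  # "x": fail rather than overwrite
            f.write(data)
    except OSError:
        return False
    # Creation did not fail, but the name might still have been mangled,
    # so look for the exact name in the listing and re-read the data.
    intact = name in os.listdir(directory)
    if intact:
        with open(path, "rb") as f:
            intact = f.read() == data
    if not intact:
        try:
            os.remove(path)
        except OSError:
            pass
    return intact


def place_file(directory: str, real_name: str, data: bytes,
               ask_user: Callable[[str], Optional[str]]) -> str:
    """Try the real name first; if that fails, the user acts as the rule set."""
    name: Optional[str] = real_name
    while name is not None:
        if write_then_test(directory, name, data):
            return name
        name = ask_user(real_name)  # None means "give up"
    raise OSError(f"could not place {real_name!r}")


with tempfile.TemporaryDirectory() as d:
    print(place_file(d, "notes.txt", b"hi", lambda real: None))  # notes.txt
    # A name inside a missing directory cannot be created, so the
    # (simulated) user supplies a temporary name instead.
    print(place_file(d, "no/such/dir", b"hi", lambda real: "tmp-name"))  # tmp-name
```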
The only problem I see with this whole approach (aside from any difficulty
of implementation) is that some situations will defeat the process. FAT
filesystems, with their 8.3 naming scheme, just can't hold the same
variety of filenames as other filesystems; and even with collision detection
and avoidance built into the rule-sets, there would still be some situations
where no rule could produce a valid filename on the target system. There's
no sane way out of this, I believe, but it would be a documentable, extremely
rare situation, one that would only occur under insane circumstances, and one
that darcs could recognize and warn about.
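Finally, here is a toy version of the FAT case, showing where the process breaks down. `short_name` and its `max_tail` limit are assumptions for illustration, and real FAT implementations generate short names with their own algorithms. The sketch builds an uppercase 8.3 name and tries `~1`, `~2`, ... tails on a collision. Once the tails run out, it warns and gives up instead of producing a bad name.

```python
# Toy 8.3 short-name generator; real FAT drivers use their own algorithms.
import string
import warnings
from typing import Optional, Set

# Simplified character set; FAT actually allows a few more symbols.
ALLOWED = set(string.ascii_uppercase + string.digits + "_-")


def _clean(part: str) -> str:
    return "".join(c if c in ALLOWED else "_" for c in part.upper())


def short_name(real: str, taken: Set[str], max_tail: int = 9) -> Optional[str]:
    stem, dot, ext = real.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = real, ""
    stem, ext = _clean(stem) or "_", _clean(ext)[:3]

    def join(s: str) -> str:
        return s + ("." + ext if ext else "")

    if len(stem) <= 8 and join(stem) not in taken:
        return join(stem)
    for n in range(1, max_tail + 1):  # collision or too long: try ~N tails
        tail = f"~{n}"
        candidate = join(stem[: 8 - len(tail)] + tail)
        if candidate not in taken:
            return candidate
    warnings.warn(f"no 8.3 name left for {real!r}")
    return None


taken: Set[str] = set()
for real in ["Long File Name 1.text", "Long File Name 2.text", "Long File Name 3.text"]:
    s = short_name(real, taken, max_tail=2)
    print(real, "->", s)  # LONG_F~1.TEX, LONG_F~2.TEX, then a warning and None
    if s is not None:
        taken.add(s)
```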
Be well,
Zack
> --
> David Roundy
> http://www.abridgegame.org
>
> _______________________________________________
> darcs-users mailing list
> darcs-users at abridgegame.org
> http://www.abridgegame.org/mailman/listinfo/darcs-users
--
Zack Brown