[darcs-users] about non-ASCII filenames in patches (more work needed)
Eric Kow
kowey at darcs.net
Mon May 31 17:56:42 UTC 2010
On Sat, May 29, 2010 at 17:44:03 +0100, Eric Kow wrote:
> Here's an example changelog extending the issue1763 test. Note the
> discrepancy between the conflictor representation.
OK! So I was getting very worried that this was actually affecting the
entirety of darcs-2 patches, but Petr has managed to reassure me on IRC
(mostly by pointing out where I'd gotten confused and gotten things
backwards).
http://irclog.perlgeek.de/darcs/2010-05-31#i_2387317
But, we do still need to finish this job. The damage is there, but it's fairly
localised.
> Sat May 29 17:33:45 BST 2010 Eric Kow <E.Y.Kow at brighton.ac.uk>
> conflictor [
> hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 2
> +hello
> hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 3
> +hello
> ]
> |:
> hunk ./kit<U+00F6>lt<U+00E9>s.lisp 2
> +hi
I got misled by this output. It's important not to forget that the darcs-2
format treats filenames as sequences of bytes. This contradicts one of my
earlier self-corrections: we're using String as the internal representation, but
the Chars are only proper Unicode code points in the darcs-1 format, whereas in
the darcs-2 format each Char is just a really expensive padded byte.
Reinier: we should perhaps change this notation again not to imply that we have
Unicode code points up there :-/
Anyway, in the output above, this is correct:
./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp
because it corresponds to the UTF-8 sequence:
6b k
69 i
74 t
c3b6 ö
6c l
74 t
c3a9 é
73 s
So all of the other patches below seem to be right (phew!).
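To make the two notations concrete, here is a small Python sketch (not darcs code; show_escaped is a hypothetical helper that only mimics the <U+XXXX> notation from the changelog). It assumes the darcs-2 view can be modelled as one Char per UTF-8 byte, which Python's latin-1 codec happens to give us because it maps byte n to code point n.

```python
# Hypothetical helper, not part of darcs: print ASCII as-is and every
# other value as <U+XXXX>, like the changelog output quoted above.
def show_escaped(chars):
    return "".join(c if ord(c) < 128 else "<U+%04X>" % ord(c) for c in chars)

name = "kit\u00f6lt\u00e9s.lisp"  # the filename as real Unicode code points

# Assumed model of the darcs-2 view: encode to UTF-8 bytes, then widen
# each byte to one character (latin-1 maps byte n to code point n).
one_char_per_byte = name.encode("utf-8").decode("latin-1")

print(show_escaped(one_char_per_byte))
print(show_escaped(name))
```

The first line printed is the byte-per-Char spelling (C3 B6 and C3 A9 shown as separate values), the second is the code-point spelling (F6 and E9).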
On the *other* hand, this change output is incorrect
hunk ./kit<U+00F6>lt<U+00E9>s.lisp 2
because it means that somewhere in the Darcs code, we're *reading* the
filename in as UTF-8. I think I've found the culprit (readNon) and
will be submitting a patch later if I'm right about it.
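A rough illustration of the kind of read error I mean, again in Python rather than darcs code. The names read_bytewise and read_as_utf8 are made up for this sketch and say nothing about what readNon actually does; the point is only that the two ways of reading the same bytes give different strings.

```python
# Bytes of the filename as they would sit in a darcs-2 patch file
# (assumption: stored as plain UTF-8 bytes, as in the table above).
raw = b"./kit\xc3\xb6lt\xc3\xa9s.lisp"

def read_bytewise(data):
    # One character per byte: the byte-sequence reading.
    return data.decode("latin-1")

def read_as_utf8(data):
    # Decode multi-byte sequences into single code points.
    return data.decode("utf-8")

bytewise = read_bytewise(raw)
decoded = read_as_utf8(raw)

# Same bytes, two different in-memory names, so they compare unequal.
print(len(raw), len(bytewise), len(decoded))
print(bytewise == decoded)
```

If one code path reads bytewise and another decodes UTF-8, the same file ends up with two different names in memory, which would match the mismatch seen in the conflictor output.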
ALSO: this doesn't change the fact that there may still be bits of darcs-2
code that are wrongly trying to encode the output in UTF-8 (in addition
to the read errors above).
Phew!
Thanks, Petr... (I hope I'm not still confused).
> Sat May 29 17:33:45 BST 2010 Eric Kow <E.Y.Kow at brighton.ac.uk>
> * My continuation of the conflict
> hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 3
> +hello
> addfile ./non-kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp
>
> Sat May 29 17:33:45 BST 2010 Eric Kow <E.Y.Kow at brighton.ac.uk>
> * My conflicting edit
> hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 2
> +hello
>
> Sat May 29 17:33:45 BST 2010 Eric Kow <E.Y.Kow at brighton.ac.uk>
> * First edit
> hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 1
> +hi
>
> Sat May 29 17:33:45 BST 2010 Eric Kow <E.Y.Kow at brighton.ac.uk>
> * Add
> addfile ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9