[darcs-users] about non-ASCII filenames in patches (more work needed)

Eric Kow kowey at darcs.net
Mon May 31 17:56:42 UTC 2010


On Sat, May 29, 2010 at 17:44:03 +0100, Eric Kow wrote:
> Here's an example changelog extending the issue1763 test.  Note the
> discrepancy within the conflictor representation.

OK!  So I was getting very worried that this was actually affecting all
darcs-2 patches, but Petr has managed to reassure me on IRC (mostly by
pointing out where I'd gotten confused and gotten things backwards).

  http://irclog.perlgeek.de/darcs/2010-05-31#i_2387317

But we do still need to finish this job.  The damage is real, but it's fairly
localised.

> Sat May 29 17:33:45 BST 2010  Eric Kow <E.Y.Kow at brighton.ac.uk>
>     conflictor [
>     hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 2
>     +hello
>     hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 3
>     +hello
>     ]
>     |:
>     hunk ./kit<U+00F6>lt<U+00E9>s.lisp 2
>     +hi

I got misled by this output.  It's important not to forget that the darcs-2
format treats filenames as sequences of bytes.  This contradicts one of my
earlier self-corrections: we're using String as the internal representation,
but its Chars are only proper Unicode code points in the darcs-1 format,
whereas in darcs-2 each Char is just a really expensive padded byte.
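
To make the distinction concrete, here is a rough Python sketch (not darcs
code: Python str stands in for Haskell's String, and I'm assuming the filename
bytes on disk are UTF-8).  It shows how the same name looks when each Char is
a code point versus when each Char is one padded byte:

```python
# Hypothetical illustration, not darcs code: Python str stands in for String.
# Assumption: the filename's on-disk bytes are UTF-8.

name = "kitöltés.lisp"

# darcs-1 style: each character is a proper Unicode code point.
darcs1_chars = list(name)

# darcs-2 style: the filename is a sequence of bytes, and each byte is held
# in one character whose code point equals the byte value (0-255).
raw_bytes = name.encode("utf-8")
darcs2_chars = [chr(b) for b in raw_bytes]

def show(chars):
    """Render non-ASCII characters in the <U+XXXX> notation used in the changelog."""
    return "".join(c if ord(c) < 128 else f"<U+{ord(c):04X}>" for c in chars)

print(show(darcs1_chars))  # kit<U+00F6>lt<U+00E9>s.lisp
print(show(darcs2_chars))  # kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp
```

The second form is exactly what shows up in the conflictor above.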

Reinier: we should perhaps change this notation again so that it doesn't imply
that we have Unicode code points up there :-/
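
One possible direction, sketched in Python purely as an illustration (the
<XX> syntax below is made up here, not something darcs uses): since each
darcs-2 Char is really one byte, print non-ASCII bytes as hex byte escapes
rather than as <U+XXXX> code points.

```python
# Hypothetical byte-oriented notation for darcs-2 filenames.
# The <XX> escape syntax is an assumption for illustration only.

def show_bytes(chars: str) -> str:
    out = []
    for c in chars:
        code = ord(c)
        if code > 0xFF:
            # A darcs-2 filename Char should never exceed one byte.
            raise ValueError(f"not a byte: U+{code:04X}")
        out.append(c if code < 0x80 else f"<{code:02X}>")
    return "".join(out)

darcs2_name = "kit\u00c3\u00b6lt\u00c3\u00a9s.lisp"  # one character per byte
print(show_bytes(darcs2_name))  # kit<C3><B6>lt<C3><A9>s.lisp
```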

Anyway, in the output above, this line is correct:

  ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp

That's because it corresponds to the UTF-8 byte sequence:

  6b     k
  69     i
  74     t
  c3b6   ö
  6c     l
  74     t
  c3a9   é
  73     s
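
The table can be reproduced with a couple of lines of Python (a sketch that
assumes the name on disk is UTF-8; it only covers the "kitöltés" stem, as the
table does):

```python
# Reproduce the byte table: encode each character of the stem as UTF-8
# and print its bytes in hex next to the character.

stem = "kitöltés"
rows = [(ch.encode("utf-8").hex(), ch) for ch in stem]
for hex_bytes, ch in rows:
    print(f"  {hex_bytes:<6} {ch}")
```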

So all of the other patches below seem to be right (phew!).

On the *other* hand, this line of the changes output is incorrect:

   hunk ./kit<U+00F6>lt<U+00E9>s.lisp 2

because it means that somewhere in the Darcs code we're *reading* the
filename in as UTF-8 (turning the bytes c3 b6 into the single code point
U+00F6).  I think I've found the culprit (readNon) and will submit a patch
later if I'm right about it.
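
Here is a hypothetical Python sketch of why that matters (this is not the
actual readNon code; the function names are made up for illustration).  The
patch stores the filename as raw bytes; a darcs-2 reader should map each byte
to one character, and decoding those bytes as UTF-8 instead yields a different
String, so the two hunks no longer name the same file:

```python
# Hypothetical sketch of the suspected read bug; not darcs code.
# Assumption: the patch file holds the filename as UTF-8 bytes.

stored = "kitöltés.lisp".encode("utf-8")  # bytes as they appear in the patch

def read_as_bytes(raw: bytes) -> str:
    # Correct for darcs-2: one character per byte ("latin-1" maps 0-255 one-to-one).
    return raw.decode("latin-1")

def read_as_utf8(raw: bytes) -> str:
    # The suspected bug: treating the filename bytes as UTF-8 text.
    return raw.decode("utf-8")

good = read_as_bytes(stored)
bad = read_as_utf8(stored)
print(len(good), len(bad))  # 15 13
print(good == bad)          # False
```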

ALSO: this doesn't change the fact that there may still be bits of darcs-2
code that wrongly try to encode the output as UTF-8 (in addition to the
read errors above).
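
For the write side, a similar hypothetical sketch (again not darcs code):
because each darcs-2 Char already stands for one byte, it has to be written
back out byte-for-byte.  Encoding it as UTF-8 a second time double-encodes
the name:

```python
# Hypothetical sketch of the write-side problem; not darcs code.
# Assumption: the internal name holds one character per (UTF-8) byte.

internal = "kit\u00c3\u00b6lt\u00c3\u00a9s.lisp"  # darcs-2 style representation

def write_bytes(name: str) -> bytes:
    # Correct: each character's code point (0-255) becomes exactly one byte.
    return name.encode("latin-1")

def write_utf8(name: str) -> bytes:
    # Wrong for darcs-2: re-encodes bytes that were already UTF-8.
    return name.encode("utf-8")

print(write_bytes(internal))  # b'kit\xc3\xb6lt\xc3\xa9s.lisp'
print(write_utf8(internal))   # b'kit\xc3\x83\xc2\xb6lt\xc3\x83\xc2\xa9s.lisp'
```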

Phew!

Thanks, Petr... (I hope I'm not still confused).

> Sat May 29 17:33:45 BST 2010  Eric Kow <E.Y.Kow at brighton.ac.uk>
>   * My continuation of the conflict
>     hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 3
>     +hello
>     addfile ./non-kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp
> 
> Sat May 29 17:33:45 BST 2010  Eric Kow <E.Y.Kow at brighton.ac.uk>
>   * My conflicting edit
>     hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 2
>     +hello
> 
> Sat May 29 17:33:45 BST 2010  Eric Kow <E.Y.Kow at brighton.ac.uk>
>   * First edit
>     hunk ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp 1
>     +hi
> 
> Sat May 29 17:33:45 BST 2010  Eric Kow <E.Y.Kow at brighton.ac.uk>
>   * Add
>     addfile ./kit<U+00C3><U+00B6>lt<U+00C3><U+00A9>s.lisp

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9