[darcs-users] Latin vs. Unicode

Ben Franksen ben.franksen at online.de
Sun Nov 16 01:40:02 UTC 2014


This came up while refactoring the options system and is, I think, of wider
interest, so I am sending it to darcs-users.

The issue is limited to what we get from the command line or from the
environment, that is, patch meta-data such as author, patch name, etc. Here,
Darcs currently has extra built-in support for handling 8-bit encodings like
ISO Latin-1. This works by casting the Unicode characters in the Strings to
Word8, which effectively reduces their code points modulo 256. This is not
noticeable as long as you use only languages whose characters have code
points below 256, which is the case for most Western European languages; but
for Asian ones, not to speak of the other continents, it breaks as soon as
users enter data in their native languages.
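
To make the truncation concrete, here is a minimal sketch of the effect
described above. It is not the actual Darcs code: truncateToBytes,
bytesToString and survives are hypothetical names chosen for illustration,
and I assume the cast amounts to fromIntegral on the code point.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper (not taken from Darcs): narrow each Char to one byte.
-- fromIntegral wraps around, so each result is the code point modulo 256.
truncateToBytes :: String -> [Word8]
truncateToBytes = map (fromIntegral . ord)

-- Widen bytes back to Chars, reading each byte as a Latin-1 code point.
bytesToString :: [Word8] -> String
bytesToString = map (chr . fromIntegral)

-- True when a String comes back unchanged from the round trip.
survives :: String -> Bool
survives s = bytesToString (truncateToBytes s) == s
```

Under these assumptions, a name like "Müller" survives the round trip
because U+00FC is below 256, whereas U+4E2D (the first character of the
Chinese word for "Chinese") is reduced to 0x2D, which reads back as '-'.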

Over the last few years, Unicode has firmly established itself world-wide
and is well supported by all the major operating systems. This is why I vote
for dropping support for older 8-bit encodings that are not Unicode
compatible, thereby allowing e.g. Chinese users to use Darcs with their
native language.

Cheers
Ben
-- 
"Make it so they have to reboot after every typo." -- Scott Adams



