aMule Forum
English => en_Bugs => Topic started by: Gerd78 on November 11, 2005, 08:26:29 PM
-
Hi,
the man pages are encoded inconsistently. Most of them are in ISO-8859-1, but the following German ones are in UTF-8:
src/utils/aLinkCreator/docs/alc.de.1
src/utils/aLinkCreator/docs/alcc.de.1
docs/man/amule.de.1
docs/man/amulecmd.de.1
docs/man/amuleweb.de.1
src/utils/cas/docs/cas.de.1
The other German ones are also in ISO-8859-1.
I don't know anything about the distros' patches, but the original vanilla man pager doesn't even pretend to support anything other than ISO-8859-1, even in UTF-8 locales. But even if UTF-8 were correct, the encoding should be consistent across all man pages and languages within the aMule package.
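For anyone who wants to check which pages are affected, here is a minimal sketch in Python (not part of aMule; the function name guess_encoding and the script name are made up for illustration). It assumes that non-ASCII ISO-8859-1 text is almost never valid UTF-8, so a file that contains non-ASCII bytes and still decodes cleanly as UTF-8 is most likely UTF-8. Everything else is only guessed to be ISO-8859-1, because every byte sequence is valid in that character set.

```python
import sys
from pathlib import Path


def guess_encoding(data: bytes) -> str:
    """Classify raw man page bytes as 'ascii', 'utf-8' or 'iso-8859-1'."""
    try:
        data.decode("ascii")
        return "ascii"  # pure ASCII reads the same in both encodings
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Fallback guess only: any byte sequence decodes as ISO-8859-1,
        # so this cannot tell ISO-8859-1 apart from ISO-8859-15 and others.
        return "iso-8859-1"


if __name__ == "__main__":
    # Hypothetical usage: python3 guess_enc.py docs/man/*.1
    for name in sys.argv[1:]:
        print(f"{guess_encoding(Path(name).read_bytes()):<12}{name}")
```

Run over the files listed above, this should report the six German pages as utf-8 if the description in this post is accurate.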
-
That depends on who created that page, and which editor he used.
Are they usable, so that this is just a nice-to-know report, or do you actually run into problems because of this issue?
-
Originally posted by Vollstrecker
Are they usable, so that this is just a nice-to-know report, or do you actually run into problems because of this issue?
The latter.
Good man page in ISO-8859-1 (src/utils/wxCas/docs/wxcas.de.1):
(http://img396.imageshack.us/img396/5145/gut8my.png)
Bad man page in UTF-8 (src/utils/cas/docs/cas.de.1)
(http://img396.imageshack.us/img396/8970/schlecht8yt.png)
Please note that my system is set to LANG=de_DE.UTF-8. But this doesn't help with UTF-8 man pages because the program "/usr/bin/man" itself doesn't understand UTF-8 at all.
Man pages should be in the locale's native character set (i.e. ISO-8859-1 for English, French, German, Spanish, Hungarian, as is the case for all man pages other than the German ones, and even for some of the German ones, but not all).
"man" will support UTF-8 in the future, but it has not yet been decided how to do it without breaking existing packages. One approach is a special tag for UTF-8 in the man pages themselves, and another is to use "/usr/share/man/de.UTF-8/man1" instead of "/usr/share/man/de/man1", but neither is implemented right now.
Even if some distros accept UTF-8 man pages, first, that is still non-standard, and second, it probably means that they accept UTF-8 man pages only, so the man pages inside the aMule package should at least all have the same encoding.
"recode" or "iconv" can be used to convert text files.
-
This is a patch for today's CVS that converts all man pages to ISO-8859-1.
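The patch itself is not reproduced here. As a rough illustration of the kind of conversion that recode or iconv would perform, here is a minimal Python sketch (the function names are made up, and this is not the code from the patch). It assumes the input really is UTF-8 and refuses to continue if a character has no ISO-8859-1 equivalent, so nothing is dropped silently.

```python
from pathlib import Path


def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1.

    Raises UnicodeDecodeError if the input is not valid UTF-8 and
    UnicodeEncodeError if a character (for example the euro sign)
    does not exist in ISO-8859-1.
    """
    return data.decode("utf-8").encode("iso-8859-1")


def convert_file(path: str) -> None:
    """Convert one file in place (hypothetical helper; keep a backup)."""
    p = Path(path)
    p.write_bytes(utf8_to_latin1(p.read_bytes()))
```

A side effect of the strict decoding is that a page with umlauts that is already in ISO-8859-1 raises UnicodeDecodeError instead of being converted a second time.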
-
Originally posted by Gerd78
Man pages should be in the locale's native character set (i.e. ISO-8859-1 for English, French, German, Spanish, Hungarian, as is the case for all man pages other than the German ones, and even for some of the German ones, but not all).
I'm not familiar with man or the creation of man pages, but it seems wrong to me that ISO-8859-1 should be the native character set, since it doesn't even include the symbol for the currency we've been using for nearly 4 years now...
-
Good point, but how many man pages have you seen that use that symbol? I can't remember even one.
-
If vanilla man does not support UTF-8, then that sounds like enough of an argument to take UTF-8 out of our man pages and port them to 8859-1.
-
Originally posted by thedude0001
I'm not familiar with man or the creation of man pages, but it seems wrong to me that ISO-8859-1 should be the native character set, since it doesn't even include the symbol for the currency we've been using for nearly 4 years now...
That's correct and that's why I was wrong: It must be ISO-8859-15, not ISO-8859-1. But it doesn't matter that much because in the above-mentioned man pages only the common subset of both is used anyway.
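To make the "common subset" point concrete, here is a small Python sketch (names are made up for illustration) that lists the byte values which ISO-8859-1 and ISO-8859-15 decode differently and checks whether a file uses any of them. It assumes Python's built-in iso8859-1 and iso8859-15 codecs as the reference tables.

```python
# Byte values that mean different characters in ISO-8859-1 and ISO-8859-15.
DIFFERING_BYTES = [
    b for b in range(0x100)
    if bytes([b]).decode("iso8859-1") != bytes([b]).decode("iso8859-15")
]


def uses_only_common_subset(data: bytes) -> bool:
    """True if the bytes read identically as ISO-8859-1 and ISO-8859-15."""
    return not any(b in DIFFERING_BYTES for b in data)


if __name__ == "__main__":
    for b in DIFFERING_BYTES:
        latin1 = bytes([b]).decode("iso8859-1")
        latin9 = bytes([b]).decode("iso8859-15")
        print(f"0x{b:02X}: {latin1!r} in -1, {latin9!r} in -15")
```

The German umlauts and the sharp s are outside that list, so a page that uses only those is byte-for-byte the same in both character sets.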
-
No, sorry, no ISO whatsoever!
I made them UTF-8 on purpose, since it's the most common standard on most distros anyway.
It's also more common across systems all over the world, and all the po strings are UTF-8 as well.
So if you want, you can convert the rest to UTF-8, not "fix" the right ones.
-
There are man pages in a lot of encodings. "man" is not restricted to ISO-8859-1, or Japanese people would be really fucked, if you know what I mean.
-
Originally posted by stefanero
No, sorry, no ISO whatsoever!
I made them UTF-8 on purpose, since it's the most common standard on most distros anyway.
Why? How do you know that this is the correct thing to do?
UTF-8 is indeed the most widely used character set in Linux distros today, but not for man pages. As you can see above, umlauts in UTF-8 man pages are garbage in a UTF-8 environment, whereas ISO-8859-1 man pages work fine. Man pages are not locale-specific.
Using a UTF-8 locale means, for example, that the file system is in UTF-8, but not that every file needs to be converted to UTF-8. Example: .desktop files always have to be UTF-8 in any(!) locale and European man pages always have to be ISO-8859-1 in any(!) locale, whereas HTML can be in any encoding because the encoding is stored in the documents themselves.
Originally posted by stefanero
It's also more common across systems all over the world, and all the po strings are UTF-8 as well.
Man pages have nothing to do with .po files. That's a completely different thing. There it's correct to use UTF-8.
Originally posted by stefanero
So if you want, you can convert the rest to UTF-8, not "fix" the right ones.
Please read the following thread:
http://mail.nl.linux.org/linux-utf8/2005-07/msg00004.html
Especially the following post:
http://mail.nl.linux.org/linux-utf8/2005-07/msg00009.html
Originally posted by Kry
There are man pages in a lot of encodings. "man" is not restricted to ISO-8859-1, or Japanese people would be really fucked, if you know what I mean.
See here:
http://mail.nl.linux.org/linux-utf8/2005-07/msg00009.html
Japanese man pages are not UTF-8 either. They are EUC-JP. Japanese man pages in UTF-8 are unreadable in any locale, because /usr/bin/man expects EUC-JP input and treats every input as EUC-JP, even if it's UTF-8. The result is garbage.
In case you misunderstood what I said: man is indeed not limited to ISO-8859-1, but this doesn't mean that it understands UTF-8. It doesn't. It understands only one specific character set per language, and this happens to be ISO-8859-1 for all languages you have man pages for.
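The garbage is easy to reproduce outside of man. This is a minimal Python sketch, not a model of what /usr/bin/man does internally: it simply decodes bytes with a different character set than the one they were written in. The function name misread is made up for illustration.

```python
def misread(text: str, actual: str, assumed: str) -> str:
    """Encode text as `actual`, then decode it as if it were `assumed`.

    Bytes that are invalid in the assumed character set become U+FFFD.
    """
    return text.encode(actual).decode(assumed, errors="replace")


if __name__ == "__main__":
    # A UTF-8 umlaut is two bytes (0xC3 0xBC); read as ISO-8859-1
    # they turn into two separate characters.
    print(misread("für", "utf-8", "iso-8859-1"))  # -> fÃ¼r
    # The same kind of mismatch for Japanese: UTF-8 bytes read as EUC-JP.
    print(misread("日本語", "utf-8", "euc_jp"))
```

The first call shows the typical two-character garbage for each umlaut; the second shows that the same mismatch applies to UTF-8 text read as EUC-JP.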
If you read the linked thread carefully, you will see that the Linux man page maintainers want to support UTF-8 man pages in the future, but they don't yet know how to do it. They will probably do it by introducing a special tag in UTF-8 man pages, like in HTML, so that the man pager knows how to handle them, but currently it doesn't.
BTW, it's not as if I "have to" convince anyone. It's no problem for me to convert them myself.
-
Originally posted by Vollstrecker
Good point, but how many man pages have you seen that use that symbol? I can't remember even one.
Actually, there are some more differences between ISO-8859-1 and -15; check Wikipedia (http://de.wikipedia.org/wiki/ISO-8859-1#8859-1_vs._-15_vs._Windows-1252). Chances are that most of these are not very common in man pages, but some might actually appear...
-
Sure, but if someone complains about the missing currency sign, I say it isn't used (at least AFAIK). Nobody had asked about the other differences.
-
Check "man iso_8859-15" (not the content, but the file itself). The file is /usr/share/man/man7/iso_8859-15.7.gz or similar. It's in ISO-8859-15 and works as expected.
By the way, to eliminate any misunderstandings: This is not an accusation or something like that, just a suggestion for a tiny problem. Broken umlauts are not nice, but they won't hurt anyone.
-
Hmm, you only get broken umlauts on the console if your console does not support UTF-8...
I can read the manpages here just fine
-
Originally posted by stefanero
Hmm, you only get broken umlauts on the console if your console does not support UTF-8...
My console supports UTF-8. UTF-8 plain text files are OK with "less" and "cat". But man pages are not plain text files...
Originally posted by stefanero
I can read the manpages here just fine
Which distro are you using? Can you read UTF-8 man pages only or do both work for you?