Folks,
I am sorry, I forgot an important detail. It has been quite a while since I last worked on this.
The file name conversion routines are, respectively, MB2WC() for #1 and WC2MB() for #2. The tricky part is that in an ideal world, WC2MB( MB2WC( s ) ) == s, i.e., converting to UNICODE and then back to multibyte should always give back the original string. If you read my previous explanation, you will notice that this is not the case. So, what are the options:
1) Leave the code as it is. This means that the code will fail if there is a UTF-8 file name in your system that can be converted to ISO-8859-1.
aMule will report something like this for a UTF-8 file name containing an accented letter (" u + ' ", i.e. ú):
2007-12-29 17:14:02: Logger.cpp(268): Error: Failed to retrieve file times for '/home/myuser/.aMule/Incoming/filmes/asdú.mp4' (error 2: No such file or directory)
2007-12-29 17:14:02: FileFunctions.cpp(187): FileIO: Error on GetLastModificationTime from `/home/myuser/.aMule/Incoming/filmes/asdú.mp4'
2007-12-29 17:14:02: CFile.cpp(135): CFile: Error when opening file (/home/myuser/.aMule/Incoming/filmes/asdú.mp4): No such file or directory
This is pretty bad IMHO, because a system configured for UTF-8 will fail for no apparent reason.
2) Change #1 and #2 so that we only recognize valid UTF-8 names. This means that the app will fail to read non-UTF-8 file names. The fact that your system is configured to use UTF-8 is irrelevant here: an application can always save a file name that is an invalid UTF-8 sequence. ISO-8859-1 file names could not be shared. This is bad, but maybe not that bad, since non-UTF-8 systems are getting rare.
3) Leave #1 as it is, so that we are always able to read a file name from the system, and change #2 to always convert a UNICODE file name to UTF-8. This would also break things for ISO-8859-1 names: aMule would not be able to share them because the invariant is broken. UTF-8 names would work fine. I see no big advantage over the previous choice, and maybe we would just be postponing an error that should be caught sooner.
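To make the three options easier to compare, here is a rough sketch in Python (this is not the aMule code). All function names are made up for illustration. I am also assuming that #1 tries UTF-8 and falls back to ISO-8859-1, and that the current #2 prefers ISO-8859-1 whenever the name is representable in it. The real routines may differ in detail.

```python
def mb2wc_lenient(raw: bytes) -> str:
    """#1 as assumed here: try UTF-8 first, fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this never fails.
        return raw.decode("iso-8859-1")


def mb2wc_strict(raw: bytes) -> str:
    """#1 under option 2: accept valid UTF-8 only (raises otherwise)."""
    return raw.decode("utf-8")


def wc2mb_latin1_first(name: str) -> bytes:
    """#2 as assumed for option 1: ISO-8859-1 if representable, else UTF-8."""
    try:
        return name.encode("iso-8859-1")
    except UnicodeEncodeError:
        return name.encode("utf-8")


def wc2mb_utf8(name: str) -> bytes:
    """#2 under options 2 and 3: always UTF-8."""
    return name.encode("utf-8")


def round_trips(to_wc, to_mb, raw: bytes) -> bool:
    """Check the invariant WC2MB(MB2WC(s)) == s for one on-disk name."""
    try:
        return to_mb(to_wc(raw)) == raw
    except UnicodeDecodeError:
        return False  # the name could not even be read


# The same visible name stored on disk in the two encodings.
utf8_name = "asd\u00fa.mp4".encode("utf-8")         # b'asd\xc3\xba.mp4'
latin1_name = "asd\u00fa.mp4".encode("iso-8859-1")  # b'asd\xfa.mp4'

# For each option: (UTF-8 name survives, ISO-8859-1 name survives)
results = {
    "option 1": (round_trips(mb2wc_lenient, wc2mb_latin1_first, utf8_name),
                 round_trips(mb2wc_lenient, wc2mb_latin1_first, latin1_name)),
    "option 2": (round_trips(mb2wc_strict, wc2mb_utf8, utf8_name),
                 round_trips(mb2wc_strict, wc2mb_utf8, latin1_name)),
    "option 3": (round_trips(mb2wc_lenient, wc2mb_utf8, utf8_name),
                 round_trips(mb2wc_lenient, wc2mb_utf8, latin1_name)),
}

for option, (utf8_ok, latin1_ok) in results.items():
    print(option, "UTF-8 ok:", utf8_ok, "ISO-8859-1 ok:", latin1_ok)
```

Under these assumptions, option 1 breaks the UTF-8 name and keeps the ISO-8859-1 one, while options 2 and 3 do the opposite. The only difference between 2 and 3 is whether the ISO-8859-1 name fails when it is read or later when it is written back.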
The big problem is that once the file name is converted to UNICODE by MB2WC(), there is no way for us to know if the original encoding was UTF-8 or ISO-8859-1, and we need this information to satisfy the invariant. The proper solution IMHO would be to patch wxWidgets so that it remembers the original file name string, or the original encoding, somewhere in its internal file structure.
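A small sketch of what "remembering the original file name string" could look like, again in Python and purely illustrative. The FileName class and its methods are hypothetical and are not part of wxWidgets or aMule. The idea is only that the raw bytes travel together with the UNICODE form, so the conversion back never has to guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileName:
    raw: bytes     # bytes exactly as the file system returned them
    display: str   # UNICODE form, used only for showing the name

    @classmethod
    def from_system(cls, raw: bytes) -> "FileName":
        # Assumed decoding order: UTF-8 first, ISO-8859-1 as fallback.
        try:
            display = raw.decode("utf-8")
        except UnicodeDecodeError:
            display = raw.decode("iso-8859-1")
        return cls(raw=raw, display=display)

    def to_system(self) -> bytes:
        # No re-encoding: hand back the original bytes untouched.
        return self.raw


a = FileName.from_system(b"asd\xc3\xba.mp4")  # stored as UTF-8
b = FileName.from_system(b"asd\xfa.mp4")      # stored as ISO-8859-1

# Both decode to the same UNICODE string, which is exactly why the
# decoded string alone cannot tell us which bytes to write back.
print(a.display == b.display)  # True
print(a.raw == b.raw)          # False
```

With the raw bytes kept around, to_system() returns the original name for both files, so the invariant holds regardless of the encoding on disk.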
I would really appreciate opinions here. This issue has been bugging us for too long, and all the cards are on the table now. I need help from the other people in the project as well as from anyone wishing to contribute. My vote is for solution number 2, but we must be aware that this will break sharing for all non-UTF-8 names.
Hope to hear from lots of people.
EDIT: Another possibility (#4) would be not to use wxStrings to store file names. I don't know if this is possible, given that the wx file functions expect wxStrings.