error processing archives with non-english characters in the names of archived files/folders #114
Comments
And this sample (created by far2l) simply crashes far2l: 2g0.ru/test.7z
Same problem when trying to extract a folder with an English name from a zip archive (try browsing inside the archive and copying the folder "test" anywhere).
Found the source of the last problem; see #121.
About Desktop.zip: I see that file names are stored in code page 866 in an archive created on Windows. Is there any way to parse those archives correctly? unzip itself unpacks them correctly in a UTF-8 environment.
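To illustrate what "stored in code page 866" means for the listing code, here is a minimal sketch of converting CP866 bytes to UTF-8. `Cp866ToUtf8` and `AppendUtf8` are hypothetical names, not functions from far2l or Wine, and the table covers only ASCII and the Cyrillic letters; box-drawing and other symbols are replaced with `?`. The fix discussed in this thread used Wine's conversion code instead.

```cpp
#include <string>

// Append one code point below U+0800 to out as UTF-8 (1 or 2 bytes).
static void AppendUtf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode a CP866 byte string to UTF-8.
// Only ASCII and Cyrillic letters are mapped; anything else becomes '?'.
std::string Cp866ToUtf8(const std::string& in) {
    std::string out;
    for (unsigned char b : in) {
        if (b < 0x80) {
            AppendUtf8(out, b);                      // ASCII is unchanged
        } else if (b <= 0xAF) {
            AppendUtf8(out, 0x0410 + (b - 0x80));    // U+0410..U+043F
        } else if (b >= 0xE0 && b <= 0xEF) {
            AppendUtf8(out, 0x0440 + (b - 0xE0));    // U+0440..U+044F
        } else if (b == 0xF0) {
            AppendUtf8(out, 0x0401);                 // capital IO
        } else if (b == 0xF1) {
            AppendUtf8(out, 0x0451);                 // small io
        } else {
            out += '?';                              // not covered by this sketch
        }
    }
    return out;
}
```

A real implementation would use a full 256-entry table or a system conversion routine rather than these ranges.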
I guess the code page of the file list could be detected from the ZipOS variable; see multiarc/formats/zip/zip.cpp, line 289.
Fixed this with a quick-and-dirty hack using Wine code. Unfortunately, trying to push returns 403, so here is the modified zip.cpp. At least it works OK with my test cases. I used some of Wine's code from here: https://github.com/wine-mirror/wine/blob/master/dlls/user32/lstr.c
Btw, ANSI->OEM conversion may still be required when ZipHeader.PackVer>20 && ZipHeader.PackVer<25.
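The heuristic discussed in the comments above (pick the name encoding from the host OS, with a special case for one packer version range) could be sketched roughly as follows. This is not the code from the attached zip.cpp: `NameEncoding` and `GuessNameEncoding` are hypothetical names, treating Unix-made archives as UTF-8 is an assumption, and applying the version range only to non-Unix archives is also an assumption. Host code 3 is the UNIX value from the ZIP format's host system list.

```cpp
#include <cstdint>

// Hypothetical result type: which legacy encoding to assume for stored names.
enum class NameEncoding { Oem, Ansi, Utf8 };

// hostOs:  host system code stored in the ZIP header (3 = UNIX).
// packVer: packer version, as in ZipHeader.PackVer from the thread.
NameEncoding GuessNameEncoding(std::uint8_t hostOs, std::uint8_t packVer) {
    const std::uint8_t kHostUnix = 3;
    if (hostOs == kHostUnix) {
        return NameEncoding::Utf8;   // assumption: Unix tools write UTF-8 names
    }
    if (packVer > 20 && packVer < 25) {
        return NameEncoding::Ansi;   // range from the thread: ANSI->OEM needed
    }
    return NameEncoding::Oem;        // default for DOS/Windows-made archives
}
```

As a later comment notes, a guess of this kind is wrong for some UTF-8 archives created on Windows, so it is best treated as a fallback only.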
Fixed, see the included version.
More testing shows far2l crashing on 7z archives without any non-empty files; see #120.
Moved some issues from here to separate tickets. They seem to be unrelated to charsets.
Upd: the encoding detection method I used fails on some UTF-8 archives created on Windows. Example: Maybe the UTF-8 encoded file name extra field from the file header could be used to detect such cases, but multiarc does not currently support it.
Retested with the master build and still see the problem. This archive stores file names twice. The first copy is in the native field, as the zip format suggests, but in UTF-8 form, which is uncommon for Windows archivers AFAIK; since the archive was created on Windows, my code assumes it uses the OEM charset. The second copy is in the other field, also in UTF-8 form, as suggested for storing Unicode file names in newer versions of the format; our current code ignores this field. Because far2l lists the archive content differently, the resulting unzip command becomes incorrect, so files cannot be extracted.
Yep, I just did not notice the difference at first glance.
The expected behaviour for multiarc is to look inside the UTF-8 extended header field and, if it is not present or is empty, fall back to the logic we currently have.
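A minimal sketch of that lookup, assuming the field in question is the Info-ZIP Unicode Path extra field (header ID 0x7075: a version byte, a CRC-32 of the standard name, then the UTF-8 name). `FindUnicodePath` and `ChooseName` are hypothetical names, not multiarc functions, and the sketch skips the CRC check that a careful reader would use to confirm the field still matches the standard name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Scan a ZIP extra-field block for the Unicode Path field (ID 0x7075).
// Returns the UTF-8 name, or an empty string if the field is absent,
// empty, has an unknown version, or the block is truncated.
std::string FindUnicodePath(const std::vector<std::uint8_t>& extra) {
    const unsigned kUnicodePathId = 0x7075;
    std::size_t pos = 0;
    while (pos + 4 <= extra.size()) {
        // Each record: 2-byte ID, 2-byte data size (little-endian), data.
        unsigned id = extra[pos] | (extra[pos + 1] << 8);
        std::size_t size = extra[pos + 2] | (extra[pos + 3] << 8);
        pos += 4;
        if (size > extra.size() - pos) break;        // truncated record
        // Data: 1 byte version (must be 1), 4 bytes CRC-32, then the name.
        if (id == kUnicodePathId && size > 5 && extra[pos] == 1) {
            const char* name =
                reinterpret_cast<const char*>(extra.data() + pos + 5);
            return std::string(name, size - 5);
        }
        pos += size;
    }
    return std::string();
}

// Prefer the Unicode Path name; otherwise keep the legacy-decoded name.
std::string ChooseName(const std::vector<std::uint8_t>& extra,
                       const std::string& legacyDecoded) {
    std::string unicodeName = FindUnicodePath(extra);
    return unicodeName.empty() ? legacyDecoded : unicodeName;
}
```

Returning an empty string for both "absent" and "empty" matches the fallback rule described above, so the caller needs only one check.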
Btw, see https://github.com/elfmz/far2l/files/499990/zip.cpp.zip
Sample archive (created on Linux) included. It lists OK, but unpacking fails.
The other sample archive (created on Windows) lists with garbage in the file names, and also does not unpack.