error processing archives with non-english characters in the names of archived files/folders #114

unxed · 2016-09-28T15:22:52Z

sample archive (created on linux) included. listed ok, but unpacking fails.

the other sample archive (created on windows) lists with garbage in file names, and also does not unpack.

unxed · 2016-09-28T15:23:04Z

list_ok_unpack_fails.zip

unxed · 2016-09-28T15:42:30Z

Desktop.zip

unxed · 2016-09-28T16:10:25Z

and this sample (created by far2l) just crashes far2l

2g0.ru/test.7z

unxed · 2016-09-28T19:06:47Z

same problem when trying to extract a folder with an english name from a zip archive.
guess there are two independent problems.
test.zip

(try to browse inside archive and copy the folder "test" anywhere)

unxed · 2016-09-28T19:08:46Z

found the source of the last problem. see:

/home/unxed$ unzip -o  /home/unxed/Downloads/list_ok_unpack_fails.zip "проверка/*.*" -d . 
TIP: If you feel stuck - use Ctrl+Alt+C to terminate everything in this shell.            
Archive:  /home/unxed/Downloads/list_ok_unpack_fails.zip                                  
caution: filename not matched:  проверка/*.*                                              
/home/unxed$ unzip -o  /home/unxed/Downloads/list_ok_unpack_fails.zip "проверка/*" -d .   
Archive:  /home/unxed/Downloads/list_ok_unpack_fails.zip

*.* does not mean "any file" on linux (it only matches names that contain a dot), so unzip cannot find anything matching *.* in an empty folder and skips extracting it.

*.* should be replaced by * on linux I guess

see #121
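
A minimal sketch of the `*.*` to `*` replacement suggested above. `NormalizeMask` is a hypothetical helper name, not a function from far2l, and it only rewrites a mask whose last path component is exactly `*.*`:

```cpp
#include <string>

// Hypothetical helper (not from far2l): rewrites a DOS-style "*.*" tail
// into "*" so that names without a dot also match on Linux.
std::string NormalizeMask(const std::string& mask) {
    const std::string dosAll = "*.*";
    if (mask.size() >= dosAll.size() &&
        mask.compare(mask.size() - dosAll.size(), dosAll.size(), dosAll) == 0) {
        const std::size_t start = mask.size() - dosAll.size();
        // Only touch the mask when "*.*" is a whole path component,
        // e.g. "dir/*.*" or a bare "*.*", never "a*.*".
        if (start == 0 || mask[start - 1] == '/') {
            return mask.substr(0, start) + "*";
        }
    }
    return mask;
}
```

With this, the mask from the failing command, "проверка/*.*", would become "проверка/*", which is the form that worked in the second unzip call.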

unxed · 2016-09-28T19:16:44Z

about Desktop.zip - I see filenames stored in code page 866 in the archive created on windows. is there any way to parse those archives correctly? unzip itself correctly unpacks those in a utf-8 environment
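
To illustrate what parsing such names involves, here is a rough sketch of a code page 866 to UTF-8 decoder. It is not the code from zip.cpp; `Cp866ToUtf8` and `AppendUtf8` are made-up names, and only ASCII and the Cyrillic letters are handled (box-drawing and other symbols become `?`):

```cpp
#include <string>

// Appends a code point below U+0800 to out as one or two UTF-8 bytes.
static void AppendUtf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decodes CP866 bytes into UTF-8 (Cyrillic letters only).
std::string Cp866ToUtf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80)                    AppendUtf8(out, c);                    // ASCII
        else if (c <= 0xAF)              AppendUtf8(out, 0x0410 + (c - 0x80));  // А..Я, а..п
        else if (c >= 0xE0 && c <= 0xEF) AppendUtf8(out, 0x0440 + (c - 0xE0));  // р..я
        else if (c == 0xF0)              AppendUtf8(out, 0x0401);               // Ё
        else if (c == 0xF1)              AppendUtf8(out, 0x0451);               // ё
        else                             out += '?';  // not covered by this sketch
    }
    return out;
}
```

A real implementation would more likely rely on a full conversion table or a library such as iconv rather than a hand-written range mapping.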

unxed · 2016-09-28T19:42:22Z

guess the file list code page may be detected from ZipOS variable, see multiarc/formats/zip/zip.cpp line 289
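
A sketch of what such detection could look like, assuming the host OS value is the upper byte of the "version made by" field and that bit 11 of the general purpose flags marks UTF-8 names, as APPNOTE describes. The function name, the enum, and the choice of which host values imply OEM are assumptions for illustration, not what zip.cpp does:

```cpp
#include <cstdint>

enum class NameEncoding { Utf8, Oem };

// Hypothetical helper: guesses how file names in a zip entry are encoded.
// versionMadeBy: 2-byte header field; its upper byte is the host system.
// flags: general purpose bit flag; bit 11 set means names are UTF-8.
NameEncoding GuessNameEncoding(std::uint16_t versionMadeBy, std::uint16_t flags) {
    if (flags & (1u << 11)) {
        return NameEncoding::Utf8;  // explicitly declared by the archiver
    }
    const unsigned host = versionMadeBy >> 8;
    switch (host) {
        case 0:   // MS-DOS / FAT
        case 10:  // Windows NTFS
        case 14:  // VFAT
            return NameEncoding::Oem;   // assumption: DOS/Windows hosts use the OEM code page
        default:
            return NameEncoding::Utf8;  // assumption: everything else (e.g. 3 = UNIX) is UTF-8
    }
}
```

This is only a heuristic: an archiver is free to write UTF-8 names without setting bit 11, and the host value says nothing about which OEM code page was active.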

unxed · 2016-09-28T21:54:15Z

zip.cpp.zip

fixed this with some shit and sticks using wine code. unfortunately trying to push returns 403, so here is modified zip.cpp

at least it works ok with my test cases

used some wine's code from here: https://github.com/wine-mirror/wine/blob/master/dlls/user32/lstr.c

unxed · 2016-09-29T00:18:04Z

btw, ANSI->OEM conversion may still be required with ZipHeader.PackVer>20 && ZipHeader.PackVer<25

unxed · 2016-09-29T07:31:10Z

fixed, see version included.
zip.cpp.zip

unxed · 2016-09-29T14:32:41Z

and this sample (created by far2l) just crashes far2l
2g0.ru/test.7z

more testing shows far crashing on 7z archives without any non-empty files

commenting out
Item->PackSizeHigh = packed_size & 0xffffffff;
Item->PackSize = (packed_size >> 32) & 0xffffffff;
in 7z.cpp fixes this behaviour

see #120
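
Side note on the two quoted lines: as written, the low 32 bits go into PackSizeHigh and the high 32 bits into PackSize. If the field names mean what they say, the halves look swapped. Whether that has anything to do with the crash is not established here. A small sketch of the conventional split, using a hypothetical `SizeHalves` struct in place of the real item fields:

```cpp
#include <cstdint>

// Hypothetical stand-in for the PackSize / PackSizeHigh pair.
struct SizeHalves {
    std::uint32_t low;   // would correspond to PackSize
    std::uint32_t high;  // would correspond to PackSizeHigh
};

// Splits a 64-bit size into its low and high 32-bit halves.
SizeHalves SplitSize(std::uint64_t packed_size) {
    SizeHalves h;
    h.low  = static_cast<std::uint32_t>(packed_size & 0xffffffffu);
    h.high = static_cast<std::uint32_t>(packed_size >> 32);
    return h;
}
```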

unxed · 2016-09-29T14:42:27Z

moved some issues from here to separate tickets. they seem to be non-charset related.

unxed · 2016-09-29T18:38:54Z

upd: the method of encoding detection I used fails on some utf8 archives created on windows. example:
23-10-2012-b-fasi-eaep.zip

maybe the utf-8 encoded file name extra field from the file header could be used to detect such cases, but multiarc does not currently support it.
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
4.6.9 -Info-ZIP Unicode Path Extra Field (0x7075)

unxed · 2016-09-29T21:33:17Z

retested with master build. still see far showing
Б' ФАСЖ ПД06 СХОКДИА ДАДП (ИМТ).xls
inside archive instead of
Β' ΦΑΣΗ ΠΕ06 ΣΧΟΛΕΙΑ ΕΑΕΠ (ΙΝΤ).xls
as it should.

this archive stores file names twice: once in the native zip header field (but in utf-8, which is uncommon for windows archivers afaik; since the archive was created on windows, my code assumes the OEM charset), and once more in a separate field, also in utf-8, as newer versions of the format suggest for storing unicode file names (but our current code ignores this field).

since far lists the archive content differently from what is stored, the resulting unzip command is incorrect, so files cannot be extracted.

elfmz · 2016-09-29T21:37:07Z

yep, I just did not notice the difference at first glance..

unxed · 2016-09-29T21:37:36Z

expected behaviour for multiarc is to look inside the utf-8 extended header field and, if it is not present or empty, fall back to the logic we currently have.
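
A rough sketch of that lookup, assuming the field in question is the Info-ZIP Unicode Path extra field (header ID 0x7075 in APPNOTE): a 1-byte version, a 4-byte CRC32 of the standard name, then the UTF-8 name. `PickFileName` and `ReadLe16` are hypothetical names, and the CRC check that APPNOTE asks for is left out to keep this short:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads a little-endian 16-bit value at pos (caller guarantees bounds).
static std::uint16_t ReadLe16(const std::vector<std::uint8_t>& b, std::size_t pos) {
    return static_cast<std::uint16_t>(b[pos] | (b[pos + 1] << 8));
}

// Hypothetical helper: returns the UTF-8 name from extra field 0x7075,
// or fallbackName when the field is absent, empty, or malformed.
std::string PickFileName(const std::vector<std::uint8_t>& extra,
                         const std::string& fallbackName) {
    std::size_t pos = 0;
    while (pos + 4 <= extra.size()) {
        const std::uint16_t id = ReadLe16(extra, pos);
        const std::uint16_t size = ReadLe16(extra, pos + 2);
        pos += 4;
        if (size > extra.size() - pos) break;  // truncated block, give up
        // Payload: 1 byte version, 4 bytes CRC32 of the standard name, UTF-8 name.
        if (id == 0x7075 && size > 5 && extra[pos] == 1) {
            return std::string(extra.begin() + pos + 5, extra.begin() + pos + size);
        }
        pos += size;  // skip to the next extra block
    }
    return fallbackName;  // existing host-OS / code page logic would apply here
}
```

In real code the stored CRC32 should be compared against the CRC32 of the standard header name, and the field ignored on mismatch, since a mismatch means the name was changed by a tool that did not update the extra field.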

unxed · 2016-09-29T21:40:51Z

btw, see https://github.com/elfmz/far2l/files/499990/zip.cpp.zip
it is updated against master but adds back some intelligence from the original code (to get around some older windows zip implementations which wrote file names in the ANSI charset, as far as I can guess)

unxed · 2016-09-29T22:00:06Z

Saw ANSI code merged, thanks. Btw, this ticket had become too complicated and hard to read or understand. #122 for the remaining issue. Closing.

unxed changed the title ~~error porcessing archives with non-english characters in the names of archived files/folders~~ error processing archives with non-english characters in the names of archived files/folders Sep 29, 2016

unxed closed this as completed Sep 29, 2016

unxed mentioned this issue May 18, 2024

Plugins: ArcLite, Hash/CRC calculator #917

Open
