UTF16: reading file is fine, but writing file is garbage #6542

Closed
Alexis-Lapierre opened this issue Apr 1, 2023 · 0 comments · Fixed by #6497
Labels
C-bug Category: This is a bug

Summary

When reading a UTF-16 file, Helix decodes it correctly, but when writing the file back to disk, it writes garbage data.

Reproduction Steps

I tried this:

  1. wget https://raw.githubusercontent.com/stain/encoding-test-files/master/utf16.txt - downloads a sample UTF-16 file from the encoding-test-files project
  2. file utf16.txt - prints "utf16.txt: Unicode text, UTF-16, little-endian text"
  3. hx utf16.txt
  4. In Helix, type :x
  5. file utf16.txt - now prints "utf16.txt: ISO-8859 text" ⚠️
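
Steps 2 and 5 rely on `file` to classify the encoding. A minimal Python sketch of a similar check, looking only at the byte order mark, is given here. `sniff_bom` is a hypothetical helper name, not part of Helix or `file`, and the sample bytes are the first eight bytes of each hexdump in this report.

```python
import codecs

# Known byte order marks, longest first so UTF-32 is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# First bytes of the file before and after saving, copied from the hexdumps.
before = bytes.fromhex("fffe700072006500")
after = bytes.fromhex("7072656d69e87265")

print(sniff_bom(before))  # utf-16-le
print(sniff_bom(after))   # None
```

The saved file no longer starts with a BOM, which fits `file` falling back to a single-byte guess in step 5.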

I expected this to happen:

Helix either converts the file to UTF-8 or keeps it as UTF-16 little-endian.
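
Either expected outcome amounts to a lossless round trip: whatever bytes are written must decode back to the same text. A rough Python sketch of that property follows. `save_as` is a hypothetical helper, and the sample string is only the first line of the test file.

```python
import codecs


def save_as(text: str, encoding: str) -> bytes:
    """Encode text for writing; UTF-16LE output gets a BOM, as the original file had."""
    if encoding == "utf-16-le":
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return text.encode(encoding)


text = "premi\u00e8re is first\n"

utf16 = save_as(text, "utf-16-le")
utf8 = save_as(text, "utf-8")

# Both acceptable results decode back to the original text without loss.
assert utf16[2:].decode("utf-16-le") == text
assert utf8.decode("utf-8") == text

print(utf16[:6].hex(" "))  # ff fe 70 00 72 00
```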

Instead, this happened:

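Helix wrote something else entirely: ASCII characters survive the change of encoding, but characters outside ASCII are no longer written as Unicode text. In the dump below, è becomes the single byte e8, and every other non-ASCII character becomes a decimal numeric character reference such as &#1050;.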

File content

hexdump before file edit
00000000  ff fe 70 00 72 00 65 00  6d 00 69 00 e8 00 72 00  |..p.r.e.m.i...r.|
00000010  65 00 20 00 69 00 73 00  20 00 66 00 69 00 72 00  |e. .i.s. .f.i.r.|
00000020  73 00 74 00 0a 00 70 00  72 00 65 00 6d 00 69 00  |s.t...p.r.e.m.i.|
00000030  65 00 00 03 72 00 65 00  20 00 69 00 73 00 20 00  |e...r.e. .i.s. .|
00000040  73 00 6c 00 69 00 67 00  68 00 74 00 6c 00 79 00  |s.l.i.g.h.t.l.y.|
00000050  20 00 64 00 69 00 66 00  66 00 65 00 72 00 65 00  | .d.i.f.f.e.r.e.|
00000060  6e 00 74 00 0a 00 1a 04  38 04 40 04 38 04 3b 04  |n.t.....8.@.8.;.|
00000070  3b 04 38 04 46 04 30 04  20 00 69 00 73 00 20 00  |;.8.F.0. .i.s. .|
00000080  43 00 79 00 72 00 69 00  6c 00 6c 00 69 00 63 00  |C.y.r.i.l.l.i.c.|
00000090  0a 00 01 d8 00 dc 20 00  61 00 6d 00 20 00 44 00  |...... .a.m. .D.|
000000a0  65 00 73 00 65 00 72 00  65 00 74 00 0a 00        |e.s.e.r.e.t...|
000000ae
hexdump after file edit
00000000  70 72 65 6d 69 e8 72 65  20 69 73 20 66 69 72 73  |premi.re is firs|
00000010  74 0a 70 72 65 6d 69 65  26 23 37 36 38 3b 72 65  |t.premie&#768;re|
00000020  20 69 73 20 73 6c 69 67  68 74 6c 79 20 64 69 66  | is slightly dif|
00000030  66 65 72 65 6e 74 0a 26  23 31 30 35 30 3b 26 23  |ferent.&#1050;&#|
00000040  31 30 38 30 3b 26 23 31  30 38 38 3b 26 23 31 30  |1080;&#1088;&#10|
00000050  38 30 3b 26 23 31 30 38  33 3b 26 23 31 30 38 33  |80;&#1083;&#1083|
00000060  3b 26 23 31 30 38 30 3b  26 23 31 30 39 34 3b 26  |;&#1080;&#1094;&|
00000070  23 31 30 37 32 3b 20 69  73 20 43 79 72 69 6c 6c  |#1072; is Cyrill|
00000080  69 63 0a 26 23 36 36 35  36 30 3b 20 61 6d 20 44  |ic.&#66560; am D|
00000090  65 73 65 72 65 74 0a                              |eseret.|
00000097
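
As a cross-check, the sizes and content of both dumps can be regenerated from the same text in Python. The "after" bytes match what a single-byte Latin-1 encoder produces when characters it cannot represent are replaced by decimal numeric character references. This is only an observation about the bytes, under the assumption that the string below (reconstructed from the "before" dump) is the full file content. It is not a statement about what Helix does internally.

```python
import codecs

# Text reconstructed from the "before" hexdump (assumed to be the whole file).
text = (
    "premi\u00e8re is first\n"                    # precomposed e-grave (U+00E8)
    "premie\u0300re is slightly different\n"      # e + combining grave (U+0300)
    "\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430 is Cyrillic\n"
    "\U00010400 am Deseret\n"                     # outside the BMP (U+10400)
)

# What the file looked like on disk before the edit: BOM + UTF-16LE.
before = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# A Latin-1 encode with numeric character references for unmappable characters.
after = text.encode("latin-1", errors="xmlcharrefreplace")

print(hex(len(before)))  # 0xae, the end offset of the first dump
print(hex(len(after)))   # 0x97, the end offset of the second dump
print(after[:18])        # b'premi\xe8re is first\n'
```

Only U+00E8 has a Latin-1 byte of its own, which is why it is the one non-ASCII character that is not turned into a `&#...;` reference.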

Helix log

~/.cache/helix/helix.log
2023-04-01T21:43:04.411 helix_view::clipboard [DEBUG] Using wl-copy+wl-paste to interact with the system and selection (primary) clipboard
2023-04-01T21:43:04.412 helix_vcs [ERROR] Error {
    context: "failed to open git repo",
    source: Discover(
        NoGitRepositoryWithinFs {
            path: "/tmp",
            limit: "/",
        },
    ),
}
2023-04-01T21:43:04.412 helix_vcs [ERROR] failed to open diff base for /tmp/utf16.txt
2023-04-01T21:43:04.412 helix_vcs [ERROR] Error {
    context: "failed to open git repo",
    source: Discover(
        NoGitRepositoryWithinFs {
            path: "/tmp",
            limit: "/",
        },
    ),
}
2023-04-01T21:43:04.412 helix_vcs [ERROR] failed to obtain current head name for /tmp/utf16.txt
2023-04-01T21:43:04.412 helix_view::editor [DEBUG] editor status: Loaded 1 file.
2023-04-01T21:43:04.415 helix_tui::backend::crossterm [DEBUG] The keyboard enhancement protocol is supported in this terminal (checked in 3.104306ms)
2023-04-01T21:43:04.415 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:04.812 helix_term::application [DEBUG] received editor event: IdleTimer
2023-04-01T21:43:04.903 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:05.062 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:05.215 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:05.367 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:05.767 helix_term::application [DEBUG] received editor event: IdleTimer
2023-04-01T21:43:05.975 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:06.150 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:06.376 helix_term::application [DEBUG] received editor event: IdleTimer
2023-04-01T21:43:06.382 helix_view::document [DEBUG] submitting save of doc 'Some("/tmp/utf16.txt")'
2023-04-01T21:43:06.382 helix_term::job [DEBUG] waiting on jobs...
2023-04-01T21:43:06.383 helix_view::document [DEBUG] doc 1 revision updated 0 -> 0
2023-04-01T21:43:06.383 helix_term::commands::typed [DEBUG] quitting...
2023-04-01T21:43:06.383 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:06.383 helix_term::job [DEBUG] waiting on jobs...
2023-04-01T21:43:06.383 helix_term::job [DEBUG] waiting on jobs...

Platform

Linux - Fedora 38

Terminal Emulator

kitty 0.26.5

Helix Version

helix 23.03
