Summary
When reading a UTF-16 file, Helix decodes it correctly, but when writing the file back to disk it writes garbage data.
Reproduction Steps
I tried this:
wget https://raw.githubusercontent.com/stain/encoding-test-files/master/utf16.txt
- downloads a sample UTF-16 file
file utf16.txt
- prints: utf16.txt: Unicode text, UTF-16, little-endian text
hx
:x
file utf16.txt
- now prints: utf16.txt: ISO-8859 text
I expected this to happen:
Helix either converts the file to UTF-8 or keeps it as UTF-16 little-endian.
Instead, this happened:
Helix wrote something else entirely. ASCII characters survive the change of encoding, but any character outside ASCII is no longer stored as a Unicode character.
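The expected outcome can be phrased as a check on the saved bytes: they should decode either as UTF-16 (when a byte order mark is present) or as UTF-8. The following is a minimal sketch of such a check; `decode_saved_file` is a hypothetical helper written for this report and is not part of Helix.

```python
import codecs


def decode_saved_file(data: bytes) -> str:
    """Decode bytes as UTF-16 if they start with a UTF-16 BOM, else as UTF-8.

    Hypothetical helper for illustration only. Raises UnicodeDecodeError when
    the bytes are neither valid UTF-16 (with BOM) nor valid UTF-8.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec consumes the BOM and picks the byte order from it.
        return data.decode("utf-16")
    return data.decode("utf-8")


if __name__ == "__main__":
    # First line of the file as it appears in the "after" hexdump.
    saved = b"premi\xe8re is first\n"
    try:
        decode_saved_file(saved)
    except UnicodeDecodeError as exc:
        print("not UTF-8 or UTF-16:", exc)
```

The lone `e8` byte is what makes the saved file fail this check: it is not a valid UTF-8 sequence on its own, and the file no longer starts with a BOM.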
File content
hexdump before file edit
00000000  ff fe 70 00 72 00 65 00  6d 00 69 00 e8 00 72 00  |..p.r.e.m.i...r.|
00000010  65 00 20 00 69 00 73 00  20 00 66 00 69 00 72 00  |e. .i.s. .f.i.r.|
00000020  73 00 74 00 0a 00 70 00  72 00 65 00 6d 00 69 00  |s.t...p.r.e.m.i.|
00000030  65 00 00 03 72 00 65 00  20 00 69 00 73 00 20 00  |e...r.e. .i.s. .|
00000040  73 00 6c 00 69 00 67 00  68 00 74 00 6c 00 79 00  |s.l.i.g.h.t.l.y.|
00000050  20 00 64 00 69 00 66 00  66 00 65 00 72 00 65 00  | .d.i.f.f.e.r.e.|
00000060  6e 00 74 00 0a 00 1a 04  38 04 40 04 38 04 3b 04  |n.t.....8.@.8.;.|
00000070  3b 04 38 04 46 04 30 04  20 00 69 00 73 00 20 00  |;.8.F.0. .i.s. .|
00000080  43 00 79 00 72 00 69 00  6c 00 6c 00 69 00 63 00  |C.y.r.i.l.l.i.c.|
00000090  0a 00 01 d8 00 dc 20 00  61 00 6d 00 20 00 44 00  |...... .a.m. .D.|
000000a0  65 00 73 00 65 00 72 00  65 00 74 00 0a 00        |e.s.e.r.e.t...|
000000ae
hexdump after file edit
00000000  70 72 65 6d 69 e8 72 65  20 69 73 20 66 69 72 73  |premi.re is firs|
00000010  74 0a 70 72 65 6d 69 65  26 23 37 36 38 3b 72 65  |t.premie&#768;re|
00000020  20 69 73 20 73 6c 69 67  68 74 6c 79 20 64 69 66  | is slightly dif|
00000030  66 65 72 65 6e 74 0a 26  23 31 30 35 30 3b 26 23  |ferent.&#1050;&#|
00000040  31 30 38 30 3b 26 23 31  30 38 38 3b 26 23 31 30  |1080;&#1088;&#10|
00000050  38 30 3b 26 23 31 30 38  33 3b 26 23 31 30 38 33  |80;&#1083;&#1083|
00000060  3b 26 23 31 30 38 30 3b  26 23 31 30 39 34 3b 26  |;&#1080;&#1094;&|
00000070  23 31 30 37 32 3b 20 69  73 20 43 79 72 69 6c 6c  |#1072; is Cyrill|
00000080  69 63 0a 26 23 36 36 35  36 30 3b 20 61 6d 20 44  |ic.&#66560; am D|
00000090  65 73 65 72 65 74 0a                              |eseret.|
00000097
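Comparing the two dumps, the saved bytes look like a single-byte Latin-1-style encoding in which every character that does not fit was replaced by an HTML numeric character reference: `e8` is è, and `26 23 37 36 38 3b` spells `&#768;`, the decimal form of U+0300 (`00 03` in the original). The sketch below reproduces the same bytes with Python's built-in `xmlcharrefreplace` error handler. It only illustrates the pattern and makes no claim about how Helix produces it; `TEXT` is reconstructed from the dump of the original file.

```python
import codecs

# File text, reconstructed from the hexdump taken before the edit.
TEXT = (
    "premi\u00e8re is first\n"                 # precomposed e-grave (U+00E8)
    "premie\u0300re is slightly different\n"   # e + combining grave (U+0300)
    "\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430 is Cyrillic\n"
    "\U00010400 am Deseret\n"                  # surrogate pair d801 dc00 in UTF-16
)

# Original on-disk form: byte order mark followed by UTF-16 little-endian.
before = codecs.BOM_UTF16_LE + TEXT.encode("utf-16-le")

# Latin-1 with unmappable characters written as &#NNNN; references.
after = TEXT.encode("latin-1", errors="xmlcharrefreplace")

print(hex(len(before)), hex(len(after)))  # 0xae 0x97, the sizes in both dumps
print(after[0x10:0x20])                   # b't\npremie&#768;re'
```

Both lengths and the byte content match the dumps, which suggests the text was decoded correctly and the damage happens only when it is encoded for saving.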
Helix log
~/.cache/helix/helix.log
2023-04-01T21:43:04.411 helix_view::clipboard [DEBUG] Using wl-copy+wl-paste to interact with the system and selection (primary) clipboard
2023-04-01T21:43:04.412 helix_vcs [ERROR] Error { context: "failed to open git repo", source: Discover( NoGitRepositoryWithinFs { path: "/tmp", limit: "/", }, ), }
2023-04-01T21:43:04.412 helix_vcs [ERROR] failed to open diff base for /tmp/utf16.txt
2023-04-01T21:43:04.412 helix_vcs [ERROR] Error { context: "failed to open git repo", source: Discover( NoGitRepositoryWithinFs { path: "/tmp", limit: "/", }, ), }
2023-04-01T21:43:04.412 helix_vcs [ERROR] failed to obtain current head name for /tmp/utf16.txt
2023-04-01T21:43:04.412 helix_view::editor [DEBUG] editor status: Loaded 1 file.
2023-04-01T21:43:04.415 helix_tui::backend::crossterm [DEBUG] The keyboard enhancement protocol is supported in this terminal (checked in 3.104306ms)
2023-04-01T21:43:04.415 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:04.812 helix_term::application [DEBUG] received editor event: IdleTimer
2023-04-01T21:43:04.903 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:05.062 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:05.215 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:05.367 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:05.767 helix_term::application [DEBUG] received editor event: IdleTimer
2023-04-01T21:43:05.975 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:06.150 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:06.376 helix_term::application [DEBUG] received editor event: IdleTimer
2023-04-01T21:43:06.382 helix_view::document [DEBUG] submitting save of doc 'Some("/tmp/utf16.txt")'
2023-04-01T21:43:06.382 helix_term::job [DEBUG] waiting on jobs...
2023-04-01T21:43:06.383 helix_view::document [DEBUG] doc 1 revision updated 0 -> 0
2023-04-01T21:43:06.383 helix_term::commands::typed [DEBUG] quitting...
2023-04-01T21:43:06.383 helix_view::document [DEBUG] id 1 modified - last saved: 0, current: 0
2023-04-01T21:43:06.383 helix_term::job [DEBUG] waiting on jobs...
2023-04-01T21:43:06.383 helix_term::job [DEBUG] waiting on jobs...
Platform
Linux - Fedora 38
Terminal Emulator
kitty 0.26.5
Helix Version
helix 23.03