feat(commands): unescape yank-join separator #11012

RoloEdits · 2024-06-22T09:00:55Z

Introduces a str module and an unescape function to helix-stdx, which processes escape sequences in strings and converts them into their corresponding literal characters. The function handles a variety of escape sequences, including:

\n for newlines
\t for tabs
\u{...} for Unicode characters

The function does not unescape sequences like \\ to \, leaving them as they are. This opinionated behavior ensures that only certain escape sequences are processed, and is built around user input, not general input.

Given that its based around user input, a conservative approach was taken for handling bad input, where if the string cannot be processed as expected, it returns the original input.

Examples:

Converting escaped newlines: unescape("hello\\nworld") results in "hello\nworld".
Converting escaped tabs: unescape("hello\\tworld") results in "hello\tworld".
Converting Unicode escape sequences: unescape("hello\\u{1f929}world") results in "hello🤩world".
Handling invalid Unicode escape sequence: unescape("hello\\u{999999999}world") results in the original "hello\\u{999999999}world".

The implementation also includes tests, but no guarantees for edge cases.

Previously, the yank-join command joined the current selection with the separator as-is. With this update, escape sequences in the separator are unescaped to their corresponding literal characters before joining the selection, aligning with user expectations:

Newline:

Tab:

Unicode:

Closes: #10993

RoloEdits · 2024-06-22T09:40:43Z

In testing the user input I noticed that ' and " input by themselves, :yank-join ", would result in newlines. This exists on master as well, and seems to stem from how Shellwords::from handles these inputs, which results in the TypedCommand args being empty, and defaulting to the line_ending as a result. I thought about trying to tackle that here, but the scope would be beyond the current PR, so going to open this for review.

This commit introduces a `str` module and an `unescape` function to `helix-stdx`, which processes escape sequences in strings and converts them into their corresponding literal characters. The function handles a variety of escape sequences, including: - `\n` for newlines - `\t` for tabs - `\u{...}` for Unicode characters The function does not unescape sequences like `\\` to `\`, leaving them as they are. This opinionated behavior ensures that only certain escape sequences are processed, and is built around user input, not general input. Given that its based around user input, a conservative approach was taken for handling bad input, where if the string cannot be processed as expected, it returns the original input. Examples: - Converting escaped newlines: `unescape("hello\\nworld")` results in `"hello\nworld"`. - Converting escaped tabs: `unescape("hello\\tworld")` results in `"hello\tworld"`. - Converting Unicode escape sequences: `unescape("hello\\u{1f929}world")` results in `"hello🤩world"`. - Handling invalid Unicode escape sequence: `unescape("hello\\u{999999999}world")` results in the original `"hello\\u{999999999}world"`. The implementation also includes tests, but no gaurantees for edgecases.

This commit enhances the `yank-join` command by incorporating the `unescape` function to process the separator provided by the user. This improvement ensures that any escape sequences in the separator are correctly interpreted, aligning with user expectations. Previously, the `yank-join` command joined the current selection with the separator as-is. With this update, escape sequences in the separator such as: - `\\n` for newlines - `\\t` for tabs - `\\u{...}` for Unicode characters are unescaped to their corresponding literal characters before joining the selection.

the-mikedavis · 2024-07-06T15:22:15Z

@pascalkuthe and I discussed this a bit and we're thinking that shellwords needs larger changes. Kakoune has different strategies for how it parses - raw kakoune and shell. One solution might be to leave parsing the line to the command itself. Commands that need arguments as they take now can parse the input into arguments and others like :yank-join and :sh can interpret the contents literally (Kakoune's "raw").

In any case I'm wary of adding a str extension in stdx. I believe this should be contained in the shellwords module instead

RoloEdits · 2024-07-06T21:44:06Z

I would agree with the assessment. Id be willing to devote some time for these changes. After a quick look at current usage, we might be able to get away with storing the raw &str as input, and then on the call to words perform the current operations? There is a state on what was last left off, but this only seems to be used to checking ending whitespace? This could be replaced with a self.input.ends_with(' '). Then we can introduce a words_raw that would just perform the splitting of the command and the rest? If object encapsulation is more preferred then we could make a ShellwordsRaw that would have its own words function. And then just use this in the areas needed.

I've not had the pleasure of using Kakoune, so I'm not familiar with its specificities, but from what I gather, in helix, we would only need two states, not three? If you have any guidance as to what needs to be covered I can see what I can put together.

RoloEdits force-pushed the yank-join-unescape branch 2 times, most recently from dd0d9b8 to b5ca706 Compare June 22, 2024 09:14

RoloEdits marked this pull request as ready for review June 22, 2024 09:40

RoloEdits force-pushed the yank-join-unescape branch 7 times, most recently from 373a091 to d2de7dc Compare June 27, 2024 22:36

kirawi added the A-command Area: Commands label Jun 29, 2024

RoloEdits added 2 commits June 29, 2024 16:53

RoloEdits force-pushed the yank-join-unescape branch from d2de7dc to b58fe56 Compare June 29, 2024 23:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(commands): unescape yank-join separator #11012

feat(commands): unescape yank-join separator #11012

RoloEdits commented Jun 22, 2024 •

edited

Loading

RoloEdits commented Jun 22, 2024 •

edited

Loading

the-mikedavis commented Jul 6, 2024

RoloEdits commented Jul 6, 2024

feat(commands): unescape yank-join separator #11012

Are you sure you want to change the base?

feat(commands): unescape yank-join separator #11012

Conversation

RoloEdits commented Jun 22, 2024 • edited Loading

RoloEdits commented Jun 22, 2024 • edited Loading

the-mikedavis commented Jul 6, 2024

RoloEdits commented Jul 6, 2024

RoloEdits commented Jun 22, 2024 •

edited

Loading

RoloEdits commented Jun 22, 2024 •

edited

Loading