Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(commands): unescape yank-join separator #11012

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

RoloEdits
Copy link
Contributor

@RoloEdits RoloEdits commented Jun 22, 2024

Introduces a str module and an unescape function to helix-stdx, which processes escape sequences in strings and converts them into their corresponding literal characters. The function handles a variety of escape sequences, including:

  • \n for newlines
  • \t for tabs
  • \u{...} for Unicode characters

The function does not unescape sequences like \\ to \, leaving them as they are. This opinionated behavior ensures that only certain escape sequences are processed, and is built around user input, not general input.

Given that its based around user input, a conservative approach was taken for handling bad input, where if the string cannot be processed as expected, it returns the original input.

Examples:

  • Converting escaped newlines: unescape("hello\\nworld") results in "hello\nworld".
  • Converting escaped tabs: unescape("hello\\tworld") results in "hello\tworld".
  • Converting Unicode escape sequences: unescape("hello\\u{1f929}world") results in "hello馃ぉworld".
  • Handling invalid Unicode escape sequence: unescape("hello\\u{999999999}world") results in the original "hello\\u{999999999}world".

The implementation also includes tests, but no guarantees for edge cases.

Previously, the yank-join command joined the current selection with the separator as-is. With this update, escape sequences in the separator are unescaped to their corresponding literal characters before joining the selection, aligning with user expectations:

Newline:
separator_unescape_newline

Tab:
separator_unescape_tab

Unicode:
separator_unescape_unicode

Closes: #10993

@RoloEdits RoloEdits force-pushed the yank-join-unescape branch 2 times, most recently from dd0d9b8 to b5ca706 Compare June 22, 2024 09:14
@RoloEdits
Copy link
Contributor Author

RoloEdits commented Jun 22, 2024

In testing the user input I noticed that ' and " input by themselves, :yank-join ", would result in newlines. This exists on master as well, and seems to stem from how Shellwords::from handles these inputs, which results in the TypedCommand args being empty, and defaulting to the line_ending as a result. I thought about trying to tackle that here, but the scope would be beyond the current PR, so going to open this for review.

@RoloEdits RoloEdits marked this pull request as ready for review June 22, 2024 09:40
@RoloEdits RoloEdits force-pushed the yank-join-unescape branch 7 times, most recently from 373a091 to d2de7dc Compare June 27, 2024 22:36
@kirawi kirawi added the A-command Area: Commands label Jun 29, 2024
This commit introduces a `str` module and an `unescape` function to
`helix-stdx`, which processes escape sequences in strings and converts
them into their corresponding literal characters. The function handles a
variety of escape sequences, including:

- `\n` for newlines
- `\t` for tabs
- `\u{...}` for Unicode characters

The function does not unescape sequences like `\\` to `\`, leaving them
as they are. This opinionated behavior ensures that only certain escape
sequences are processed, and is built around user input, not general
input.

Given that its based around user input, a conservative approach was
taken for handling bad input, where if the string cannot be processed as
expected, it returns the original input.

Examples:
- Converting escaped newlines: `unescape("hello\\nworld")` results in
`"hello\nworld"`.
- Converting escaped tabs: `unescape("hello\\tworld")` results in
`"hello\tworld"`.
- Converting Unicode escape sequences:
`unescape("hello\\u{1f929}world")` results in `"hello馃ぉworld"`.
- Handling invalid Unicode escape sequence:
`unescape("hello\\u{999999999}world")` results in the original
`"hello\\u{999999999}world"`.

The implementation also includes tests, but no gaurantees for edgecases.
This commit enhances the `yank-join` command by incorporating the
`unescape` function to process the separator provided by the user. This
improvement ensures that any escape sequences in the separator are correctly
interpreted, aligning with user expectations.

Previously, the `yank-join` command joined the current selection with the
separator as-is. With this update, escape sequences in the separator such as:
- `\\n` for newlines
- `\\t` for tabs
- `\\u{...}` for Unicode characters

are unescaped to their corresponding literal characters before joining the
selection.
@the-mikedavis
Copy link
Member

@pascalkuthe and I discussed this a bit and we're thinking that shellwords needs larger changes. Kakoune has different strategies for how it parses - raw kakoune and shell. One solution might be to leave parsing the line to the command itself. Commands that need arguments as they take now can parse the input into arguments and others like :yank-join and :sh can interpret the contents literally (Kakoune's "raw").

In any case I'm wary of adding a str extension in stdx. I believe this should be contained in the shellwords module instead

@RoloEdits
Copy link
Contributor Author

I would agree with the assessment. Id be willing to devote some time for these changes. After a quick look at current usage, we might be able to get away with storing the raw &str as input, and then on the call to words perform the current operations? There is a state on what was last left off, but this only seems to be used to checking ending whitespace? This could be replaced with a self.input.ends_with(' '). Then we can introduce a words_raw that would just perform the splitting of the command and the rest? If object encapsulation is more preferred then we could make a ShellwordsRaw that would have its own words function. And then just use this in the areas needed.

I've not had the pleasure of using Kakoune, so I'm not familiar with its specificities, but from what I gather, in helix, we would only need two states, not three? If you have any guidance as to what needs to be covered I can see what I can put together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-command Area: Commands
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Honor whitespace escape sequences in yank-join
3 participants