Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deno.consoleSize + full-width text/ansi escape codes is bug-prone with no official way of mitigating #18477

Closed
lionel-rowe opened this issue Mar 28, 2023 · 4 comments · Fixed by #18508

Comments

@lionel-rowe
Copy link
Contributor

Even in "monospace" text, many characters take up double the space, and many take up no space at all. Comparing Deno.consoleSize().columns to string length, or even using more accurate Unicode-aware forms of measuring, such as [...str].length or [...new Intl.Segmenter('en', { granularity: 'grapheme' }).segment(str)].length, is highly bug-prone:

'人'.length // 1, but takes up 2 character width
'\x1b[36mcyan\x1b[1m'.length // 13, but only 4 printable characters
'á'.normalize('NFD').length // 2, but only 1 grapheme
'🦄'.length // 2, which is "correct" but for the wrong reason (due to UTF-16 code units)
'\u200c'.length // 1, should be 0 because it's a zero-width character

In order for Deno.consoleSize to be reliable* for general-purpose use, some method of getting the width of a string also needs to be exposed. Possible candidates:

  1. ext/console/02_console.js > getStringWidth
  2. ext/node/polyfills/internal/util/inspect.mjs > getStringWidth, which is a thin wrapper around 1
  3. The https://github.com/unicode-rs/unicode-width crate (which is a dependency of rustyline)

Of these, unicode-width looks to be by far the most comprehensive.

* Even once this is implemented there will be edge cases, such as differences in fonts, codepoints that don't have a fixed width, etc. but it should at least be reliable for the majority of real-world use cases.

@dsherret
Copy link
Member

I don't think this needs to be done in the runtime. Couldn't this be done in some userland javascript module?

@lionel-rowe
Copy link
Contributor Author

lionel-rowe commented Mar 29, 2023

I don't think this needs to be done in the runtime. Couldn't this be done in some userland javascript module?

It's true it's not a clear-cut case, but here are the best arguments I can think of for why it should be in the runtime:

  1. Deno.consoleSize is already provided by the runtime, and it seems weird to expose a method that invites writing buggy code without also exposing a way to mitigate those bugs.
  2. It also seems weird to expose a method that works perfectly with printable ASCII and some unknown subset of all other Unicode characters. A Russian developer could write code that works perfectly with Cyrillic and Latin text, only for it to completely break when it hits user-supplied Korean text.
  3. Relying on userland code to fix 1 and 2 will result in an unlimited number of implementations, all of which work slightly differently. It doesn't make sense having multiple different ways of measuring text.
  4. Deno already uses multiple implementations internally (e.g. repl vs console.table) — it'd make sense to unify these too, e.g. adding an op_string_width that's guaranteed to use the same metric as the repl. Once unified, it'd further make sense (IMO) to expose a thin wrapper around this to userland code.

If this isn't added to the runtime, adding a note to the JSDoc for cli/tsc/dts/lib.deno.ns.d.ts > consoleSize could at least alert developers to the footguns, without directly mitigating them.

Edit: As another possibility, adding a TS port of the unicode-width crate to deno_std could provide a partial solution without adding anything to the runtime.

@dsherret
Copy link
Member

... it seems weird to expose a method that invites writing buggy code without also exposing a way to mitigate those bugs ... It also seems weird to expose a method that works perfectly with printable ASCII and some unknown subset of all other Unicode characters

It's exposing the underlying size that the console reports via the winapi on windows and libc elsewhere.

deno/runtime/ops/tty.rs

Lines 245 to 287 in e0429e2

pub fn console_size(
std_file: &std::fs::File,
) -> Result<ConsoleSize, std::io::Error> {
#[cfg(windows)]
{
use std::os::windows::io::AsRawHandle;
let handle = std_file.as_raw_handle();
// SAFETY: winapi calls
unsafe {
let mut bufinfo: winapi::um::wincon::CONSOLE_SCREEN_BUFFER_INFO =
std::mem::zeroed();
if winapi::um::wincon::GetConsoleScreenBufferInfo(handle, &mut bufinfo)
== 0
{
return Err(Error::last_os_error());
}
Ok(ConsoleSize {
cols: bufinfo.dwSize.X as u32,
rows: bufinfo.dwSize.Y as u32,
})
}
}
#[cfg(unix)]
{
use std::os::unix::io::AsRawFd;
let fd = std_file.as_raw_fd();
// SAFETY: libc calls
unsafe {
let mut size: libc::winsize = std::mem::zeroed();
if libc::ioctl(fd, libc::TIOCGWINSZ, &mut size as *mut _) != 0 {
return Err(Error::last_os_error());
}
Ok(ConsoleSize {
cols: size.ws_col as u32,
rows: size.ws_row as u32,
})
}
}
}

This is standard API that's found in many other languages.

Relying on userland code to fix 1 and 2 will result in an unlimited number of implementations, all of which work slightly differently. It doesn't make sense having multiple different ways of measuring text.

Does this mean that someone would have the choice to select which implementation they would want to use? Is there a reason someone would use one implementation over another?

Edit: As another possibility, adding a TS port of the unicode-width crate to deno_std could provide a partial solution without adding anything to the runtime.

I think this would be better to live in deno_std or somewhere else. The console window size can be retreived via the Deno.consoleSize api, the string width retrieved via a module found in js, then by using the two someone can figure out how much of a string fits on a line.

If this isn't added to the runtime, adding a note to the JSDoc for cli/tsc/dts/lib.deno.ns.d.ts > consoleSize could at least alert developers to the footguns, without directly mitigating them.

I opened #18508

@lionel-rowe
Copy link
Contributor Author

Does this mean that someone would have the choice to select which implementation they would want to use? Is there a reason someone would use one implementation over another?

Not that I can think of — it's generally a case of one implementation being strictly better or worse than other implementations (modulo font differences etc), which is why IMO having one "canonical" implementation makes sense.

I think this would be better to live in deno_std or somewhere else. The console window size can be retreived via the Deno.consoleSize api, the string width retrieved via a module found in js, then by using the two someone can figure out how much of a string fits on a line.

I made a port of the unicode_width crate, will probably submit as a PR to deno_std at some point, or alternatively release as a standalone module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants