Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorten CWD path #854

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Shorten CWD path #854

wants to merge 1 commit into from

Conversation

onlined
Copy link

@onlined onlined commented Oct 21, 2021

For showing the CWD path in a more compact way, this function

  • shortens path parts before last from right to left, until first
    character of the path stays (directory -> direc~ or d~)
  • combines parts before last from right with left (a~/b~ -> a~~)
  • shortens the last part from middle, preferring to keep characters
    from left (longdirectoryname -> long~ame)

in order (of 2nd and 3rd, the one that removes more characters is applied first) until given maximum length is reached.

Closes #462.

Copy link
Member

@BenBE BenBE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR. There's still some problems with it, that should be addressed. One major one is the missing support for multibyte charsets like UTF-8 and as we've seen even usernames with kanji it's essential that paths containing such characters are handled properly, as paths with special characters are even more common than usernames with such arcane characters.

When updating your code feel free to squash and rebase your branch freely to keep a clean commit history.

Process.c Outdated Show resolved Hide resolved
Process.c Outdated Show resolved Hide resolved
Process.c Outdated
strcpy(ctx->cwd + writeIndex, ctx->parts[i]);
writeIndex += ctx->partLengths[i];
if (i < ctx->partsLen - 1) {
strcpy(ctx->cwd + writeIndex, "/");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use string functions from XUtils.h for copying things around.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied character directly instead.

Process.c Outdated Show resolved Hide resolved
Process.c Outdated Show resolved Hide resolved
Process.c Outdated Show resolved Hide resolved
Process.c Outdated
size_t i;
for (i = ctx->partsLen - 2; i > 1; i--) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite like re-using a loop variable outside the initial loop scope it was worked with …

Copy link
Author

@onlined onlined Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other way would be adding an extra variable and update it with i's value every time, or maybe converting loop condition into an if at the and and assign in the block just before break, but these are both verbose and unnecessary IMO. Since the function body is fairly small and i is relevant until the end, I don't think that it causes any risk.

Process.c Outdated Show resolved Hide resolved
Process.c Outdated Show resolved Hide resolved
Process.c Outdated Show resolved Hide resolved
@onlined onlined force-pushed the shorten-cwd-path branch 7 times, most recently from 253b91a to b08dbcf Compare October 21, 2021 22:35
Process.c Outdated
return diff;
}

static size_t shortenCwdLastPart(shortenCwdContext *ctx, bool doActualWork) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I don't like a boolean data type in parameters, what does doActualWork do and what's the difference between when I set it to true and set to false?

Copy link
Author

@onlined onlined Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If doActualWork is true, the function does the actual work but if it's false, only the number of characters to be removed is calculated. Adding comments could help, like you mentioned before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you can split the length calculation and the "actual work" part into two routines, or redesign the parameter so it makes more sense (e.g. copy and work with the path in a new buffer instead of doing that in-place). The boolean parameter introduces control coupling that may be unnecessary.

Process.c Outdated
Comment on lines 848 to 855
shortenCwdParts(&ctx);
if (shortenCwdLastPart(&ctx, false) > collapseCwdParts(&ctx, false)) {
shortenCwdLastPart(&ctx, true);
collapseCwdParts(&ctx, true);
} else {
collapseCwdParts(&ctx, true);
shortenCwdLastPart(&ctx, true);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should document the path shortening behavior somewhere in the code comments. It isn't obvious on how the path would be shortened as your algorithm looks not trivial enough.
It would be better to also put in examples, to help people test the behaviors.

@BenBE BenBE added this to the 3.2.0 milestone Oct 22, 2021
@onlined
Copy link
Author

onlined commented Oct 22, 2021

@BenBE I see that you indicated that the code must be multibyte aware. I totally agree with you and I started working on it. I have one question: Is it OK to use standard library multibyte functions (mblen, mbsrtowcs etc.) or do I need another solution for it?

Process.c Outdated Show resolved Hide resolved
@BenBE
Copy link
Member

BenBE commented Oct 22, 2021

@BenBE I see that you indicated that the code must be multibyte aware. I totally agree with you and I started working on it. I have one question: Is it OK to use standard library multibyte functions (mblen, mbsrtowcs etc.) or do I need another solution for it?

Using standard library is totally fine and much preferred over including additional libraries we do not yet link with.

For showing the CWD path in a more compact way, this function

    - shortens path parts before last from right to left, until first
      character of the path stays (directory -> direc~ or d~)
    - combines parts before last from right with left (a~/b~ -> a~~)
    - shortens the last part from middle, preferring to keep characters
      from left (longdirectoryname -> long~ame)

in order (of 2nd and 3rd, the one that removes more characters is
applied first) until given maximum length is reached.
@Explorer09
Copy link
Contributor

I finally have a chance to test your code, and it might be late to inform you that there's one thing you need to address.
htop CWD with CJK characters screenshot

People who didn't live in East Asia wouldn't know, that not all characters occupy the same width in the terminal. Specifically,

  • CJK characters may occupy twice the width of the Latin letters, and
  • Combining characters occupy no width, but modifies the character before it.

The above screenshot shows CJK characters (Chinese numerals '一二三四五六'), I've also tested with some Zalgo text and it displays weirdly, too.

@onlined
Copy link
Author

onlined commented Oct 26, 2021

@Explorer09 @BenBE I tried to find a solution for finding out how many columns a character occupies. I saw wcwidth and wcswidth but their behavior changes according to locale, so it's not much sane to use them. Do you know any portable (POSIX is enough probably) and reliable way for finding how much space a wide character occupies?

@Explorer09
Copy link
Contributor

@online I already expected you would get stuck with the problem when I mentioned it. It troubled me too (when I worked in a project of my company), and I have a conservative approach:

  1. Set terminal cursor position to where the character would be printed (make sure it's safe to print junk characters there).
  2. Print the character (one at a time).
  3. Query cursor position to determine the width.
  4. Repeat step 2.

Yes, the safest way is to just print it out. And you are right that the width is locale-dependent. If you have read the UAX #11 East Asian Width, there is an "East Asian Ambiguous" category of characters, whose rendering widths are determined by the terminal's default character encoding (full-width when a CJK encoding is selected, and half-width when a non-CJK encoding is selected). I don't think the locale data is more reliable than the terminal setting when it comes to rendering, and that's why I suggest the above steps instead.

@onlined
Copy link
Author

onlined commented Oct 30, 2021

@Explorer09 But I need to preprocess the string and decide which characters to keep or remove, before actually printing the characters. Do you suggest printing to a non-visible or dummy terminal?

@Explorer09
Copy link
Contributor

@onlined My idea is to use the space of the CWD table cell -- you print the characters just as you process it. It's okay to overprint the space multiple time until you finally shorten the path.

@BenBE
Copy link
Member

BenBE commented Oct 30, 2021

@onlined My idea is to use the space of the CWD table cell -- you print the characters just as you process it. It's okay to overprint the space multiple time until you finally shorten the path.

Problem is, this cell is not actually available for reference in that routine. The function should be independent of which screen position it is later on printed too. In particular, the printed string is only rendered into a RichString later on and that is finally written onto the display. So shortening to 25 code points with wcwidth/wcswidth is fully acceptable IMHO (even if this may differ somewhat from what the terminal will finally end up actually writing). Cf. mbswidth(3)

@Explorer09
Copy link
Contributor

@BenBE That's a structural problem. It would be better to fix the routine to accept a terminal position as the parameter. You don't need to prepare the string until it's printed at least once. So it's not unreasonable to request a terminal position for the routine.

@BenBE
Copy link
Member

BenBE commented Oct 30, 2021

@BenBE That's a structural problem. It would be better to fix the routine to accept a terminal position as the parameter. You don't need to prepare the string until it's printed at least once. So it's not unreasonable to request a terminal position for the routine.

Yes and no.

Yes, we should delay the preparation of this string to when we first actually write it to the display. But that's not how this works internally in htop right now and would take a bit of structural work to change it that way. That's not part of this PR and if we go that route, then we'd need to do quite a bit of related work elsewhere too to keep things consistent.

And no, you shouldn't print temporary things to a terminal. True story: I had a text-based app once that had a progress bar for showing the load progress when it was initializing different things. So far, so normal. One of the users had a braille display connected to the computer and sent in a complaint about the braille display going brrrrrrrrrr whenever the program started. Long story short: DON'T write temporary or fast-changing information to a terminal.

@Explorer09
Copy link
Contributor

@BenBE But htop is ncursed-based! If htop prints everything sequentially (to maintain compatibility with line terminals), I won't suggest this approach.
I know what you mean when you say progress bar as an example: wget is the example of having two ways to output the progress. One prints everything sequentially (incuding the progress indicator), the other uses overprinting to indicate the progress (and it prints the filename being downloaded in a scrolling text).

@BenBE
Copy link
Member

BenBE commented Oct 31, 2021

@BenBE But htop is ncursed-based! If htop prints everything sequentially (to maintain compatibility with line terminals), I won't suggest this approach. I know what you mean when you say progress bar as an example: wget is the example of having two ways to output the progress. One prints everything sequentially (incuding the progress indicator), the other uses overprinting to indicate the progress (and it prints the filename being downloaded in a scrolling text).

Both methods cause the mal-behavior I described. The problem is not the sequential printing, but the printing itself.

@fasterit fasterit modified the milestones: 3.2.0, 3.3.0 Feb 24, 2022
@BenBE BenBE marked this pull request as draft April 18, 2022 21:13
@BenBE BenBE modified the milestones: 3.3.0, 3.4.0 Dec 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shorten path for CWD column when displaying
4 participants