Overhauling Human-Readable Sizes #224

Open
mfwitten opened this issue Oct 8, 2020 · 33 comments
Labels
needs-discussion 🤔 Changes need to be discussed and require consent

Comments

@mfwitten
Contributor

mfwitten commented Oct 8, 2020

I've got some code in the works to improve Process_humanNumber():

  • There is only integer arithmetic.
  • The type unsigned long long is used only as necessary.
  • The colors and numbers are hopefully more intuitive.
  • The field is used in a way that maximizes precision.

I intend to clean up the code in the coming days, and then I will submit a PR. In the meantime, here is the output from a test program; let me know what you think:

       COLOR         FIELD            KiBs                                         Units                             
------------------ -------- ----------------------- -----------------------------------------------------------------
          PROCESS: |    0 | K(                   0) = K(0)
          PROCESS: | 1023 | K(                1023) = K(1023)
PROCESS_MEGABYTES: | 1024 | K(                1024) = K(1024)
PROCESS_MEGABYTES: |99999 | K(               99999) = K(99999)
PROCESS_MEGABYTES: |97.7M | K(              100000) = K(100000)
PROCESS_MEGABYTES: |99.9M | K(              102348) = M(99) + K(972)
PROCESS_MEGABYTES: | 100M | K(              102349) = M(99) + K(973)
PROCESS_MEGABYTES: |1023M | K(             1048063) = M(1023) + K(511)
PROCESS_MEGABYTES: |1024M | K(             1048064) = M(1023) + K(512)
PROCESS_GIGABYTES: |1024M | K(             1048576) = M(1024)
PROCESS_GIGABYTES: |9999M | K(            10239487) = M(9999) + K(511)
PROCESS_GIGABYTES: |9.77G | K(            10239488) = M(9999) + K(512)
PROCESS_GIGABYTES: |9.77G | K(            10240000) = M(10000)
PROCESS_GIGABYTES: |9.99G | K(            10480517) = G(9) + M(1018) + K(901)
PROCESS_GIGABYTES: |10.0G | K(            10480518) = G(9) + M(1018) + K(902)
PROCESS_GIGABYTES: |10.0G | K(            10485760) = G(10)
PROCESS_GIGABYTES: |99.9G | K(           104805171) = G(99) + M(972) + K(819)
PROCESS_GIGABYTES: | 100G | K(           104805172) = G(99) + M(972) + K(820)
PROCESS_GIGABYTES: | 100G | K(           104857600) = G(100)
PROCESS_GIGABYTES: |1023G | K(          1073217535) = G(1023) + M(511) + K(1023)
PROCESS_GIGABYTES: |1024G | K(          1073217536) = G(1023) + M(512)
     LARGE_NUMBER: |1024G | K(          1073741824) = G(1024)
     LARGE_NUMBER: |4095G | K(          4294443007) = G(4095) + M(511) + K(1023)
     LARGE_NUMBER: |4096G | K(          4294443008) = G(4095) + M(512)
     LARGE_NUMBER: |4096G | K(          4294967296) = G(4096)
     LARGE_NUMBER: |9999G | K(         10485235711) = G(9999) + M(511) + K(1023)
     LARGE_NUMBER: |9.77T | K(         10485235712) = G(9999) + M(512)
     LARGE_NUMBER: |9.77T | K(         10485760000) = G(10000)
     LARGE_NUMBER: |9.99T | K(         10732049530) = T(9) + G(1018) + M(901) + K(122)
     LARGE_NUMBER: |10.0T | K(         10732049531) = T(9) + G(1018) + M(901) + K(123)
     LARGE_NUMBER: |10.0T | K(         10737418240) = T(10)
     LARGE_NUMBER: |99.9T | K(        107320495308) = T(99) + G(972) + M(819) + K(204)
     LARGE_NUMBER: | 100T | K(        107320495309) = T(99) + G(972) + M(819) + K(205)
     LARGE_NUMBER: | 100T | K(        107374182400) = T(100)
     LARGE_NUMBER: |1023T | K(       1098974756863) = T(1023) + G(511) + M(1023) + K(1023)
     LARGE_NUMBER: |1024T | K(       1098974756864) = T(1023) + G(512)
     LARGE_NUMBER: |1024T | K(       1099511627776) = T(1024)
     LARGE_NUMBER: |9999T | K(      10736881369087) = T(9999) + G(511) + M(1023) + K(1023)
     LARGE_NUMBER: |9.77E | K(      10736881369088) = T(9999) + G(512)
     LARGE_NUMBER: |9.77E | K(      10737418240000) = T(10000)
     LARGE_NUMBER: |9.99E | K(      10989618719621) = E(9) + T(1018) + G(901) + M(122) + K(901)
     LARGE_NUMBER: |10.0E | K(      10989618719622) = E(9) + T(1018) + G(901) + M(122) + K(902)
     LARGE_NUMBER: |10.0E | K(      10995116277760) = E(10)
     LARGE_NUMBER: |99.9E | K(     109896187196211) = E(99) + T(972) + G(819) + M(204) + K(819)
     LARGE_NUMBER: | 100E | K(     109896187196212) = E(99) + T(972) + G(819) + M(204) + K(820)
     LARGE_NUMBER: | 100E | K(     109951162777600) = E(100)
     LARGE_NUMBER: |9999E | K(   10994566521946111) = E(9999) + T(511) + G(1023) + M(1023) + K(1023)
     LARGE_NUMBER: |9.77Z | K(   10994566521946112) = E(9999) + T(512)
     LARGE_NUMBER: |9.77Z | K(   10995116277760000) = E(10000)
     LARGE_NUMBER: |9.99Z | K(   11253369568892026) = Z(9) + E(1018) + T(901) + G(122) + M(901) + K(122)
     LARGE_NUMBER: |10.0Z | K(   11253369568892027) = Z(9) + E(1018) + T(901) + G(122) + M(901) + K(123)
     LARGE_NUMBER: |10.0Z | K(   11258999068426240) = Z(10)
     LARGE_NUMBER: |99.9Z | K(  112533695688920268) = Z(99) + E(972) + T(819) + G(204) + M(819) + K(204)
     LARGE_NUMBER: | 100Z | K(  112533695688920269) = Z(99) + E(972) + T(819) + G(204) + M(819) + K(205)
     LARGE_NUMBER: | 100Z | K(  112589990684262400) = Z(100)
     LARGE_NUMBER: |9999Z | K(11258436118472818687) = Z(9999) + E(511) + T(1023) + G(1023) + M(1023) + K(1023)
     LARGE_NUMBER: |9.77Y | K(11258436118472818688) = Z(9999) + E(512)
     LARGE_NUMBER: |9.77Y | K(11258999068426240000) = Z(10000)
     LARGE_NUMBER: |9.99Y | K(11523450438545435525) = Y(9) + Z(1018) + E(901) + T(122) + G(901) + M(122) + K(901)
     LARGE_NUMBER: |9.99Y | K(11523450438545435583) = Y(9) + Z(1018) + E(901) + T(122) + G(901) + M(122) + K(959)
     LARGE_NUMBER: |10.0Y | K(11523450438545435584) = Y(9) + Z(1018) + E(901) + T(122) + G(901) + M(122) + K(960)
     LARGE_NUMBER: |10.0Y | K(11529215046068469760) = Y(10)
     LARGE_NUMBER: |15.9Y | K(18389097998479209267) = Y(15) + Z(972) + E(819) + T(204) + G(819) + M(204) + K(819)
     LARGE_NUMBER: |15.9Y | K(18389097998479209279) = Y(15) + Z(972) + E(819) + T(204) + G(819) + M(204) + K(831)
     LARGE_NUMBER: |16.0Y | K(18389097998479209280) = Y(15) + Z(972) + E(819) + T(204) + G(819) + M(204) + K(832)
     LARGE_NUMBER: |16.0Y | K(18446744073709551615) = Y(15) + Z(1023) + E(1023) + T(1023) + G(1023) + M(1023) + K(1023)
          PROCESS: |    0 | K(                   0) = Y(16)
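The code of Process_humanNumber() itself is not part of this thread, so what follows is only a sketch in C of one integer-only way to reproduce the rows above. Sizes up to 99999 KiB are printed as a bare count. Otherwise the sketch picks the smallest unit (M and up) in which the value, rounded half up, fits in at most four digits, and prints it as x.xx, xx.x or a whole number. The names formatKiB and roundScaled are hypothetical. Fractions come from long division on the remainder, so nothing overflows 64 bits. One caveat: the table goes from T straight to E (its E(1) equals 1024 TiB, i.e. 1 PiB). The sketch uses the standard K, M, G, T, P, E, Z sequence instead, so the row shown above as 16.0Y comes out as 16.0Z.

```c
#include <stdint.h>
#include <stdio.h>

/* Round (v / 2^shift) half up to `decimals` places using long division, so
   no intermediate overflows: rem < 2^shift <= 2^60, hence rem * 10 < 2^64.
   Returns the rounded value scaled by 10^decimals (e.g. 977 for 9.77). */
static uint64_t roundScaled(uint64_t v, unsigned shift, unsigned decimals) {
   const uint64_t mask = (UINT64_C(1) << shift) - 1;
   uint64_t scaled = v >> shift;   /* whole part in this unit */
   uint64_t rem = v & mask;        /* leftover, in 1/2^shift steps */
   for (unsigned i = 0; i < decimals; i++) {
      rem *= 10;
      scaled = scaled * 10 + (rem >> shift);   /* append next decimal digit */
      rem &= mask;
   }
   if (rem >= (UINT64_C(1) << (shift - 1)))    /* half or more: round up */
      scaled++;
   return scaled;
}

/* Hypothetical formatter: writes a 5-character field for a size in KiB. */
static void formatKiB(char buf[8], uint64_t kib) {
   static const char units[] = "MGTPEZ";       /* 2^10 ... 2^60 KiB */
   if (kib <= 99999) {                         /* K is shown as a bare count */
      snprintf(buf, 8, "%5llu", (unsigned long long)kib);
      return;
   }
   for (unsigned k = 0; k < 6; k++) {
      const unsigned shift = 10 * (k + 1);
      const uint64_t whole = kib >> shift;
      if (whole >= 10000)
         continue;                             /* needs a larger unit */
      /* Three significant digits as x.xx or xx.x, else a whole number. */
      unsigned decimals = whole < 10 ? 2 : whole < 100 ? 1 : 0;
      uint64_t scaled = roundScaled(kib, shift, decimals);
      /* A carry such as 9.995 -> 10.00 needs one decimal fewer; re-round
         from the exact value instead of rounding the rounded result. */
      while (decimals > 0 && scaled >= 1000) {
         decimals--;
         scaled = roundScaled(kib, shift, decimals);
      }
      if (decimals == 0 && scaled >= 10000)
         continue;                             /* e.g. 9999.5M: use 9.77G */
      const unsigned long long s = scaled;
      if (decimals == 2)
         snprintf(buf, 8, "%llu.%02llu%c", s / 100, s % 100, units[k]);
      else if (decimals == 1)
         snprintf(buf, 8, "%llu.%llu%c", s / 10, s % 10, units[k]);
      else
         snprintf(buf, 8, "%4llu%c", s, units[k]);
      return;
   }
   /* Not reached: in Z the whole part is at most 15. */
}
```

For example, formatKiB(buf, 102349) yields " 100M" and formatKiB(buf, 10239488) yields "9.77G", matching the corresponding rows.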
@BenBE
Member

BenBE commented Oct 8, 2020

Are you keeping the two-shaded coloring for values in different scales? I.e. for 99.7M, keep the 99 in M_COLOR and the .7 in K_COLOR?

@mfwitten
Contributor Author

mfwitten commented Oct 8, 2020

It does not use two-shaded coloring.

The color represents the largest whole binary suffix that applies, which is somewhat independent of the number actually displayed.

  • M(1023) + K(512) is of course 1023.5M, which is rounded to 1024M for the purposes of display; the rounded value looks like 1G (because that is the closest representable value), but the color more precisely remains just PROCESS_MEGABYTES.

  • As soon as the size is actually at least 1G, the color changes to reflect this reality; when the size is M(1024), it is also displayed as 1024M, and the color becomes PROCESS_GIGABYTES.

In this way, the user can see things at a glance:

  • 1024M with color PROCESS_MEGABYTES means the size is just under 1G.

  • 1024M with color PROCESS_GIGABYTES means the size is at least 1G (or bigger).

The benefits are that the numbers are much more human-readable, and the colors indicate a bit of the missing precision.
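The coloring rule described here depends only on the exact size, not on the rounded digits. Below is a minimal sketch, assuming a KiB input. It uses a local enum whose names merely mirror the color names in the table; they stand in for htop's real color elements, which are not shown in this thread. colorForKiB is a hypothetical name.

```c
#include <stdint.h>

/* Stand-ins for the color names used in the table; not htop's real enum. */
typedef enum { PROCESS, PROCESS_MEGABYTES, PROCESS_GIGABYTES, LARGE_NUMBER } SizeColor;

/* Color by the largest binary unit the exact (unrounded) size reaches,
   independent of how the displayed digits were rounded. */
static SizeColor colorForKiB(uint64_t kib) {
   if (kib < (UINT64_C(1) << 10))
      return PROCESS;              /* below 1 MiB */
   if (kib < (UINT64_C(1) << 20))
      return PROCESS_MEGABYTES;    /* below 1 GiB, even if shown as 1024M */
   if (kib < (UINT64_C(1) << 30))
      return PROCESS_GIGABYTES;    /* below 1 TiB */
   return LARGE_NUMBER;
}
```

With this rule, 1048064 KiB (M(1023) + K(512), displayed as 1024M) stays PROCESS_MEGABYTES, while 1048576 KiB (M(1024)) becomes PROCESS_GIGABYTES.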

@BenBE
Member

BenBE commented Oct 8, 2020

I don't think users will get the difference between 1024 displayed in MB_COLOR and 1024 displayed in GB_COLOR without an explanation. The (existing) two-color display needs little explanation and also has the visual bonus of making the numbers easier to parse at a glance. When I recently changed the display of the size values, I extended this visual clue from KB/MB precision to MB/GB, because having this clue makes parsing out the GB part much easier.

@mfwitten
Contributor Author

mfwitten commented Oct 8, 2020

Here is a comparison of the text produced by the two display formats under discussion:

Input             | Two-Color | This New Way
----------------- | --------- | ------------
M(1023) + K(1023) | 1023M     | 1024M

Here is the coloring:

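         | Two-Color                                   | This New Way
-------- | ------------------------------------------- | ------------------------
Text     | 1023M                                       | 1024M
Coloring | PROCESS_GIGABYTES(1)PROCESS_MEGABYTES(023M) | PROCESS_MEGABYTES(1024M)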

I'm not even sure what the existing two-color system is trying to say there.

Here is another example:

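Input                   | Two-Color | This New Way
----------------------- | --------- | ------------
G(9) + M(1018) + K(901) | 10.2G     | 9.99G

         | Two-Color                                   | This New Way
-------- | ------------------------------------------- | ------------------------
Text     | 10.2G                                       | 9.99G
Coloring | PROCESS_GIGABYTES(10)PROCESS_MEGABYTES(.2G) | PROCESS_GIGABYTES(9.99G)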

It's clear that the existing two-color display is using a mishmash of base 10 and base 1024 units, which is fairly unintelligible if not incorrect.

The existing code also fails at larger numbers, becoming too wide for the field:

Input           | Two-Color | This New Way | Comment
--------------- | --------- | ------------ | --------------------------------------------------
T(100)          | 100.0T    | 100T         | %4.1fT doesn't limit the number of integer digits.
T(100) + G(768) | 100.8T    | 101T         |
E(100)          | 102400.0T | 100E         |
Z(9) + E(783)   | 10238976. | 9999E        | The limited buffer starts cutting off text.
Z(9) + E(784)   | 10240000. | 9.77Z        |
Z(10)           | 10485760. | 10.0Z        |
Z(10) + E(103)  | 10591232. | 10.1Z        |
Z(100)          | 104857600 | 100Z         |
Y(15) + Z(768)  | 169114337 | 15.8Y        |

I would suggest the format I've presented should be the default, intended for casual users.

Maybe there could be other schemes for more advanced users.

  • For instance, color could be used in lieu of a decimal point; a user could forgo the decimal point in order to gain an extra digit of precision:

    Default                  | No Decimal Point
    ------------------------ | ----------------------------------------------------
    10.5M                    | 1048M
    PROCESS_MEGABYTES(10.5M) | PROCESS_MEGABYTES(10)PROCESS(48)PROCESS_MEGABYTES(M)
  • Color could also be used in place of the unit:

    Decimal Point, No Unit   | No Decimal Point, No Unit
    ------------------------ | ---------------------------------
    10.48                    | 10477
    PROCESS_MEGABYTES(10.48) | PROCESS_MEGABYTES(10)PROCESS(477)

@BenBE
Member

BenBE commented Oct 8, 2020

Looking through the issue tracker I found another issue regarding the display format, that could become a good middle ground between the new scheme and the two-color system. Have a look at the format suggested in #35 (examples in hishamhm/htop#901).

The numbers could be used as shown in your demonstration (no objections to removing the base 10/2 mix). Coloring could then become 13G37 for 13.37 GiB, marking the 13G part as gigabytes and the fraction part (37) as megabytes. That's also basically the intent of the current scheme.

Reason: avoiding long numbers with many identically colored digits, so that there are visual clues for fast parsing.

FWIW: The letter for the unit should be kept for UI reasons (e.g. on monochrome displays) and for visually impaired users. Conveying information purely through color is a bad accessibility option.
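The notation from #35, where the unit letter takes the place of the decimal point (13G37 for 13.37 GiB), can be derived mechanically from an ordinary decimal rendering. Below is a minimal sketch, assuming the input already has the form digits, '.', digits, unit letter. toRkm is a hypothetical name, and coloring is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite a decimal size such as "13.37G" into the
   RKM-style form "13G37", where the unit letter takes the place of the
   decimal point. Inputs without a '.' (e.g. "1024M") are copied unchanged.
   Assumes a trailing unit letter whenever a '.' is present. */
static void toRkm(const char *in, char *out, size_t outSize) {
   const char *dot = strchr(in, '.');
   if (dot == NULL) {
      snprintf(out, outSize, "%s", in);
      return;
   }
   const size_t len = strlen(in);
   const char unit = in[len - 1];
   const size_t intLen = (size_t)(dot - in);        /* digits before '.' */
   const size_t fracLen = len - intLen - 2;         /* digits between '.' and unit */
   snprintf(out, outSize, "%.*s%c%.*s", (int)intLen, in, unit, (int)fracLen, dot + 1);
}
```

toRkm turns "5.04G" into "5G04" and "99.9M" into "99M9"; whole-number strings such as "1024M" pass through unchanged.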

@mfwitten
Contributor Author

mfwitten commented Oct 9, 2020

That sounds good. I'll change the code I'm working on to use that notation.

@eworm-de
Contributor

I am not quite sure what this addresses. I am fine with changing the columns in process list.
But this will not affect the meters, no? I would like to keep the current code and display for memory and swap meter.

@BenBE
Member

BenBE commented Oct 13, 2020

The change affects the display of the various columns like virtual size. For Meters a different function is used. Thus no changes there.

@eworm-de
Contributor

Yes, I know a different function is used for meters... It was me who implemented the human readable sizes there. ;)
But as no code is available yet I could not be sure.

So I'm all for this!

@Explorer09
Contributor

Explorer09 commented Oct 14, 2020

Hi. May I make some comments on this?

       COLOR         FIELD            KiBs                                         Units
------------------ -------- ----------------------- -----------------------------------------------------------------
          PROCESS: | 1023 | K(                1023) = K(1023)
PROCESS_MEGABYTES: | 1024 | K(                1024) = K(1024)
PROCESS_MEGABYTES: |99999 | K(               99999) = K(99999)
PROCESS_MEGABYTES: |97.7M | K(              100000) = K(100000)
PROCESS_MEGABYTES: |99.9M | K(              102348) = M(99) + K(972)
PROCESS_MEGABYTES: | 100M | K(              102349) = M(99) + K(973)
PROCESS_MEGABYTES: |1023M | K(             1048063) = M(1023) + K(511)
PROCESS_MEGABYTES: |1024M | K(             1048064) = M(1023) + K(512)
PROCESS_GIGABYTES: |1024M | K(             1048576) = M(1024)
PROCESS_GIGABYTES: |9999M | K(            10239487) = M(9999) + K(511)
PROCESS_GIGABYTES: |9.77G | K(            10239488) = M(9999) + K(512)
PROCESS_GIGABYTES: |9.77G | K(            10240000) = M(10000)
PROCESS_GIGABYTES: |9.99G | K(            10480517) = G(9) + M(1018) + K(901)
PROCESS_GIGABYTES: |10.0G | K(            10480518) = G(9) + M(1018) + K(902)
PROCESS_GIGABYTES: |10.0G | K(            10485760) = G(10)

I'm really not sure if displaying "97.7M" and "99.9M" in the above example is a good idea. To me it looks like excessive precision that increases confusion, since many people are not yet aware of the difference between base 1000 and base 1024.

Of course the memory usage scale should be base-1024, but I have three suggestions here:

  • Drop the fractions from the megabyte (actually mebibyte) display, so there is no "97.7M", which can be confused with "9770" (KiB).
  • For gigabytes and larger units, use an "engineering notation" with the significand in the range [0.98, 999] instead of [9.77, 9999].
  • Round up instead of rounding half up: these numbers show memory usage, so rounding down or rounding to the nearest value doesn't make sense.
Suggested examples
PROCESS_MEGABYTES: |99999 | K(               99999) = K(99999)
PROCESS_MEGABYTES: |  98M | K(              100000) = K(100000)
PROCESS_MEGABYTES: |  99M | K(              101376) = M(99)
PROCESS_MEGABYTES: | 100M | K(              101377) = M(99) + K(1) # Round Up
PROCESS_MEGABYTES: | 999M | K(             1022976) = M(999)
PROCESS_MEGABYTES: |0.98G | K(             1022977) = M(999) + K(1)
PROCESS_MEGABYTES: |0.98G | K(             1023999) < M(1000)
PROCESS_MEGABYTES: |0.98G | K(             1024000) = M(1000)
PROCESS_MEGABYTES: |0.99G | K(             1038090) < G(0.99)
PROCESS_MEGABYTES: |1.00G | K(             1038091) > G(0.99) # Round Up
PROCESS_MEGABYTES: |1.00G | K(             1047552) = M(1023)
PROCESS_GIGABYTES: |1.00G | K(             1048576) = M(1024)
PROCESS_GIGABYTES: |1.01G | K(             1048577) = M(1024) + K(1) # Round Up
PROCESS_GIGABYTES: |9.99G | K(            10475274) < G(9.99)
PROCESS_GIGABYTES: |10.0G | K(            10475275) > G(9.99) # Round Up
PROCESS_GIGABYTES: |10.0G | K(            10485760) = G(10)
PROCESS_GIGABYTES: |10.1G | K(            10485761) = G(10) + K(1) # Round Up
PROCESS_GIGABYTES: |99.9G | K(           104752742) < G(99.9)
PROCESS_GIGABYTES: | 100G | K(           104752743) > G(99.9) # Round Up
PROCESS_GIGABYTES: | 100G | K(           104857600) = G(100)
PROCESS_GIGABYTES: | 999G | K(          1047527424) = G(999)
PROCESS_GIGABYTES: |0.98T | K(          1047527425) = G(999) + K(1) # Round Up 
PROCESS_GIGABYTES: |0.98T | K(          1048576000) = G(1000)
PROCESS_GIGABYTES: |0.99T | K(          1063004405) < T(0.99)
PROCESS_GIGABYTES: |1.00T | K(          1063004406) > T(0.99) # Round Up 
PROCESS_GIGABYTES: |1.00T | K(          1073741823) < G(1024)
     LARGE_NUMBER: |1.00T | K(          1073741824) = G(1024)

@BenBE
Member

BenBE commented Oct 14, 2020

I think rounding-wise we should care at most about the two most significant units. Thus G(9)+M(999)+K(1) should be treated as either G(9)+M(999) (always truncate) or G(9)+M(1000) (always round upwards).

Likewise, G(10) + K(1) -> G(10) + M(1) (while G(10) + M(1) itself would be left unchanged).

After this, normal rounding rules should be applied for display.

For values above 1000 in front of the decimal point, the next bigger unit should be used, with binary prefixes for the conversion. The exception should be K(100000), which should be the first value displayed using M (below that, use K).
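The first step of this proposal, keeping only the two most significant units, amounts to a shift plus an optional round-up. Below is a minimal sketch, assuming the input is in KiB and each unit is 1024 times the previous one. keepTwoUnits is a hypothetical name; its result (in the lower of the two units) would then go through the normal display rounding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for the "two most significant units" rule: re-express
   a KiB count in the unit one step below `unitShift` (the unit itself is
   2^unitShift KiB), discarding anything finer. With roundUp, a non-zero
   discarded remainder adds one; otherwise it is truncated.
   Requires 10 <= unitShift <= 60. */
static uint64_t keepTwoUnits(uint64_t kib, unsigned unitShift, bool roundUp) {
   const unsigned lowerShift = unitShift - 10;   /* e.g. G (20) -> M (10) */
   const uint64_t mask = (UINT64_C(1) << lowerShift) - 1;
   uint64_t inLower = kib >> lowerShift;
   if (roundUp && (kib & mask) != 0)
      inLower++;
   return inLower;
}
```

For G(10) + K(1), keepTwoUnits(10485761, 20, true) returns 10241, i.e. G(10) + M(1); with roundUp false it returns 10240, i.e. G(10).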

@Explorer09
Contributor

@BenBE Just for the info, the reason I vote for rounding upward (and never truncate) is that I don't wish to give users a false estimate on how much memory is needed for an application.

For example, the display of "4.00G" should mean the app uses close to but not more than 4 GiB (there would always be memory bytes unused due to allocation/paging/alignment overheads).
If the app allocates one page more than 4 G, I expect the display to be "4.01G", as this tells me that it won't fit entirely in 4 GB of RAM.

I think memory usage numbers also round up in Windows Task Manager. Just for reference.
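Rounding up to a fixed number of decimals can stay in integer arithmetic: carry out the long division digit by digit and add one if any remainder is left. Below is a minimal sketch, assuming a KiB input and a unit of 2^shift KiB (shift 20 for GiB). ceilScaled is a hypothetical name; it returns the value scaled by 10^decimals.

```c
#include <stdint.h>

/* Ceiling of v / 2^shift, kept to `decimals` decimal places and returned
   scaled by 10^decimals (e.g. 401 means 4.01 when decimals == 2).
   Requires shift <= 60 so that rem * 10 cannot overflow. */
static uint64_t ceilScaled(uint64_t v, unsigned shift, unsigned decimals) {
   const uint64_t mask = (UINT64_C(1) << shift) - 1;
   uint64_t scaled = v >> shift;    /* whole part */
   uint64_t rem = v & mask;         /* fractional leftover */
   for (unsigned i = 0; i < decimals; i++) {
      rem *= 10;
      scaled = scaled * 10 + (rem >> shift);   /* next decimal digit */
      rem &= mask;
   }
   if (rem != 0)                    /* any leftover, however small, rounds up */
      scaled++;
   return scaled;
}
```

With shift 20 and two decimals, 4 GiB gives 400 (4.00G), and 4 GiB plus one 4 KiB page gives 401 (4.01G), as in the example above. 10475274 and 10475275 KiB give 999 and 1000, the 9.99G/10.0G boundary in the suggested table.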

@Explorer09
Contributor

I made a correction on the suggested example in my previous comment.
If the number would round up to 1000M for display, show 0.98G instead.
0.98G gives a clearer hint that the memory usage does not exceed 1G (gibibytes) and prevents people from falsely assuming 1000M = 1G.

PROCESS_MEGABYTES: |0.98G | K(             1022977) = M(999) + K(1)
PROCESS_MEGABYTES: |0.98G | K(             1023999) < M(1000)

@mfwitten
Contributor Author

mfwitten commented Oct 15, 2020

@Explorer09 has raised these points:

  1. Unit Confusion
    There are two kinds of users:
    • Those who do understand computers, and thus appreciate the difference between SI and binary units.
    • Those who do not understand computers, and thus probably have no idea what a binary unit is.
  2. Range
    The range of numbers should be limited to 3 figures, so that a number like 1000M never occurs, thereby somehow avoiding the question of whether the units are based on powers of 1000.
  3. Ceiling
    Memory sizes should always be rounded up, thereby indicating a dependable minimum of memory required.

Here are my thoughts on these points:

  1. Unit Confusion
    A terminal-based program should be biased towards users who do understand computers, especially when the purpose of the program is to present the state of the computer.

  2. Range
    The more precise a number, the better; as much as practical, it is important to present reality to the user, and then let the user decide what to do with the information. With 4 digits, more information can be displayed; the number 1001M is more informative than 0.98G.

    Moreover, I do not believe that limiting the range helps to thwart unit confusion; someone can still look at 0.98G and think:

    • That's 980 MB

    • That's just under 1000 MB

    rather than:

    • Ummm... Where's my calculator? How many MiBs is that?

    • That's 1003.52 MiB.

    • I wish all of humanity just switched to using hexadecimal; it's a darn shame humans don't have 16 digits on their hands.

  3. Ceiling
    Always rounding up makes a lot of sense; it's the most practical way to use memory calculations.

    However, there is also virtue in trying to provide a number that is as close to reality as possible; if you were always to round up, then G(100) + K(1) would become 101G, but that's an overhead of nearly 1 GiB! Besides, surely someone who is trying to determine memory usage for the purpose of allocation would not only seek more exact data from some other source, but would in fact be a computer enthusiast who knows the difference between SI and binary units.

    Nevertheless, there is a potential solution that satisfies everyone and yet will also irritate everyone; we are already discussing novel ways to present the numbers (such as GongS notation, where the unit replaces the decimal point), so why not go a little further and try to encode the rounding as well:

    • Round Up
      An uppercase unit is used: G(100) + M(512) (that is, G(100.5)) is 101G (or 100G5 in GongS notation).
      Similarly: G(100) + M(563) + K(205) (that is, G(100.550000...)) is 101G (or 100G6 in GongS notation).

    • Round Down
      A lowercase unit is used: G(100) + M(511) (that is, G(100.4990...)) is 100g (but still 100G5 in GongS notation).
      Similarly: G(100) + M(563) + K(204) (that is, G(100.54999...)) is 101G (or 100g5 in GongS notation).

    In this way, the user gets the number that is closest to reality, but still knows whether (or not!) it is necessary to round up for the purposes of allocating enough memory.
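The case-encoded rounding proposed here needs only one extra bit of information from the rounding step: whether the displayed value is below the true value. Below is a minimal sketch, assuming a KiB input, a unit of 2^shift KiB and round-half-up. formatCased is a hypothetical name. It uses the GongS/RKM form (unit in place of the decimal point) whenever decimals > 0.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of case-encoded rounding. Rounds kib / 2^shift half up
   to `decimals` places. The unit is uppercase when the shown value is not
   below the true value (rounded up, or exact) and lowercase when it was
   rounded down. With decimals > 0 the unit replaces the decimal point.
   Requires 1 <= shift <= 60. */
static void formatCased(char *out, size_t outSize, uint64_t kib,
                        unsigned shift, unsigned decimals, char unit) {
   const uint64_t mask = (UINT64_C(1) << shift) - 1;
   const uint64_t half = UINT64_C(1) << (shift - 1);
   uint64_t scaled = kib >> shift;
   uint64_t rem = kib & mask;
   uint64_t pow10 = 1;
   for (unsigned i = 0; i < decimals; i++) {   /* long division, digit by digit */
      rem *= 10;                               /* rem < 2^60, so no overflow */
      scaled = scaled * 10 + (rem >> shift);
      rem &= mask;
      pow10 *= 10;
   }
   const int roundedDown = rem != 0 && rem < half;
   if (rem >= half)
      scaled++;
   const char u = (char)(roundedDown ? tolower((unsigned char)unit)
                                     : toupper((unsigned char)unit));
   if (decimals == 0)
      snprintf(out, outSize, "%llu%c", (unsigned long long)scaled, u);
   else
      snprintf(out, outSize, "%llu%c%0*llu", (unsigned long long)(scaled / pow10), u,
               (int)decimals, (unsigned long long)(scaled % pow10));
}
```

With shift 20 and unit 'G', this reproduces the examples above. G(100) + M(511) gives "100g" with no decimals but "100G5" with one. G(100) + M(563) + K(204) gives "101G" and "100g5".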

@BenBE
Member

BenBE commented Oct 15, 2020

  • GongS

Where does the name GongS come from? I couldn't find a source for it. The closest I found was RKM from IEC 62 …

  • I wish all of humanity just switched to using hexadecimal; it's a darn shame humans don't have 16 digits on their hands.

Just remove one finger from each hand and hexadecimal becomes surprisingly practical …

@mfwitten
Contributor Author

mfwitten commented Oct 15, 2020

@Explorer09
Contributor

  1. I think confusion between decimal and binary units comes up more often than the need to maximize precision. You may disagree with me, but I do see that Windows Task Manager displays values like 0.98G rather than 1000M.

  2. Another reason I put precision second is that it would be much simpler to display a wider field when a user needs an extra digit of precision than to "exploit" the five-character width limit to squeeze more information in there.

  3. I'm personally not a fan of what you call GongS notation. Although I am able to interpret 100G5 as 100.5G, that notation looks too technical for a more casual user. And for exbibytes it can create an ambiguity: is 50E5 50.5 EiB, 50 × 10^5, or the hexadecimal number 0x50E5, which equals 20709?

@BenBE
Member

BenBE commented Oct 15, 2020

  • I think staying consistent in the displayed precision is key: as long as we pick one notation and stick to it, we should be fine. In that regard the current implementation is horrible, as it uses binary conversion for some values but not for others.

  • As for values above 1000, I'm not quite sure whether I prefer to keep showing them as 1000 or to round them up to 0.98 of the next bigger unit. If we keep them, where do we make the cut then? At 1024, just to avoid showing 0.something?

  • The RKM notation ("GongS notation") is a good compromise between a compact display, high information density and a visually parseable column. It fixes some shortcomings of the current two-color implementation, which was extended from the existing "mmkkk" display of numbers: it gives a visual clue to the largest unit used while also showing that unit in the proper color. The current implementation breaks here, e.g. with 5.04G, where the actual GB part is one color but the unit scale is another. With RKM notation the same value becomes "5G04", so the integer numeric value and the unit share the same color.

  • That said: I strongly prefer to keep the color separation for large values.

  • While 4E200 might look ambiguous with E notation at first glance, it should be quite clear from context that it means EiB. For comparison: printf's default behaviour is to use a lowercase e for scientific notation. Also, 50E5 has two issues. First, the column aims for 4 significant digits, so you'd more likely see "50E50" or similar, whose "exponent" would look quite random to the user. Second, in scientific (E) notation you preferably have only a single digit before the decimal separator (I know there are exceptions when engineers use it, but …). Last: the different coloring on "50E" would visually break up the number, focusing your view on the 50E part.

@Explorer09
Contributor

@BenBE
I would suggest that memory usage larger than 100M (97.7 MiB) be displayed in just one color. If the number is 1 GiB or larger, show it in the "gigabytes" color; if it's less than 1 GiB (even when it rounds up to 1 GiB), show it in the "megabytes" color. That would keep the visuals simple.
And one key to accessibility is not to convey important information through color alone.
The benefit of showing extra precision or of a two-color display (for digit separation) could be offset by the base-1000 vs. base-1024 confusion. In other words, it might not be as useful as you expect, IMHO.

@mfwitten
Contributor Author

mfwitten commented Oct 16, 2020

When to use fractions depends on the precision.

With three figures after the decimal point, the smallest step (0.001) in the fractional notation is slightly larger than the smallest step (1) in the next lower unit:

0.001 * 1024 = 1.024

Therefore, you get the best precision by avoiding the switch to a higher unit; you should count as long as possible in the lowest possible unit; every coefficient should be maxed out to 9999 before switching to fractional notation.

You could get more precision by replacing both the decimal point and the leading zero with the unit:

0.0001G = G0001

Then it becomes a little less intuitive, in general, when it makes sense to transition to the next higher unit; however, the result actually satisfies @Explorer09's desire not to display a value >= 1000:

  • K is atomic; it must be counted all the way up to 99999.

  • M will start at 97M66.

  • M will end at 999M9, in which the smallest step is 102.4K (0.1 * 1024K). After this point, you'd get better precision by switching to the next higher unit, rather than counting whole mebibytes.

  • G will start at G9766; the smallest step of this notation is 104.8576K (0.0001 * 1024 * 1024K).

  • G will end at 999G9, in which the smallest step is 102.4M (0.1 * 1024M). After this point, you'd get better precision by switching to the next higher unit, rather than counting whole gibibytes.

  • T will start at T9766; the smallest step of this notation is 104.8576M (0.0001 * 1024 * 1024M).

  • T will end at 999T9, in which the smallest step is 102.4G (0.1 * 1024G). After this point, you'd get better precision by switching to the next higher unit, rather than counting whole tebibytes.
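Written as a formula, with $U_k = 1024^k$ KiB ($U_1$ = MiB, $U_2$ = GiB, …) and $d$ digits after the unit or decimal point, the smallest step in unit $U_k$ is

$$\Delta_{k,d} = \frac{1024^k}{10^d}\ \text{KiB} = \frac{1024}{10^d}\,U_{k-1},$$

so the fractional form in $U_k$ is finer than counting whole $U_{k-1}$ only if $1024 / 10^d < 1$, that is $d \ge 4$ ($d = 3$ gives $1.024\,U_{k-1}$, the case computed above). For example, $\Delta_{1,1} = 1024/10 = 102.4$ KiB is the 999M9 step, and $\Delta_{2,4} = 1024^2/10^4 = 104.8576$ KiB is the G9766 step.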

@BenBE
Member

BenBE commented Oct 16, 2020

I don't like skipping the leading 0. Thus I much prefer 0T977 over T9766, even if the former loses a bit of precision.

@ghost

ghost commented Oct 16, 2020

What just...happened?
My suggestion is to just use the first 3 digits of the value and insert k/M/G etc. where the decimal point would be, instead of the 4 digits you are suggesting (since 1k = 1000).
Of course it would be base 10.
It would be like:
1, 10, 100, 1000, 10k0, 100k, 1M00, 10M0, 100M, 1G00 and so.
It is really simple. You guys make it complex for no reason.

@BenBE
Member

BenBE commented Oct 16, 2020

@pthfdr-42 That's basically what we're discussing, as the k/M/G/… prefixes are usually used as KiB/MiB/GiB/… (i.e. scaling 1024) when dealing with amounts of memory, thus using 1000 for scaling is at least unexpected (and doesn't even align with what the current implementation* does).

What the current discussion is mostly still working out are 1) the best (and most consistent) points to do the transition from one unit to the next bigger one and 2) whether to skip leading zero or not. Apart from those open questions I think we already got a rough consensus on where it should go.

*That's on a completely different page. Don't look for any consistency there.

@mfwitten
Contributor Author

mfwitten commented Oct 16, 2020

You shouldn't ever have 0T977; with only 3 digits after the decimal point, you'd be better off counting to 9999G, and then switching to 9T765.

@ghost

ghost commented Oct 16, 2020

@BenBE Well, if the 1000-vs-1024 thing is important, then just give an option in display settings.
Again, in my suggestion, the width is 4 instead of 5 (to save space) since there are 3 digits in every tier. Add the unit (k, M, G) and you get 4 characters.
For example:
32GiB is 34,359,738,368 bytes, which would be displayed as "34G3" in base 1000 and "32G0" in base 1024.
32GB is 32,000,000,000 bytes, which is 29.8GiB, and it would be displayed as "32G0" in base 1000 and "29G8" in base 1024.
About leading zeroes (in base 1024), my opinion is to use "xxxM" if less than 1000M, and "0Gxx" if between 1000M and 1024M.
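The scheme proposed here is simple to state: plain digits below 10000, otherwise three significant digits with the unit letter where the decimal point would be, truncated rather than rounded. Below is a minimal sketch of one reading of it, with these assumptions: a byte count as input, a selectable base of 1000 or 1024, and the unit chosen as the smallest one whose integer part is below 1000. That last choice also yields the "0Gxx" form for 1000 to 1023 MiB in base 1024. format3 is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: 4-character size with the unit letter in place of the
   decimal point. Plain digits below 10000; otherwise three significant digits,
   truncated (not rounded). `base` must be 1000 or 1024; input is in bytes. */
static void format3(char out[8], uint64_t bytes, unsigned base) {
   static const char units[] = "kMGTPE";
   if (bytes < 10000) {
      snprintf(out, 8, "%llu", (unsigned long long)bytes);
      return;
   }
   /* Smallest unit whose integer part is below 1000; never beyond E. */
   uint64_t unit = 1;
   unsigned k = 0;
   do {
      unit *= base;
      k++;
   } while (bytes / unit >= 1000);
   const uint64_t whole = bytes / unit;
   uint64_t rem = bytes % unit;
   const unsigned digits = whole >= 100 ? 0 : whole >= 10 ? 1 : 2;
   char frac[3] = "";
   for (unsigned i = 0; i < digits; i++) {
      rem *= 10;                       /* rem < unit <= 2^60: fits in 64 bits */
      frac[i] = (char)('0' + rem / unit);
      rem %= unit;
   }
   frac[digits] = '\0';
   snprintf(out, 8, "%llu%c%s", (unsigned long long)whole, units[k - 1], frac);
}
```

This reproduces the examples in the comment: 34,359,738,368 bytes become "34G3" in base 1000 and "32G0" in base 1024, and 32,000,000,000 bytes become "32G0" and "29G8".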

@birdie-github

birdie-github commented Oct 17, 2020

I for one don't want to see the base shown in the results at all as it just breaks comprehension completely and overloads you with unnecessary information. Either let's have an option to switch between bases or just stick to 1024. I prefer the latter because RAM has been historically in 2^10 units.

@mfwitten Please please add an option to hide zeros from the output.

@BenBE
Member

BenBE commented Oct 17, 2020

@birdie-github State of discussion currently is for using 1024 exclusively. The old code wasn't consistent in this regard, thus there will be some minor tweaks to fix these.

@ghost

ghost commented Oct 18, 2020

@birdie-github I personally prefer adding an option to toggle 1000/1024, with 1024 being the default. This clarifies lots of things (I do not know whether it is 1000 or 1024).
Lots of people prefer base 1000 (particularly when they see the entire number, like 1,000,000,000, in places like the file explorer).
PS: The I/O speed should also use this notation instead of the long "MB/s". When you have no permission, display "----" instead.

@michael-o

The general rule for volatile memory (RAM, cache) is to use IEC binary prefixes, while persistent storage (disks, SSDs, etc.) always uses SI decimal prefixes. So 1 GB is always 1 000 000 000 B and nothing else. The BIPM forbids the abuse of SI prefixes for binary use.

@BenBE BenBE added the needs-discussion 🤔 Changes need to be discussed and require consent label Sep 4, 2021
@almson

almson commented Dec 24, 2021

> The general rule for volatile memory (RAM, cache) is to use IEC binary prefixes, while persistent storage (disks, SSDs, etc.) always uses SI decimal prefixes. So 1 GB is always 1 000 000 000 B and nothing else. The BIPM forbids the abuse of SI prefixes for binary use.

Yeah, maybe, but e.g. Kubernetes supports both for specifying RAM limits. (It's my current use case for keeping a careful eye on RAM usage.)

I used to be a defender of the power-of-2 prefixes, but I've come to realize that they were useful when RAM was tiny and data structures (which were often sized in powers of 2 for slightly easier addressing) were few in number. Now that we're administering machines with gigabytes of RAM and even programmers rarely allocate power-of-2 data structures, binary prefixes only offer pain instead of convenience.

@almson

almson commented Dec 24, 2021

Also please display the K prefix. It's confusing that htop omits it.

And 100G5? Are you guys crazy? This is a general-purpose tool used by millions of people. Cut it out!

@michael-o

> > The general rule for volatile memory (RAM, cache) is to use IEC binary prefixes, while persistent storage (disks, SSDs, etc.) always uses SI decimal prefixes. So 1 GB is always 1 000 000 000 B and nothing else. The BIPM forbids the abuse of SI prefixes for binary use.
>
> Yeah, maybe, but e.g. Kubernetes supports both for specifying RAM limits. (It's my current use case for keeping a careful eye on RAM usage.)
>
> I used to be a defender of the power-of-2 prefixes, but I've come to realize that they were useful when RAM was tiny and data structures (which were often sized in powers of 2 for slightly easier addressing) were few in number. Now that we're administering machines with gigabytes of RAM and even programmers rarely allocate power-of-2 data structures, binary prefixes only offer pain instead of convenience.

Allocating in powers of 2 is advisable if you need proper alignment of data in memory.

@almson

almson commented Dec 24, 2021

> Allocating in powers of 2 is advisable if you need proper alignment of data in memory.

No, not exactly. In any case, if you're an embedded or low-level programmer who cares about these things, then you take out your calculator. It's time for users to stop having to put up with it.

Development

No branches or pull requests

7 participants