Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support UInt & BigInt in TOML #47903

Merged
merged 18 commits into from
Jan 12, 2023
Merged

Conversation

Roger-luo
Copy link
Contributor

This partially implements @StefanKarpinski's option 2 here: JuliaLang/TOML.jl#46 (comment), with the following convention

  • all numbers below UInt64 and Int64 are parsed as UInt64 or Int64 for compatibility
  • numbers above UInt64 and Int64 will be parsed accordingly to 128 or BigInt

resolves JuliaLang/TOML.jl#46

cc: @KristofferC

@Roger-luo
Copy link
Contributor Author

Not sure if the test failure is due to this PR? The TOML tests pass locally for me

base/toml_parser.jl Outdated Show resolved Hide resolved
@Roger-luo Roger-luo requested review from KristofferC and giordano and removed request for KristofferC and giordano January 3, 2023 20:21
Copy link
Sponsor Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting there, but I think if we're going to use native Julia types to represent integer literals, we should go all the way and use the same native types to represent integer literals as we would in Julia. In other words:

  • Decimal literals from -2^63:2^63-1 are Int64
  • Decimal literals from -2^127:2^127-1 are Int128
  • Larger decimal literals are BigInt

For binary, octal and hex literals, the type is determined by the number of digits the literal has (not the value of the literal, but the number of digits, d, that it is written with):

  • Binary literals are UInt$n where n = bits(2, d)
  • Octal literals are UInt$n where n = bits(8, d)
  • Hex literals are UInt$n where n = bits(16, d)

Where bits(b, d) = 8*nextpow(2, d*log2(b)/8). Except that UInt$n is replaced by BigInt once n is greater than 128 since we don't have unsigned types larger than that.

stdlib/TOML/test/readme.jl Show resolved Hide resolved
stdlib/TOML/test/readme.jl Show resolved Hide resolved
stdlib/TOML/test/readme.jl Show resolved Hide resolved
@vtjnash
Copy link
Sponsor Member

vtjnash commented Jan 4, 2023

I am not sure the type you get back should be quite that closely dependent on the file content. I would think Int/UInt/BigInt should suffice and be cleaner for the API. It is not quite like parsing source code, where the semantic intent is being conveyed intentionally also by the code length.

@KristofferC
Copy link
Sponsor Member

KristofferC commented Jan 5, 2023

I am not sure the type you get back should be quite that closely dependent on the file content. I would think Int/UInt/BigInt should suffice and be cleaner for the API. It is not quite like parsing source code, where the semantic intent is being conveyed intentionally also by the code length.

Yes, having a whole big range of types just feels like it would cause more compilation etc for very little value. Agree with Int/UInt/BigInt and that's it.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jan 5, 2023

Edit: I think the spec says Int64/UInt64, but yes

@KristofferC
Copy link
Sponsor Member

Spec only says at least Int64.

@Roger-luo
Copy link
Contributor Author

Roger-luo commented Jan 5, 2023

Yes, having a whole big range of types just feels like it would cause more compilation etc for very little value. Agree with Int/UInt/BigInt and that's it.

should I remove the support of UInt128 and Int128 then?


I think the other reason we cannot support Uint8 etc. just that this will be quite breaking (they used to be returning Int), but UInt etc errors occasionally after serialization from Julia, so this is slightly less breaking I think.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Jan 5, 2023

Well, it would only be breaking for unsigned integer literals, which I suspect are not widely used in our world. The point of using different types is to use the type as a way to preserve more information about how to print values (i.e. how many digits they need). If we're going to to Int64/UInt64/BigInt and print UInt64 as hex, then I'm not sure we want to use Julia's printing for those values. If I round trip a TOML file with 0xff in it and get back 0x00000000000000ff in my file, that's pretty annoying. So in that case, you've got to skip the leading zero bytes (pairs of digits rather than digits seems best).

@StefanKarpinski
Copy link
Sponsor Member

I also think that keeping the Int128 and UInt128 options here seems reasonable since those types are so much more efficient than BigInts. But I don't feel strongly about that either.

@Roger-luo
Copy link
Contributor Author

OK, so I kept the original parsing, where

  • all integers below 64-bit are parsed into Int64 or if written in bin,oct and hex will be parsed into UInt64
  • integers between 64-bit - 128-bit are parsed as Int128 or UInt128
  • integers larger parsed as BigInt

for printing, signed integers are printed via Base.show (same as previous), unsigned integers
are printed as even hex without leading zeros.


I copy-pasted log2i from Yao to implement that even number of hex printing, it returns an integer strictly when taking an integer, I find it quite convenient when working with bits, but I'm not sure if people want me to move it into Base and let it sit beside log functions (in a separate PR)? It's just some convenient one-liner.

@Roger-luo
Copy link
Contributor Author

@StefanKarpinski bump, do you have any other comments? Can we merge this?

Copy link
Sponsor Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good although I'm not seeing where this tests for printing of unsigned values.

Copy link
Sponsor Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@oscardssmith oscardssmith merged commit d61cfd2 into JuliaLang:master Jan 12, 2023
@Roger-luo Roger-luo deleted the roger/toml-uint branch January 12, 2023 21:55
@KristofferC
Copy link
Sponsor Member

Thanks for this @Roger-luo

KristofferC added a commit that referenced this pull request Apr 30, 2023
topolarity added a commit to topolarity/julia that referenced this pull request Apr 4, 2024
From a type-stability perspective, this restores a lot of our behavior
before JuliaLang#47903. As it turns out, 10 of the 11 uses of `parse_int` introduced
in that PR are unnecessary since the TOML format already requires the
parsed value to be within a very limited range.

Note that this change does not actually revert any functionality (in
contrast to JuliaLang#49576)
KristofferC pushed a commit that referenced this pull request Apr 5, 2024
From a type-stability perspective, this restores a lot of our behavior
before #47903.

As it turns out, 10 of the 11 uses of `parse_int` (now called
`parse_integer`) introduced in that PR are unnecessary since the TOML
format already requires the parsed value to be within a very limited
range.

Note that this change does not actually revert any functionality (in
contrast to #49576)
KristofferC pushed a commit that referenced this pull request Jun 13, 2024
From a type-stability perspective, this restores a lot of our behavior
before #47903.

As it turns out, 10 of the 11 uses of `parse_int` (now called
`parse_integer`) introduced in that PR are unnecessary since the TOML
format already requires the parsed value to be within a very limited
range.

Note that this change does not actually revert any functionality (in
contrast to #49576)

(cherry picked from commit 59c3c71)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unsigned integers should be printed as hexadecimal
7 participants