support UInt & BigInt in TOML #47903

Roger-luo · 2022-12-15T05:37:31Z

This partially implements @StefanKarpinski's option 2 from JuliaLang/TOML.jl#46 (comment), with the following convention:

integers that fit in Int64 or UInt64 are parsed as Int64 or UInt64, for compatibility
integers outside that range are parsed as the corresponding 128-bit type or as BigInt

resolves JuliaLang/TOML.jl#46

cc: @KristofferC

Roger-luo · 2022-12-15T15:30:34Z

Not sure if the test failure is due to this PR? The TOML tests pass locally for me

base/toml_parser.jl

stdlib/TOML/src/print.jl

StefanKarpinski

This is getting there, but I think if we're going to use native Julia types to represent integer literals, we should go all the way and use the same native types to represent integer literals as we would in Julia. In other words:

Decimal literals from -2^63:2^63-1 are Int64
Decimal literals from -2^127:2^127-1 are Int128
Larger decimal literals are BigInt

For binary, octal and hex literals, the type is determined by the number of digits the literal has (not the value of the literal, but the number of digits, d, that it is written with):

Binary literals are UInt$n where n = bits(2, d)
Octal literals are UInt$n where n = bits(8, d)
Hex literals are UInt$n where n = bits(16, d)

Where bits(b, d) = 8*nextpow(2, d*log2(b)/8). Except that UInt$n is replaced by BigInt once n is greater than 128 since we don't have unsigned types larger than that.
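
To make the proposed rule concrete, here is a minimal sketch in Julia. The helper names (`bits_per_digit`, `bits`, `unsigned_literal_type`, `decimal_literal_type`) are hypothetical and only illustrate the formula from this comment; they are not part of TOML.jl or of this PR. The sketch uses integer arithmetic in place of `d*log2(b)/8`, which gives the same result because every power of two is an integer.

```julia
# Hypothetical helpers that follow the rule in the comment above literally.

# log2(b) for the bases TOML supports (2, 8, 16): 1, 3 or 4 bits per digit.
bits_per_digit(base::Integer) = trailing_zeros(base)

# bits(b, d) = 8 * nextpow(2, d*log2(b)/8), rounded up to whole bytes first.
# Assumes d >= 1 (a literal has at least one digit).
bits(base::Integer, d::Integer) = 8 * nextpow(2, cld(d * bits_per_digit(base), 8))

# Type of a binary, octal or hex literal written with `d` digits.
function unsigned_literal_type(base::Integer, d::Integer)
    n = bits(base, d)
    n == 8 && return UInt8
    n == 16 && return UInt16
    n == 32 && return UInt32
    n == 64 && return UInt64
    n == 128 && return UInt128
    return BigInt  # no unsigned type wider than 128 bits
end

# Type of a decimal literal, chosen by value rather than by digit count.
function decimal_literal_type(x::Integer)
    typemin(Int64) <= x <= typemax(Int64) && return Int64
    typemin(Int128) <= x <= typemax(Int128) && return Int128
    return BigInt
end
```

Under these assumptions, `unsigned_literal_type(16, 2)` gives `UInt8`, `unsigned_literal_type(16, 17)` gives `UInt128`, and `decimal_literal_type(big(2)^63)` gives `Int128`.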

stdlib/TOML/test/readme.jl

vtjnash · 2023-01-04T17:50:01Z

I am not sure the type you get back should be quite that closely dependent on the file content. I would think Int/UInt/BigInt should suffice and be cleaner for the API. It is not quite like parsing source code, where the semantic intent is being conveyed intentionally also by the code length.

KristofferC · 2023-01-05T20:11:54Z

> I am not sure the type you get back should be quite that closely dependent on the file content. I would think Int/UInt/BigInt should suffice and be cleaner for the API. It is not quite like parsing source code, where the semantic intent is being conveyed intentionally also by the code length.

Yes, having a whole big range of types just feels like it would cause more compilation etc for very little value. Agree with Int/UInt/BigInt and that's it.

vtjnash · 2023-01-05T20:53:07Z

Edit: I think the spec says Int64/UInt64, but yes

KristofferC · 2023-01-05T21:17:58Z

Spec only says at least Int64.

Roger-luo · 2023-01-05T21:19:50Z

> Yes, having a whole big range of types just feels like it would cause more compilation etc for very little value. Agree with Int/UInt/BigInt and that's it.

Should I remove the support for UInt128 and Int128 then?

I think the other reason we cannot support UInt8 etc. is that it would be quite breaking (these literals used to be returned as Int). UInt etc. occasionally error after serialization from Julia, though, so this change is slightly less breaking, I think.

StefanKarpinski · 2023-01-05T21:54:43Z

Well, it would only be breaking for unsigned integer literals, which I suspect are not widely used in our world. The point of using different types is to use the type as a way to preserve more information about how to print values (i.e. how many digits they need). If we're going to do Int64/UInt64/BigInt and print UInt64 as hex, then I'm not sure we want to use Julia's printing for those values. If I round trip a TOML file with 0xff in it and get back 0x00000000000000ff in my file, that's pretty annoying. So in that case, you've got to skip the leading zero bytes (pairs of digits rather than digits seems best).
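
The byte-wise trimming described here could look roughly like the following. This is a minimal sketch: `hex_without_leading_zero_bytes` is a hypothetical name, and the code is not claimed to be what the PR merged. It counts the significant bytes with `ndigits(x, base = 256)` and pads the hex string to two digits per byte.

```julia
# Hypothetical helper: print an unsigned integer as hex, dropping leading
# zero bytes but always keeping whole bytes (pairs of hex digits).
function hex_without_leading_zero_bytes(x::Unsigned)
    nbytes = ndigits(x, base = 256)  # significant bytes; 1 when x == 0
    return "0x" * string(x, base = 16, pad = 2 * nbytes)
end
```

With this sketch, `hex_without_leading_zero_bytes(UInt64(0xff))` returns `"0xff"` and `hex_without_leading_zero_bytes(UInt64(0x100))` returns `"0x0100"`, whereas `repr(UInt64(0xff))` is `"0x00000000000000ff"`.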

stdlib/TOML/src/print.jl

StefanKarpinski · 2023-01-06T16:53:00Z

I also think that keeping the Int128 and UInt128 options here seems reasonable since those types are so much more efficient than BigInts. But I don't feel strongly about that either.

Roger-luo · 2023-01-07T21:11:58Z

OK, so I kept the original parsing, where

integers that fit in 64 bits are parsed as Int64, or as UInt64 if written in binary, octal or hex
integers that need between 64 and 128 bits are parsed as Int128 or UInt128
larger integers are parsed as BigInt
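
A minimal sketch of these tiers, assuming a hypothetical function `parse_toml_integer` (this is not the parser code in base/toml_parser.jl, and it ignores TOML's `_` digit separators and sign handling on prefixed literals):

```julia
# Hypothetical sketch: pick the narrowest type from the tiers listed above.
function parse_toml_integer(s::AbstractString)
    # Binary, octal and hex literals go to unsigned types.
    for (prefix, base) in (("0x", 16), ("0o", 8), ("0b", 2))
        if startswith(s, prefix)
            body = s[3:end]  # text after the two-character ASCII prefix
            for T in (UInt64, UInt128)
                v = tryparse(T, body; base = base)  # `nothing` on overflow
                v === nothing || return v
            end
            return parse(BigInt, body; base = base)
        end
    end
    # Decimal literals go to signed types.
    for T in (Int64, Int128)
        v = tryparse(T, s)
        v === nothing || return v
    end
    return parse(BigInt, s)
end
```

For example, `parse_toml_integer("42")` returns an `Int64`, `parse_toml_integer("9223372036854775808")` returns an `Int128`, and `parse_toml_integer("0xff")` returns a `UInt64`. Note that the return type depends on the input value, which is the type-stability concern raised later in the thread.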

For printing, signed integers are printed via Base.show (same as before); unsigned integers
are printed as hex with an even number of digits and no leading zero bytes.

I copy-pasted log2i from Yao to implement the even-digit hex printing. It always returns an integer when given an integer, which I find quite convenient when working with bits. I'm not sure whether people want me to move it into Base, next to the log functions (in a separate PR). It's just a convenient one-liner.

stdlib/TOML/src/print.jl

Roger-luo · 2023-01-11T20:18:33Z

@StefanKarpinski bump, do you have any other comments? Can we merge this?

stdlib/TOML/test/values.jl

StefanKarpinski

Looks good, although I'm not seeing where this tests the printing of unsigned values.

StefanKarpinski

LGTM

KristofferC · 2023-01-12T21:55:22Z

Thanks for this @Roger-luo

From a type-stability perspective, this restores a lot of our behavior before #47903. As it turns out, 10 of the 11 uses of `parse_int` (now called `parse_integer`) introduced in that PR are unnecessary since the TOML format already requires the parsed value to be within a very limited range. Note that this change does not actually revert any functionality (in contrast to #49576) (cherry picked from commit 59c3c71)

Roger-luo added 3 commits December 14, 2022 18:36

support parsing uint and long int

1f2c860

support UInt and BigInt in TOML

ada86da

fix whitespace

4e176a2

timholy mentioned this pull request Dec 15, 2022

Reduce invalidations when loading JuliaData packages #47889

Merged

giordano reviewed Dec 15, 2022

base/toml_parser.jl (outdated, resolved)

brenhinkeller added the domain:bignums and stdlib:TOML labels Dec 18, 2022

Roger-luo added 2 commits December 28, 2022 11:34

Merge branch 'master' into roger/toml-uint

fcf1be1

use eval

41d43a7

KristofferC reviewed Dec 28, 2022

stdlib/TOML/src/print.jl (outdated, resolved)

Roger-luo added 2 commits January 3, 2023 15:20

update printvalue

c09f0ea

Merge branch 'master' into roger/toml-uint

2b13317

Roger-luo requested review from KristofferC and giordano and removed request for KristofferC and giordano January 3, 2023 20:21

StefanKarpinski requested changes Jan 4, 2023

stdlib/TOML/test/readme.jl (three resolved threads)

Merge branch 'master' into roger/toml-uint

92a7717

KristofferC approved these changes Jan 5, 2023

KristofferC reviewed Jan 6, 2023

stdlib/TOML/src/print.jl (outdated, resolved)

only print bytes in hex

0ea6781

dispatch to print_integer

f4b498f

Merge branch 'master' into roger/toml-uint

660d6b6

oscardssmith reviewed Jan 8, 2023

stdlib/TOML/src/print.jl (outdated, resolved)

Roger-luo added 3 commits January 8, 2023 18:22

Merge branch 'master' into roger/toml-uint

bbe8fde

use top_set_int

60310f3

Merge branch 'master' into roger/toml-uint

89068cc

Roger-luo requested a review from StefanKarpinski January 9, 2023 21:34

oscardssmith reviewed Jan 9, 2023

stdlib/TOML/src/print.jl (outdated, resolved)

oscardssmith reviewed Jan 9, 2023

stdlib/TOML/src/print.jl (outdated, resolved)

use ndigits

3e6b2f2

StefanKarpinski reviewed Jan 11, 2023

stdlib/TOML/test/values.jl (outdated, resolved)

StefanKarpinski reviewed Jan 11, 2023

Roger-luo added 3 commits January 11, 2023 21:21

use Stefan's ndigits & add tests

6c9798a

Merge branch 'master' into roger/toml-uint

5c71ad4

rm spacing

8bb8d18

StefanKarpinski approved these changes Jan 12, 2023

oscardssmith merged commit d61cfd2 into JuliaLang:master Jan 12, 2023

Roger-luo deleted the roger/toml-uint branch January 12, 2023 21:55

KristofferC added a commit that referenced this pull request Apr 30, 2023

Revert "support UInt & BigInt in TOML (#47903)"

ea1c01b

This reverts commit d61cfd2.

KristofferC mentioned this pull request Apr 30, 2023

Revert "support UInt & BigInt in TOML" #49576

Closed

topolarity mentioned this pull request Apr 4, 2024

TOML: Improve type-stability of BigInt/UInt support #53955

Merged
