Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make parse(T, "") and parse(T, " ") etc, more specific regarding whitespace and empty strings #20588

Merged
merged 5 commits into from
Feb 13, 2017

Conversation

Tetralux
Copy link
Contributor

Fixes #20587.

Before this PR:

julia> parse(Int, "", 16)
ERROR: ArgumentError: invalid base: base must be 2 ≤ base ≤ 62, got 0
Stacktrace:
 [1] tryparse_internal(::Type{Int64}, ::String, ::Int64, ::Int64, ::Int64, ::Bool) at ./parse.jl:63
 [2] parse(::Type{Int64}, ::String, ::Int64) at ./parse.jl:146

After this PR:

julia> parse(Int, "", 16)
ERROR: ArgumentError: input string is empty or only contains whitespace
Stacktrace:
 [1] tryparse_internal(::Type{Int64}, ::String, ::Int64, ::Int64, ::Int64, ::Bool) at .\parse.jl:62
 [2] parse(::Type{Int64}, ::String, ::Int64) at .\parse.jl:161

This message will also appear for parse(T, "") and parse(T, " "), etc.

@Tetralux Tetralux changed the title Make parse(T, "") and parse(T, " ") etc, more specific Make parse(T, "") and parse(T, " ") etc, more specific regarding whitespace and empty strings Feb 12, 2017
base/parse.jl Outdated
@@ -58,6 +58,10 @@ end

function tryparse_internal{T<:Integer}(::Type{T}, s::AbstractString, startpos::Int, endpos::Int, base_::Integer, raise::Bool)
_n = Nullable{T}()
if isempty(strip(s))
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think calling strip will be too expensive here --- this function is very performance-critical, and most inputs don't need strip at all. I believe parseint_preamble returns a base of 0 if there is a problem; we should check for all-whitespace strings only in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have spotted that. Will commit shortly.

@Tetralux
Copy link
Contributor Author

Sorry that took a while. Took me farting around for an hour, trying to make it better for Bool, only to realise there was a much simpler way of doing it. >D

base/parse.jl Outdated
Nullable{Bool}()

substr = SubString(sbuff, startpos, endpos)
if count(isspace, sbuff) == len # all chars were whitespace
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with this code, but should this be checking substr instead of sbuff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably add a test for that. >)

@Tetralux
Copy link
Contributor Author

Third time's a charm.

@tkelman
Copy link
Contributor

tkelman commented Feb 13, 2017

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels

base/parse.jl Outdated
@@ -126,27 +130,49 @@ end

function tryparse_internal(::Type{Bool}, sbuff::Union{String,SubString},
startpos::Int, endpos::Int, base::Integer, raise::Bool)
len = endpos-startpos+1
p = pointer(sbuff)+startpos-1
null = Nullable{Bool}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see the point of defining this variable. Why not write Nullable{Bool}() where you need it?

base/parse.jl Outdated

substr = SubString(sbuff, startpos, endpos)
if count(isspace, substr) == len # all chars were whitespace
raise && throw(ArgumentError("input string only contains whitespace"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit weird stylistically. Why not wrap both branches inside an if raise block?

base/parse.jl Outdated
Nullable{Bool}()

substr = SubString(sbuff, startpos, endpos)
if count(isspace, substr) == len # all chars were whitespace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all(isspace, substr) would be simpler (if I'm not missing something).

base/parse.jl Outdated
if isnull(result)
throw(ArgumentError("cannot parse $(repr(s)) as $T"))
end
return get(result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use unsafe_get since you have already checked isnull.

test/parse.jl Outdated
# Issue 20587
for T in vcat(subtypes(Signed), subtypes(Unsigned))
for s in ["", " ", " "]
# Without a base (handles things like "0x00001111", etc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove one space here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - from where?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 5 spaces instead of 4.

test/parse.jl Outdated

# Test `tryparse_internal` with part of a string
let b = " "
result = @test_throws ArgumentError get(Base.tryparse_internal(Bool, b, 7, 11, 0, true)) == true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== true is redundant (same below).


# With a base
result = @test_throws ArgumentError parse(T, s, 16)
exception_with_base = result.value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find it clearer to always access result.value instead of defining exception_with_base (same below).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defined them separately so that it reminds you they are exceptions, and so that in the test-failed message you see which one it was, since let-expressions obscure the line number.

@Tetralux
Copy link
Contributor Author

So, the old parsing code for integers was forgiving of leading and trailing whitespace.
The old boolean parsing code was not.

My new integer parsing code is also forgiving of whitespace.
My new boolean parsing code is not.

Shall I make them both forgiving of it?

@JeffBezanson JeffBezanson merged commit 021e11e into JuliaLang:master Feb 13, 2017
@tkelman
Copy link
Contributor

tkelman commented Feb 14, 2017

either both should allow whitespace or neither should - maybe the latter if it's possible to get any performance benefits from that

@Tetralux
Copy link
Contributor Author

Tetralux commented Feb 14, 2017

@tkelman
With some naive benchmarking, it takes about 1ms 2ms more time to be forgiving of whitespace in both cases.
It also seems that float parsing is also forgiving of whitespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants