Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for String(x::Symbol) = string(x) #18152

Merged
merged 1 commit into from
Sep 24, 2016

Conversation

dbeach24
Copy link
Contributor

@dbeach24 dbeach24 commented Aug 20, 2016

Bug fix for #16997
defines String(x::Symbol) = string(x)

(This one's against master!)

@tkelman
Copy link
Contributor

tkelman commented Aug 20, 2016

would have been less noise to rebase the existing PR #18120, but this is fine. will have to squash out that unnecessary merge commit

@musm
Copy link
Contributor

musm commented Aug 20, 2016

string or print_to_string ?

@@ -34,6 +34,7 @@ If you need to subsequently modify `v`, use `String(copy(v))` instead.
"""
String(v::Array{UInt8,1})

String(x::Symbol) = string(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be more efficient to do String(x::Symbol) = unsafe_string(Cstring(x)), which copies directly from the underlying data. (Seems to be about 3x faster in a quick benchmark.)

Copy link
Contributor

@TotalVerb TotalVerb Sep 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this convert(::Type{String}, x::Symbol) for consistency? We already have convert(Symbol, ::String). It would be weird if Symbol(str) and s::Symbol = str works, and String(sym) works, but x::String = sym does not.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Want to make a PR for that change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should merge this first, and make another PR to replace it with convert?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer a single PR. Ideally @dbeach24 would update this PR, but he doesn't seem to be around so maybe we should fork his branch and push a new PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been rather busy. I'll work on it later this weekend.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks @dbeach24.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TotalVerb If speed is a goal here, seems like:

convert(::Type{String}, x::Symbol) = unsafe_wrap(String, Cstring(x))

Will perform even better, since that is a zero-copy conversion. unsafe_string is unsafe because it reads from an arbitrary pointer, but it still performs a copy. In that sense, I suppose unsafe_wrap is doubly unsafe, since it also assumes that the underlying memory will not be modified at a later time.

However, since Symbol is an immutable type, I presume that unsafe_wrap is okay here. (But wonder is there is any possibility of a memory management issue because of this...?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, Symbols are currently never freed, so this is safe. But I don't know if it's a good idea to depend on this behaviour. @stevengj

@stevengj
Copy link
Member

Various places in the source code that use string(somesymbol) should also be updated to use String(somesymbol), in the interest of eating our own dogfood. A quick search finds at least the following cases:

@stevengj
Copy link
Member

Darn it, I screwed up my copy-and-paste in the above comment, so there are a number of duplicate lines that were supposed to be other occurrences of string(somesymbol). Oh well, we don't have to get every single spot.

@yuyichao yuyichao changed the title Djcb/bugfix 16997 Adds support for String(x::Symbol) = string(x) Aug 26, 2016
Copy link
Contributor

@TotalVerb TotalVerb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change. But this should probably be a method of convert (see comment).

@dbeach24
Copy link
Contributor Author

Just updated the PR with a new commit.
This implements convert(::Type{String}, x::Symbol).

@@ -62,6 +62,7 @@ convert(::Type{Array{UInt8}}, s::AbstractString) = String(s).data
convert(::Type{String}, s::AbstractString) = String(s)
convert(::Type{Vector{Char}}, s::AbstractString) = collect(s)
convert(::Type{Symbol}, s::AbstractString) = Symbol(s)
convert(::Type{String}, s::Symbol) = unsafe_wrap(String, Cstring(s))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very efficient conversion, but I need a ruling on the safety of doing this w.r.t. memory management of symbols.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this mean the .data field of the result of converting the same symbol to multiple different string objects would all point to the same data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would point to the same data, but the .data array would be different. They would be distinct .data arrays that share the same underlying data.

Copy link
Member

@stevengj stevengj Sep 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes, this would mean that all the strings have data that points to the same memory.

My suggestion is to use unsafe_string here, not, unsafe_wrap. I don't think the extra efficiency of not making a copy is worth it for the danger of being able to modify Symbol contents through the data field of a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevengj I will follow your advice.

It seems that your view includes two kinds of immutability. e.g. Symbols are IMMUTABLE, while Strings are merely immutable... and we can't go trusting an immutable object to hold the definitive copy of IMMUTABLE data.

I am correctly understanding this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not like someone can't muck around with the Symbol contents anyway using Cstring directly, though. Mutating the data field of a string should be a no-go unless you make the string yourself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we all agree that mutating "someone else's" String is a no-go, then I'd be more inclined to stick with the unsafe_wrap version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that you're not supposed to mutate strings, and unsafe_wrap should be safe for symbol data if people are following the rules, but I just wanted to reduce the temptation here. I'm fine either way, however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably not enough of a performance difference to matter. I'll go with the "safer" unsafe version.

julia> symbols = [gensym() for i=1:10^6];
julia> f1(symbols) = for s in symbols unsafe_wrap(String, Cstring(s)) end
julia> f2(symbols) = for s in symbols unsafe_string(Cstring(s)) end

julia> @time f1(symbols)
  0.040287 seconds (2.00 M allocations: 91.553 MB, 14.31% gc time)

julia> @time f2(symbols)
  0.052429 seconds (2.00 M allocations: 106.812 MB, 15.83% gc time)

This is the underlying support for calling String(::Symbol).
Bug fix for issue JuliaLang#16997
@KristofferC
Copy link
Sponsor Member

Restarted travis. Previous error was the broadcast issue. LGTM when CI pass.

@tkelman
Copy link
Contributor

tkelman commented Sep 23, 2016

What is

From worker 7:       * broadcast            Error During Test
    From worker 7:    Test threw an exception of type ErrorException
    From worker 7:    Expression: broadcast(+,1.0,(0,-2.0)) == (1.0,-1.0)
    From worker 7:    type #+ has no field contents
    From worker 7:     in #35 at ./broadcast.jl:260 [inlined]
    From worker 7:     in _ntuple at ./tuple.jl:101 [inlined]
    From worker 7:     in ntuple(::Base.Broadcast.##35#36, ::Type{Val{2}}) at ./tuple.jl:94
    From worker 7:     in macro expansion at ./broadcast.jl:260 [inlined]
    From worker 7:     in broadcast_tup(::Base.#+, ::Tuple{Float64,Tuple{Int64,Float64}}, ::Type{Val{2}}, ::Int64) at ./broadcast.jl:259
    From worker 7:     in broadcast_c(::Function, ::Type{Tuple}, ::Float64, ::Vararg{Any,N}) at ./broadcast.jl:269
    From worker 7:     in broadcast(::Function, ::Float64, ::Tuple{Int64,Float64}) at ./broadcast.jl:340
    From worker 7:     in include_string(::String, ::String) at ./loading.jl:478
    From worker 7:     in include_from_node1(::String) at ./loading.jl:544
    From worker 7:     in macro expansion at ./util.jl:232 [inlined]
    From worker 7:     in runtests(::String) at /tmp/julia/share/julia/test/testdefs.jl:7
    From worker 7:     in (::Base.##619#621{Base.CallMsg{:call_fetch}})() at ./multi.jl:1425
    From worker 7:     in run_work_thunk(::Base.##619#621{Base.CallMsg{:call_fetch}}, ::Bool) at ./multi.jl:1005
    From worker 7:     in macro expansion at ./multi.jl:1425 [inlined]
    From worker 7:     in (::Base.##618#620{Base.CallMsg{:call_fetch},Base.MsgHeader,TCPSocket})() at ./event.jl:68

anyone seen that before?

@TotalVerb
Copy link
Contributor

@tkelman It's #18577.

@tkelman
Copy link
Contributor

tkelman commented Sep 23, 2016

Oh right, then I guess Kristoffer didn't restart all jobs.

@KristofferC
Copy link
Sponsor Member

KristofferC commented Sep 23, 2016

Looks like I only restarded one of them. Oops. Redid it.

@kshyatt
Copy link
Contributor

kshyatt commented Sep 23, 2016

@KristofferC OKed this and CI passed so I'm going to merge.

@kshyatt kshyatt merged commit ac25c58 into JuliaLang:master Sep 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants