\Bbbone is not valid input #10762

Closed
dpsanders opened this issue Apr 7, 2015 · 22 comments
Labels
domain:unicode (related to unicode characters and encodings), needs decision (a decision on this change is needed)

Comments

@dpsanders
Contributor

I wanted to use \Bbbone (𝟙) as a variable name (on latest master) but it gives the following error:

julia> 𝟙
ERROR: syntax: invalid character "𝟙"

Other less common characters, such as \mbfitsansvarpi, just above it in the Unicode input list, do work.

Is this deliberate since it's "sort of a number"?

@dpsanders
Contributor Author

I see that a𝟙 is a valid variable name, which strengthens my hypothesis, but I don't see any distinction being made in latex_symbols.jl at a glance.

@simonbyrne
Contributor

Same with \Bbbzero (𝟘) and all the other \Bbb digits.

@jiahao
Member

jiahao commented Apr 7, 2015

There's a place in the parser that checks if a character is valid to place at the beginning of an identifier name.

@dpsanders
Contributor Author

OK, I see, thanks. Maybe these symbols could be included as allowed?

@sbromberger
Contributor

Does this mean that it might be possible to do 𝟙 = 2? That seems... ripe for abuse :)

@jiahao
Member

jiahao commented Apr 7, 2015

Well, 𝟙 is quite commonly used for the identity matrix or the matrix of all ones. Being able to use it as a variable name is a valid use case.

@sbromberger
Contributor

Having to squint to determine whether a = 1 means "assign the value one to a" or "assign some other variable to a" will make code maintenance a nightmare unless steps are taken to document the use of this variable, in which case you might as well use another variable.

@jiahao
Member

jiahao commented Apr 7, 2015

"Get a better font" :)

@sbromberger
Contributor

"Get a better font" :)

AWESOME. :) I don't really have a large dog in this fight: I can't imagine ever using this but I recall the security / phishing problems we had when unicode characters were allowed in URLs, and I imagine something similar can happen in Julia. It was a bad idea for URLs, and it seems like some of the same problems might be carried over here.

I can't wait to try this out though.

@jiahao
Member

jiahao commented Apr 7, 2015

I don't think we should conflate this issue with #9744.

@JeffBezanson
Sponsor Member

This character is numberlike and so is not allowed to start an identifier. However, we don't parse it as a number either, hence the current error. I think this situation is reasonable.

You tend not to see fonts using "1" for "𝟙" (has anybody seen this?); you usually get a reasonable glyph or else some nasty replacement character.

I don't think it's the place of a programming language to try to ban characters. I acknowledge the security problems caused by unicode in URLs, but I find it circuitous logic to say that unicode should be disallowed in hopes of slightly decreasing the surface area for phishing attacks.
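
For reference, here is a minimal sketch of what "numberlike" means at the Unicode level. It uses Python's standard `unicodedata` module rather than Julia's parser, so it only shows the character metadata, not how Julia actually classifies identifier characters. The helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode general category, decimal digit value or None)."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.digit(ch, None))

# ASCII one, double-struck one and zero, bold one, and an ordinary letter.
for ch in ["1", "𝟙", "𝟘", "𝟏", "a"]:
    print(ch, describe(ch))
```

The double-struck and bold digits report the same general category (Nd, decimal digit) as ASCII `1`, while `a` reports Ll. That matches the observation earlier in the thread that `a𝟙` is accepted but a bare `𝟙` is not.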

@JeffBezanson added the needs decision and domain:unicode labels Apr 7, 2015
jiahao added a commit that referenced this issue May 14, 2015
- Mathematical bold 0, 1 (U+1D7CE, U+1D7CF)
- Mathematical double-struck 0, 1 (U+1D7D8, U+1D7D9)

which are sometimes used to represent certain representations of additive
and multiplicative identities.

Closes #10762
@sbromberger
Contributor

@JeffBezanson

I find it circuitous logic to say that unicode should be disallowed in hopes of slightly decreasing the surface area for phishing attacks.

That's not what I was suggesting - sorry for any misunderstanding. I think that the current approach (as I understand it) of not allowing "numberlike" characters to start identifiers is the correct one.

... but I see that a commit has already been made to allow this. I'll just go once more on the record that I think it's a bad idea, and will move on.

@StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@JeffBezanson
Sponsor Member

The proposed change was not merged; seems this was decided against.

@simonbyrne
Contributor

The other discussion seems to have gotten rather distracted, so I'm not sure it was ever really resolved.

FWIW, I would still be in favour of allowing these, as well as the various vulgar fractions (½, etc.)

@StefanKarpinski modified the milestones: 1.0, 0.6.0 Oct 26, 2016
@sbromberger
Contributor

@simonbyrne just to clarify (and to refresh my memory) - is it your suggestion that the vulgar fractions would be "numberlike" in that they wouldn't be allowed to be (or to start) variable names?

@simonbyrne
Contributor

Actually I'm not sure: I was just throwing them into the mix as they could be useful, but currently get parsed as an error. Ideally I would like to use them as

y = exp(½*x)

So I guess they should act like numbers, perhaps as rationals? (though in that case we should try to get some performance optimisations so that when used with floats they are inlined to the appropriate constants)

@sbromberger
Contributor

sbromberger commented Oct 26, 2016

As long as they're constants (that is, they can't be arbitrarily redefined) I'd be in favor of this; but if users can assign them as variables (i.e., ½ = 0.25 or even ½ = "cheese"), I'd have some concerns. This is the way we get Mars landers to crash on entry :)

@JeffBezanson
Sponsor Member

We could potentially parse the vulgar fractions as numeric literals. Unlike \Bbbone they seem to have totally unambiguous meanings.
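
As a rough illustration of "unambiguous meanings": Unicode itself attaches a numeric value to each vulgar fraction. The sketch below reads that value with Python's standard `unicodedata` module. It is not a proposal for how Julia's parser would treat these characters, and the helper name `as_fraction` and the denominator bound of 1000 are arbitrary choices for the example.

```python
import unicodedata
from fractions import Fraction

def as_fraction(ch):
    """Map a single character to the rational value Unicode assigns it."""
    value = unicodedata.numeric(ch)  # a float; raises ValueError for non-numeric characters
    # The float for 1/3 is inexact, so snap to the nearest small-denominator fraction.
    return Fraction(value).limit_denominator(1000)

for ch in ["½", "⅓", "¾"]:
    print(ch, unicodedata.category(ch), as_fraction(ch))
```

These characters fall in category No (other number) rather than Nd, so they carry a numeric value without being decimal digits.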

@jiahao
Member

jiahao commented Nov 1, 2016

We could potentially parse the vulgar fractions as numeric literals.

One possible argument against doing this is in writing code that caches computation of expensive divisions, e.g. naming a variable ⅓x that caches the division x/3. Note that we already allow x² to mean anything, even though it's convenient to cache the repeated use of a squared quantity.

@JeffBezanson
Sponsor Member

Both the syntaxes 2x (multiply) and x2 (variable) exist, and we can continue that here. While x² is a variable name like x2, ²x is currently a syntax error. So we could have ⅓x multiply and x⅓ be a variable name. Granted it's not as natural a way to name the variable, but at least it provides some way to get both behaviors.

@JeffBezanson
Sponsor Member

The only change I can imagine making here is allowing these characters in initial position (for identifiers, or perhaps as numeric literals in the case of fractions), in which case it will be non-breaking. Moving out of 1.0.

@Keno
Member

Keno commented Aug 20, 2019

Fixed by #32838.

@Keno closed this as completed Aug 20, 2019
7 participants