Make 𝟏, 𝟎, 𝟙, 𝟘 into valid identifiers for DSLs #26808

jlperla · 2018-04-14T04:41:07Z

Looking at https://docs.julialang.org/en/latest/manual/unicode-input/#Unicode-Input-1, there are a few characters that would make excellent identifiers for linear algebra and probability DSLs.

U+1D7CE	𝟎	\bfzero	Mathematical Bold Digit Zero
U+1D7CF	𝟏	\bfone	Mathematical Bold Digit One
U+1D7D8	𝟘	\bbzero	Mathematical Double-Struck Digit Zero
U+1D7D9	𝟙	\bbone	Mathematical Double-Struck Digit One

Note that this proposal is conservative: it leaves as many of the other Unicode digits as possible as invalid identifiers. In particular, \bsanszero and \bsansone look similar, but are left as invalid identifiers for now.

The main use case for these is to be able to add automatically reshaping matrices/vectors of 1s and 0s to https://github.com/JuliaArrays/FillArrays.jl, in the spirit of the UniformScaling operator, currently denoted by I. Of course, that library would not intend to lay claim to this notation, but it would want to use it. 𝟘 and 𝟙 might be useful for people who wish to write const 𝟙 = 𝟏 to match their LaTeX notation, or could allow writing new indicator functions, etc. I know I would use 𝟙(a > b) for that, to match the algebra.
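A minimal sketch of the indicator-function use, assuming a Julia release that includes the change from #32838 (milestoned for 1.3), so that 𝟙 can start an identifier. The definition of 𝟙 below is hypothetical user code, not an API of Base or FillArrays.jl:

```julia
# Assumes Julia 1.3 or later, where 𝟎-𝟗 and 𝟘-𝟡 are accepted as identifier characters.
# `𝟙` is a hypothetical user-defined indicator function, not part of Base or FillArrays.jl.

# Indicator function: 1 if the condition holds, 0 otherwise.
𝟙(cond::Bool) = Int(cond)

# Element-wise use via broadcasting, e.g. marking entries above a threshold.
a = [3, 1, 4, 1, 5]
b = 2
mask = 𝟙.(a .> b)   # [1, 0, 1, 0, 1]
total = sum(mask)   # 3
```

Because 𝟙 is an ordinary function name here, the dot syntax `𝟙.(...)` broadcasts it like any other function.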


digital-carver · 2018-04-14T16:56:30Z

I can see the appeal of the idea, but I think there's too little benefit to justify the potential readability and maintenance costs. Between font variations, (anti-)aliasing, rendering choices, and syntax highlighting, the distinctions between the different zeros (0 𝟎 𝟘) or ones (𝟙 1 𝟏) can get pretty blurry. The idea of potential gotchas in such basic entities as 0s and 1s (and the confused Stack Overflow questions resulting from them) is not an appealing prospect.

jlperla · 2018-04-14T17:41:02Z

I think you can make that case about almost all Unicode characters that have a similar ASCII character. Whether it makes sense in a particular case is a very reasonable question, and library-specific.

In libraries like ApproxFun.jl, they use symbols like 𝒟, which looks like a D and matches the math notation of using script letters to denote differential operators.

The only difference with what I am suggesting is that (right now) library writers don't have the option to create aliases with variable names that start with these fancy number-looking characters.

If this were changed, then the discussion could turn to what you are bringing up: is introducing those aliases a good idea (since they should never be required)? Your perspective is reasonable, but it may be domain-specific.

StefanKarpinski · 2018-04-14T20:19:39Z

There are three choices here:

1. Disallow all digit-variant characters entirely (what we do now).
2. Allow digit-variant characters to be used as letters, distinct from the digits they correspond to.
3. Allow digit-variant characters to be used as if they were simply the plain digit, i.e. make 𝟘 another way of writing 0.
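To make option 2 concrete, here is a small check one might run, assuming a Julia release that includes #32838 (the implementation eventually merged for this issue). `Base.isidentifier` is an existing, unexported Base function; the binding name and value are arbitrary choices for illustration:

```julia
# Assumes Julia 1.3+ (after #32838), where digit variants parse as letters (option 2).
# Base.isidentifier reports whether a string parses as an ordinary identifier.
println(Base.isidentifier("𝟘"))  # double-struck zero, U+1D7D8
println(Base.isidentifier("𝟎"))  # bold zero, U+1D7CE
println(Base.isidentifier("0"))  # plain ASCII digit: always a literal, never a name

# Under option 2, 𝟘 is an ordinary binding with no connection to the literal 0.
𝟘 = :not_a_number
println(𝟘 == 0)                  # false
```

Under option 3 the last comparison would instead be about the number zero itself; option 2 keeps the two completely separate.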

The last option seems confusing and fairly pointless to me. Unlike characters like μ and µ, which are different Unicode characters that look exactly alike, these are not likely to be accidentally input when plain digits were intended. Why allow weird digit variants when literally every keyboard ever created has plain digits directly on it? The only way 𝟘 is likely to end up in a program is if someone intended to enter it.

The current behavior of disallowing digit variants entirely seems like a waste of potentially nice syntax. I have yet to encounter a font where these digit variants render but are not visually distinguishable from the corresponding digits.

That leaves option 2: allowing digit-variant characters to be used as letters, which is what this issue proposes. I can understand that people might not want to use these bindings, which is fine: in that case, don't use them. But why should we prevent people who want to from doing so? Especially given that the only other potential use for them is not really sensible.

digital-carver · 2018-04-14T20:39:35Z

> I think you can make that case about almost all unicode characters that have a similar ascii character.

True, that's why I mentioned "such basic entities as 0s and 1s". 'Is this identifier a 𝒟 or a D' is a very different sort of question from 'is this thing here a literal or an identifier'. It's a small mental cost when going through a codebase, but such costs add up pretty quickly.

> If this were changed, then the discussion could turn to what you are bringing up: is introducing those aliases a good idea (since they should never be required)? Your perspective is reasonable, but it may be domain-specific.

I'm a fan of DSLs and would in theory love to have custom infix operators (#16985) and even custom infix named functions, hoping the users use them wisely. But sometimes the guardrails have to be in the language, and in my opinion this is one of those cases.

> I can understand that people might not want to use these, in which case, simply don't. But why should we prevent people who want to from doing so?

The same reason the codepoints were restricted in the first place (#5936): code gets passed down and across teams and people, and sometimes it's more important to prevent "crazy things" from being introduced by someone than to provide a minor nicety.

dlfivefifty · 2018-04-15T12:28:30Z

As far as I can tell, any argument that this is confusing applies equally to ℯ (\euler). So whatever discussion led to changing e to ℯ in Base applies here.

JeffBezanson · 2018-04-15T17:44:46Z

Agreed; we're way past the point of having any sort of policy against potentially-confusable characters. I agree with Stefan that when fonts have 𝟘 and 𝟙 they tend to be more distinguishable than some other examples like e and ℯ.

StefanKarpinski · 2018-04-15T22:57:53Z

> The same reason the codepoints were restricted in the first place

The reason to restrict code points was to allow for implementing sane uses of code points in the future without breaking code, not to prevent people from doing silly things. If people want to write unreadable code, they will, no matter what we do to try to prevent it.

I think the de facto policy with potentially-confusable characters is that we identify characters that are easily confused both on input and appearance so there's a real chance that someone may input one when they intended to input the other and not be easily able to tell that this is what has happened. The normal "e" versus Euler's "ℯ" fails this test on both counts: there's little chance that anyone will input "ℯ" by accident when they meant "e", since "e" is on every keyboard and "ℯ" is on none; they also look fairly distinct in most fonts, so even if someone managed to do this somehow, they'd be able to notice what's going on. The case of "μ" and "µ" satisfies this criterion: neither character is on a standard keyboard, some input methods give you one while others give you the other, and they look identical, so it's extremely hard to discover after the fact what has happened. Applying this test to the "1" versus "𝟙" case leads to the same conclusion as "e" versus "ℯ", i.e. that they should be considered distinct characters.
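The two outcomes of this test can be seen side by side in a short sketch. It assumes Julia 1.3+ (for 𝟙) and the identifier normalization described in the Julia manual's section on allowed variable names, under which the micro sign µ is treated as the Greek letter μ; character literals are not normalized, only identifiers:

```julia
# Assumes Julia 1.3+ and the documented identifier normalization µ (U+00B5) -> μ (U+03BC).

micro = 'µ'   # U+00B5 MICRO SIGN (Char literals are left as typed)
mu    = 'μ'   # U+03BC GREEK SMALL LETTER MU
println(codepoint(micro), " vs ", codepoint(mu))  # two different code points
println(micro == mu)                              # false: different Chars

# As identifiers, the confusable pair names a single binding.
µ = 1.0e-6    # typed with the micro sign
println(μ)    # read back with the Greek letter

# The 1 / 𝟙 pair fails the confusability test, so 𝟙 is simply a distinct name.
println(Base.isidentifier("𝟙"))
println(Base.isidentifier("1"))
```

The confusable pair is merged at parse time, while 𝟙 gets no such folding and stays its own identifier.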

digital-carver · 2018-04-16T12:40:16Z

> we identify characters that are easily confused both on input and appearance so there's a real chance that someone may input one when they intended to input the other and not be easily able to tell that this is what has happened

My concern was more about later readability than about ambiguity during input; "code is read a lot more than it's written" and all that. But since this is probably going in, can we have it so that there's one canonical identifier zero (not multiple) to go alongside the one canonical literal 0 (and similarly for 1)? My vote is for \bbzero and \bbone to be the allowed identifiers, since they're easier to distinguish visually from 0 and 1 (especially in the presence of syntax highlighting, which often makes a bold vs. non-bold distinction less clear).

dlfivefifty · 2018-04-16T13:29:05Z

I see no reason to limit this to just one, when so many of the "1"s are easily distinguished. No one is going to confuse any of the following for each other or for 1 and so at the very least they all should be legitimate identifiers: 𝟙, ₁, ❶, ⓵, ①, 1️⃣

sbromberger · 2018-09-14T23:20:11Z

ref: #10762

jlperla · 2019-08-06T19:13:22Z

@JeffBezanson @StefanKarpinski (cc @dlfivefifty ) I realized that a feature freeze is coming soon and was wondering if you would still support having a PR that implements this? It would be very nice to sneak into the 1.3 release.

dlfivefifty · 2019-08-06T22:21:10Z

For the record, 1.3 has a lot of exciting stuff in it already, and so postponing this to 1.4+ makes sense to me.

jlperla · 2019-08-06T22:24:36Z

Oh for sure. This would not be the highlight of the release by any means! But if it is a low "cost" and low probability of side effect issue, it would mean I can write some cool DSLs 6 months earlier.

JeffBezanson · 2019-08-08T19:33:34Z

Triage is ok with this.

StefanKarpinski · 2019-08-08T19:39:04Z

Explicitly, triage is ok with option 2: Allow digit-variant characters to be used as letters, distinct from the digits they correspond to. Now it merely needs an implementation.

JeffBezanson · 2019-08-12T18:43:18Z

Fixed by #32838

ararslan added the domain:unicode label Apr 14, 2018

jlperla mentioned this issue May 28, 2018: Lazy vectors of 1s and 0s JuliaArrays/FillArrays.jl#21 (open)

JeffBezanson added the status:triage label Aug 7, 2019

JeffBezanson added this to the 1.3 milestone Aug 8, 2019

JeffBezanson removed the status:triage label Aug 8, 2019

StefanKarpinski added the status:help wanted label Aug 8, 2019

ajozefiak mentioned this issue Aug 9, 2019: 𝟎-𝟗 and 𝟘-𝟡 Identifiers #32838 (merged)

JeffBezanson closed this as completed Aug 12, 2019
