RFC: Add some unicode function synonyms and infix operators #6582

Merged
20 commits merged into master on May 5, 2014

jiahao
Member

@jiahao jiahao commented Apr 19, 2014

Closes #552.

@simonbyrne
Contributor

Some other useful ones:

  • ∈ for in
  • ⊆ for issubset (hey, that one is already used in the help)
  • ∩ for intersect
  • ∪ for union
  • · for dot

@simonbyrne
Contributor

@jiahao if you do use √, please let us know, as it will break Distributions.jl

@jiahao
Member Author

jiahao commented Apr 19, 2014

Oh, does Distributions use √ already?

@simonbyrne
Contributor

In some of the constants, but they could easily be renamed.

@jiahao
Member Author

jiahao commented Apr 19, 2014

I think the constants are fine. You'd have to write √(2) or 4 √ 16 with the proposed additions.

@StefanKarpinski
Sponsor Member

The √ operator should be made to behave like the minus sign (but with different precedence probably) – unary prefix and binary infix – at which point it will break that code.

@my-little-repository

Beware of the mismatch between △, which is used to denote symmetric difference of sets, and setdiff. Setminus would be a more appropriate name for the function setdiff. Also one could add a symdiff function to base and associate it to △.

math> {1,2,3}△{3,4,5} = {1,2,4,5}
julia> setdiff({1,2,3},{3,4,5})'
1x2 Array{Any,2}:
1 2
The Unicode set minus character is ∖ (aka \u2216) and can be confused with the backslash \.
Side by side: ∖\
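The 2014 `{...}` literals above no longer parse in current Julia. As a minimal sketch using `Set`s instead (Julia 1.x assumed), the two operations being contrasted, and the set-minus versus backslash confusion, look like this:

```julia
a, b = Set([1, 2, 3]), Set([3, 4, 5])

setdiff(a, b)   # set minus, A ∖ B: Set([1, 2])
symdiff(a, b)   # symmetric difference, A △ B: Set([1, 2, 4, 5])

# U+2216 SET MINUS is a different character from the ASCII backslash
'∖' == '\\'       # false
codepoint('∖')    # 0x00002216
```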

@simonbyrne
Contributor

Yeah, I would agree on △: it's used for too many other things to make it standard (also, it would probably cause issues if we try to normalise, as it's also the Greek capital delta)

@jiahao
Member Author

jiahao commented Apr 19, 2014

The commit message is wrong; I had set △ as symdiff. The nice thing about multiple dispatch is that this makes △ available as an infix operator for other purposes in other contexts.

Also (thank goodness), not even NFKC wants to normalize the upright white triangle to the Greek Delta.

julia> normalize_string("△", :NFKC) == normalize_string("Δ", :NFKC)
false
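In Julia 1.x the same check is written with `Unicode.normalize` from the `Unicode` standard library, which replaced `normalize_string`. A short sketch, with a contrasting character that NFKC does fold:

```julia
using Unicode

# NFKC leaves WHITE UP-POINTING TRIANGLE (U+25B3) and GREEK CAPITAL LETTER DELTA (U+0394) distinct
Unicode.normalize("△", :NFKC) == Unicode.normalize("Δ", :NFKC)   # false

# By contrast, OHM SIGN (U+2126) normalizes to GREEK CAPITAL LETTER OMEGA (U+03A9)
Unicode.normalize("\u2126", :NFKC) == "\u03a9"                   # true
```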

@simonbyrne
Contributor

Ah, I see. I guess the other problem is that it's not a unicode math operator: it's actually under geometric shapes

@my-little-repository

Ah, symdiff exists; I didn't know it had been implemented. Nice. Note that there are other Unicode characters that look like a triangle.
△ \u25b3 WHITE UP-POINTING TRIANGLE
Δ \u394 GREEK CAPITAL LETTER DELTA
∆ \u2206 INCREMENT
BTW, I am fine with △ as symdiff.
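To see the three look-alikes side by side with their codepoints and Unicode general categories, here is a sketch for Julia 1.x. Note that `Base.Unicode.category_abbrev` is an unexported internal helper, so treat its availability as an assumption:

```julia
# Codepoint and general category of each look-alike triangle
for c in ('△', 'Δ', '∆')
    hexcode = uppercase(string(codepoint(c), base = 16, pad = 4))
    # category_abbrev is internal to Base.Unicode (assumed available, may change)
    println(c, "  U+", hexcode, "  ", Base.Unicode.category_abbrev(c))
end
```

△ is reported as So (Symbol, other) and ∆ as Sm (Symbol, math), which is the geometric-shapes versus math-operator distinction noted above.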

@jiahao
Member Author

jiahao commented Apr 19, 2014

@simonbyrne Yes, I cheated. :)

@jiahao jiahao changed the title WIP: Add some unicode function synonyms and infix operators RFC: Add some unicode function synonyms and infix operators Apr 19, 2014
@jiahao
Member Author

jiahao commented Apr 19, 2014

Done with laundry so I'm going to stop here for now. Happy 🚲🏠ing ⌂

⊂ and ⊄ seem useful to have around for user-defined relations, but otoh it seems odd to define infix operators that are never used in Base.
@stevengj
Member

My feeling is that any Unicode codepoint from category Sm whose documented meaning and/or only common interpretation is as an infix operator (e.g. ) should be parsed as such, to allow the user to assign methods to them.

The only question would be precedence, but nearly all such operators have clear analogies to existing operators (e.g. and +, and <, or and &&) and hence it is reasonable to parse them with the same precedence.
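A sketch of what "assign methods to them" means in practice, assuming Julia 1.x, where `⊕` parses as an infix operator at the precedence of `+`. The xor meaning chosen here is purely illustrative:

```julia
# A user-defined method on a Unicode infix operator (illustrative: bitwise xor)
⊕(x::Integer, y::Integer) = xor(x, y)

3 ⊕ 5   # 6, since 0b011 xor 0b101 == 0b110

# The parser treats ⊕ like +, so * is applied first
Meta.parse("a ⊕ b * c") == :(a ⊕ (b * c))   # true
```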

@jiahao
Member Author

jiahao commented Apr 22, 2014

Eventually supporting all Unicode-representable infix operators could be something worth doing, but it's a bit too much to type in manually. Also, as @JeffBezanson points out, having hundreds of operators means that the parser will have to be rewritten to be cleverer at identifying them beyond the current linear search method.

@stevengj
Member

If you look at category Sm, most of the symbols are not unambiguously infix; there's probably a couple hundred unambiguous infix operators at most, and this is practical to type manually. It's pretty essential to go through them manually, actually, in order to intelligently categorize them.

But I agree that we would need to switch to (e.g.) a hash table instead of linear search.
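The difference between the two lookup strategies can be sketched in Julia. This is a toy model, not the parser itself (which is written in femtolisp). `PREC_CLASSES`, `prec_linear` and `prec_hashed` are hypothetical names, and the table holds only a handful of the operators discussed in this thread:

```julia
# Hypothetical, heavily abbreviated precedence classes
const PREC_CLASSES = [
    (:comparison, ['<', '>', '≤', '≥', '⊆', '∈']),
    (:plus,       ['+', '-', '|', '∪', '⊕']),
    (:times,      ['*', '/', '&', '∩', '⊗']),
    (:power,      ['^', '↑', '↓']),
]

# Linear search: cost grows with the number of operators scanned
function prec_linear(op::Char)
    for (cls, ops) in PREC_CLASSES
        op in ops && return cls
    end
    return nothing
end

# Hash table: built once, then each lookup is O(1) on average
const PREC_TABLE = Dict(op => cls for (cls, ops) in PREC_CLASSES for op in ops)
prec_hashed(op::Char) = get(PREC_TABLE, op, nothing)

prec_linear('⊗'), prec_hashed('⊗')   # (:times, :times)
```

Both return `nothing` for a character missing from the table. With a few hundred operators the hashed version avoids rescanning every class on each token.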

@jiahao
Member Author

jiahao commented Apr 22, 2014

there's probably a couple hundred unambiguous infix operators at most, and this is practical to type manually

So, about that UROP student...

(|\|>| |<\||)
(: |..|)
(+ - |.+| |.-| |\|| $)
(+ - |.+| |.-| |\|| $ ∩ ∪ △)
Member

shouldn't ∩ have the precedence of & and ∪ have the precedence of |?

@stevengj
Member

@jiahao, I made a quick pass through category Sm, and here are the symbols that seemed (a) unambiguously infix and (b) had a clear analogy to existing operators so that the precedence is clear:

* like operators: ∗∘∙ ∣∤ ⁄∩∧⅋≀⊓⊗⊘⊙⊚⊛⊠⊡⊼⋀⋂⋄⋅⋆⋇⋉⋊⋋⋌⋏⋒⟑⦸⦼⦾⦿⧵⧶⧷⧸⧹⨀⨂⨅⨇⨉⨯⨰⨱⨲⨳⨴⨵⨶⨷⨸⨻⨼⨽⩀⩃⩄⩋⩍⩎⩑⩓⩕⩘⩚⩜⩞⩟⩠⫛

+ like operators: ±∓∔∨∪−∸≂≏⊎⊔⊕⊖⊞⊟⊻⊽⋁⋃⋎⋓⧺⧻⨁⨃⊍⨄⨆⨈⨢⨣⨤⨥⨦⨧⨨⨩⨪⨫⨬⨭⨮⨹⨺⩁⩂⩅⩊⩌⩏⩐⩒⩔⩖⩗⩛⩝⩡⩢⩣﹢+

= like (comparison) operators: ∝∈∉∊∋∌∍∥∦∷∺∻∼∽∾≁≃≄≅≆≇≈≉≊≋≌≍≎≐≑≒≓≔≕≖≗≘≙≚≛≜≝≞≟≠≡≢≣≤≥≦≧≨≩≪≫≬≭≮≯≰≱≲≳≴≵≶≷≸≹≺≻≼≽≾≿⊀⊁⊂⊃⊄⊅⊆⊇⊈⊉⊊⊋⊏⊐⊑⊒⊜⊩⊬⊮⊰⊱⊲⊳⊴⊵⊶⊷⋍⋐⋑⋕⋖⋗⋘⋙⋚⋛⋜⋝⋞⋟⋠⋡⋢⋣⋤⋥⋦⋧⋨⋩⋪⋫⋬⋭⋲⋳⋴⋵⋶⋷⋸⋹⋺⋻⋼⋽⋾⋿⟈⟉⟒⦷⧀⧁⧡⧣⧤⧥⩦⩧⩪⩫⩬⩭⩮⩯⩰⩱⩲⩳⩴⩵⩶⩷⩸⩹⩺⩻⩼⩽⩾⩿⪀⪁⪂⪃⪄⪅⪆⪇⪈⪉⪊⪋⪌⪍⪎⪏⪐⪑⪒⪓⪔⪕⪖⪗⪘⪙⪚⪛⪜⪝⪞⪟⪠⪡⪢⪣⪤⪥⪦⪧⪨⪩⪪⪫⪬⪭⪮⪯⪰⪱⪲⪳⪴⪵⪶⪷⪸⪹⪺⪻⪼⪽⪾⪿⫀⫁⫂⫃⫄⫅⫆⫇⫈⫉⫊⫋⫌⫍⫎⫏⫐⫑⫒⫓⫔⫕⫖⫗⫘⫙⫷⫸⫹⫺﹤﹥<>

We also might consider treating some of the arrows as infix operators, since that is how they seem to be used in practice, although the precedence is less clear:

left/right arrows (precedence of = or ==?): ←→↔↚↛↠↣↦↮⇎⇏⇒⇔⇴⇶⇷⇸⇹⇺⇻⇼⇽⇾⇿⟵⟶⟷⟷⟹⟺⟻⟼⟽⟾⟿⤀⤁⤂⤃⤄⤅⤆⤇⤌⤍⤎⤏⤐⤑⤔⤕⤖⤗⤘⤝⤞⤟⤠⥄⥅⥆⥇⥈⥊⥋⥎⥐⥒⥓⥖⥗⥚⥛⥞⥟⥢⥤⥦⥧⥨⥩⥪⥫⥬⥭⥰⧴⬱⬰⬲⬳⬴⬵⬶⬷⬸⬹⬺⬻⬼⬽⬾⬿⭀⭁⭂⭃⭄⭇⭈⭉⭊⭋⭌←→

combination of comparisons and left-right arrows: ⥱⥲⥳⥴⥵⥶⥷⥸⥹⥺⥻

up/down arrows (precedence of ^?): ↑↓⇵⟰⟱⤈⤉⤊⤋⤒⤓⥉⥌⥍⥏⥑⥔⥕⥘⥙⥜⥝⥠⥡⥣⥥⥮⥯↑↓

…probably also ⇈⇊, although these are in category So rather than category Sm, and we might want to support other arrows in category So as infix operators too.

I probably missed some, but we can always add more later.
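Julia 1.x exposes the parser's precedence classes through `Base.operator_precedence`, which returns an integer (higher binds tighter, 0 for a non-operator). A quick sketch comparing a few symbols from the lists above with the ASCII operator they were grouped with. `prec` is just a local shorthand:

```julia
prec(op::Symbol) = Base.operator_precedence(op)

# One symbol from several proposed classes, paired with the ASCII operator it mimics
for (u, a) in [(:⊗, :*), (:±, :+), (:≈, :<), (:⊆, :<), (:↑, :^)]
    println(u, " has the precedence of ", a, ": ", prec(u) == prec(a))
end

# Left/right arrows such as → sit in their own class, between = and ==
prec(:(=)) < prec(:→) < prec(:(==))
```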

@simonbyrne
Contributor

@StefanKarpinski It would be nice to have some sort of inbuilt infix operator for min and max. Up/down arrows would seem like a good option, for example ↓ = min, ↑ = max, and then ⇵ = minmax.

@stevengj
Member

@simonbyrne, I'm skeptical of using arrows for this. If we need infix minimum and maximum, I would just make min and max infix operators.

@jiahao
Member Author

jiahao commented Apr 22, 2014

Having the up/down arrows be in the equivalence class of [^] makes sense, vide Knuth's up-arrow notation: a↑b for exponentiation and a⇈b for tetration.

Also ∩ ∈ [&] and ∪ ∈ [|] make sense.
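Both points can be checked in Julia 1.x, where ∩ and ∪ parse at the precedence of & and |, and ↑ at that of ^. The recursive `knuth_arrow` below is a hypothetical helper illustrating the notation, not a Base function:

```julia
# ∩ binds like &, ∪ like |, and ↑ like ^
Meta.parse("a ∪ b ∩ c") == :(a ∪ (b ∩ c))   # true
Meta.parse("a | b & c") == :(a | (b & c))   # true
Meta.parse("a ↑ b * c") == :((a ↑ b) * c)   # true

# Knuth's up-arrow a ↑ⁿ b: n = 1 is exponentiation, n = 2 is tetration (a ⇈ b)
function knuth_arrow(a::Integer, n::Integer, b::Integer)
    n == 1 && return a^b
    b == 0 && return one(a)
    return knuth_arrow(a, n - 1, knuth_arrow(a, n, b - 1))
end

# Give the single arrow a method as an infix operator
↑(a::Integer, b::Integer) = knuth_arrow(a, 1, b)

2 ↑ 10                 # 1024
knuth_arrow(2, 2, 3)   # 16, i.e. 2^(2^2)
knuth_arrow(3, 2, 2)   # 27, i.e. 3^3
```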

@stevengj
Member

@JeffBezanson, does femtolisp have a hash-table implementation (or some other easy way to circumvent the performance problems of supporting lots of Unicode infix operators)?

@mlubin
Member

mlubin commented Apr 24, 2014

+1

@JeffBezanson
Sponsor Member

Yes, there is a hash table.

@JeffBezanson
Sponsor Member

Merging this for now, as I believe it covers all the uncontroversial cases.

JeffBezanson added a commit that referenced this pull request May 5, 2014
RFC: Add some unicode function synonyms and infix operators
@JeffBezanson JeffBezanson merged commit 4c4e7cb into master May 5, 2014
@JeffBezanson
Sponsor Member

I'm removing and because (1) these are too easy to confuse with the | operator, and (2) we don't want to add functions to base that only have unicode names.

@JeffBezanson
Sponsor Member

Also △ is in the geometric shapes category, so it's not clear that it should be an operator.

Development

Successfully merging this pull request may close these issues.

unicode operators