Add Support for ABNF #3311
👋 Hey @sanssecours, thanks for the pull requests. I'll address them both in this post, since they're so closely related. I'd say there're enough results across GitHub for both languages for them to warrant addition:
I wouldn't mind hearing @pchaigno and @arfon's thoughts too, though. :) Also, a few other things:
|
Agreed. These look like they meet the required usage guidelines.
👍 |
Great! That settles it then. =) One more thing: you'll also need to show where you acquired those samples, as we require them to be released under a permissive license. Specifically, one of these. Lastly, I'd also ask you to remove the |
Thanks for the information. I just changed the group. May I ask why ANTLR files are listed under
grammar Expr;
prog: (expr NEWLINE)* ;
expr: expr ('*'|'/') expr
| expr ('+'|'-') expr
| INT
| '(' expr ')'
;
NEWLINE : [\r\n]+ ;
INT : [0-9]+ ;
Thanks for the info. I think I will incorporate the core rules in my version of the ABNF TextMate grammar too. The EBNF grammar seems to be a direct translation of Arne's grammar :o) so we should be covered there.
All of the examples are from the English Wikipedia pages for ABNF and EBNF. I added a link to the respective articles at the top of each sample file. The files are covered under CC BY-SA 4.0, which sounds permissive to me, although I think I also need to provide a link to the license and the names of the authors. Finding the authors of the code sounds like a complicated task, since so many people contribute to Wikipedia articles. Should I just search for files that are covered by one of the licenses you linked to instead? If you think this option is more appropriate, is a comment that contains a link to the original source inside the files enough? |
I'm not sure, as I'm unfamiliar with ANTLR; @larsbrinkhoff might know more about it than I do. But if it's really just another *BNF-like notation, its classification should definitely be changed. =)
@pchaigno Do you know if the CC BY-SA license is appropriate? I've forgotten what the stance on Wikipedia-sourced samples is.
No, I'm afraid not. The samples aren't for human readers: they're for the classifier. Linguist uses them as a basis for Bayesian classification, helping it identify languages based on the frequency of keywords and what-have-you. The bigger the reference material, the better.
You could, or you could write your own. Sometimes the latter is easier, particularly for data formats. :) |
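To make the point about Bayesian classification concrete, here is a minimal sketch of a token-frequency naive Bayes classifier. It is not Linguist's actual implementation (Linguist is written in Ruby and its tokenizer is more involved); the class name `TinyClassifier`, the `tokenize` helper, and the sample strings are all hypothetical and only illustrate why more sample material tends to help.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Tokens are names (letters, digits, "_" and "-") or single non-space characters.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_-]*|\S", text)


class TinyClassifier:
    """Hypothetical naive Bayes classifier over token frequencies."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # language -> token frequencies
        self.sample_counts = Counter()            # language -> number of samples

    def train(self, language, text):
        self.token_counts[language].update(tokenize(text))
        self.sample_counts[language] += 1

    def classify(self, text):
        vocabulary = set()
        for counts in self.token_counts.values():
            vocabulary.update(counts)
        total_samples = sum(self.sample_counts.values())
        tokens = tokenize(text)
        best_language, best_score = None, -math.inf
        for language, counts in self.token_counts.items():
            # Log prior: share of the training samples that belong to this language.
            score = math.log(self.sample_counts[language] / total_samples)
            total_tokens = sum(counts.values())
            for token in tokens:
                # Laplace smoothing keeps unseen tokens from zeroing out the score.
                score += math.log((counts[token] + 1) / (total_tokens + len(vocabulary)))
            if score > best_score:
                best_language, best_score = language, score
        return best_language
```

Training it on one ABNF-like string and one JSON-like string is enough for it to separate inputs that reuse the same punctuation and keywords, which is the role the sample files play for the real classifier.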
Sorry for the confusing description above 😀. I meant, is it enough to just specify the source at the top of a sample file? For example, if I include the file
at the top of the file? Where exactly should I add the information about the source/license of a sample file?
You are right. Writing a grammar that is valid according to the syntax of EBNF/ABNF is not that hard. Writing a grammar that also makes sense semantically is not trivial though, especially if it should include a lot of the features of the metalanguage. |
Ah, I see what you mean. No, it doesn't work that way. The material will need to be explicitly released under a permissive license.
Here will do. =) It doesn't have to be in the codebase; just as long as we have a record of where/when it was acquired.
Well there's not a lot that can go wrong with *BNF, thankfully. ;) |
@Alhadis In the end, @sanssecours Would you expect to see git repositories with only ABNF files or are they always included with some other files (as some kind of documentation/specification for instance)? |
That sounds easy enough. Thank you for the information.
I would expect that most repositories use ABNF/EBNF very sparsely, maybe as source of documentation for some programming/configuration language. There are however repositories, like this one, where the main file seems to be an ABNF grammar. |
My impression is that ANTLR is similar to Yacc. They include both grammar rules and program code. |
I replaced the sample files from Wikipedia with samples from GitHub. I added the source and the license at the top of the files. I hope that is okay. The table below shows the sources and licenses of the files too.
If there is anything left for me to do, then please just comment below. Thank you. |
Just a few touch-ups needed, and then I think we're good to go. =)
extensions:
- ".abnf"
interpreters:
- ex_abnf
This field is for languages that can be identified from interpreter directives. =)
For instance, a file with #!/usr/bin/env node in its first line can be identified as JavaScript by listing node as an interpreter.
Since that doesn't apply here, it's safe to leave this out.
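As a rough illustration of how an interpreter directive can map to a language, here is a minimal sketch. It does not reproduce Linguist's own logic; the `INTERPRETERS` table and both function names are hypothetical, and only the `node` to JavaScript pairing comes from the comment above.

```python
import posixpath

# Hypothetical lookup table; in Linguist this data comes from the
# "interpreters" lists in languages.yml.
INTERPRETERS = {"node": "JavaScript", "python3": "Python", "ruby": "Ruby"}


def interpreter_from_shebang(first_line):
    """Return the interpreter named by a shebang line, or None."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    name = posixpath.basename(parts[0])
    if name == "env":
        # "#!/usr/bin/env node": the real interpreter is env's first plain
        # argument, so skip option flags and VAR=value assignments.
        args = [p for p in parts[1:] if not p.startswith("-") and "=" not in p]
        if not args:
            return None
        name = posixpath.basename(args[0])
    return name


def language_for(first_line):
    """Map a file's first line to a language name, or None if unknown."""
    return INTERPRETERS.get(interpreter_from_shebang(first_line))
```

An ABNF file normally starts with a comment or a rule rather than a shebang, so `language_for` returns `None` for it, which is why the field adds nothing in this pull request.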
Thank you for the explanation. This should be fixed now.
interpreters:
- ex_abnf
tm_scope: source.abnf
group: ABNF
The group field here is redundant; we only use this to let one language affect a parent language's usage statistics. For example, GAS falls under the category of Assembly, so we want to have it count in the usage stats of Assembly languages across GitHub.
The name is slightly misleading; a more accurate name for this field might be parent_language.
In short, it's logistically impossible for anything to be its own parent. ;)
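The roll-up described above can be sketched as follows. This is only an illustration under assumptions: the `LANGUAGES` excerpt and the `usage_by_group` function are hypothetical, and the byte counts in the usage note are invented. Only the GAS to Assembly relationship comes from the comment.

```python
from collections import Counter

# Hypothetical excerpt of language metadata. Only languages with a parent
# carry a "group" entry; everything else counts towards itself.
LANGUAGES = {
    "Assembly": {},
    "GAS": {"group": "Assembly"},
    "ABNF": {},
}


def usage_by_group(bytes_per_language):
    """Add each language's byte count to its parent group, if it has one."""
    totals = Counter()
    for language, size in bytes_per_language.items():
        parent = LANGUAGES.get(language, {}).get("group", language)
        totals[parent] += size
    return dict(totals)
```

With made-up sizes such as `{"GAS": 300, "Assembly": 200, "ABNF": 50}`, the GAS bytes are reported under Assembly, while ABNF is counted under its own name whether or not it declares `group: ABNF`. That is why the field is redundant here.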
Thank you for the comment. I just removed the group fields.
@@ -0,0 +1,61 @@
; Source: https://github.com/fjolnir/Tranquil
; License: BSD 3 Clause License
; Modified for standard compatibility by René Schwaiger
What did you need to modify here? We generally try to keep the files as-is (including any mistakes they contain), as they are used for tests and training.
The original file uses single quotes, which are not part of the standard as far as I can tell.
There was no need to have done that. This is the sort of in-the-wild discrepancy that's both natural and expected; and also part of the reason we need samples in the first place.
> This is the sort of in-the-wild discrepancy that's both natural and expected; and also part of the reason we need samples in the first place.
I do not think that it makes that much sense to add incorrect files. Otherwise we could just add the following “ABNF code” as (part of) a sample file 😀:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute
irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.
The ABNF TextMate grammar will also (correctly) mark parts of the original tranquil.abnf grammar as incorrect:
> I do not think that it makes that much sense to add incorrect files.
It does when it is what people use in-the-wild. If you think this file is not representative of the usage of ABNF on github.com you can replace it ;)
Here's the relevant rule which I believe is applying the error highlighting:
[a-zA-Z][a-zA-Z0-9\-]*|(?<invalid>\S)
So it's basically saying: "highlight any non-whitespace character that isn't part of a name that starts with an ASCII letter and continues with ASCII letters, digits, or dashes."
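To see what that rule flags, here is a small sketch that applies the same pattern in isolation. Two assumptions apply: Python's `re` module spells the named group `(?P<invalid>...)` instead of `(?<invalid>...)`, and the real TextMate grammar presumably applies the rule only in certain contexts, so this is not how the highlighter itself runs. The function name `invalid_chars` is hypothetical.

```python
import re

# Same pattern as the rule quoted above, with Python's named-group syntax.
RULE = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*|(?P<invalid>\S)")


def invalid_chars(text):
    """Return the characters that the second alternative marks as invalid."""
    return [
        match.group("invalid")
        for match in RULE.finditer(text)
        if match.group("invalid") is not None
    ]
```

Valid names such as `rule-name` produce no hits, while each single quote in a fragment like `digit 'a' name-2` lands in the `invalid` group, which matches the single-quote complaint about the original tranquil.abnf.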
May I add a comment noting that tranquil.abnf is a non-standard-compliant file?
Why?
Recall that these files are for the classifier: 99% of the time, they remain unread by human beings.
> Why?
Because it is a non-standard-compliant file 😀. I would not want someone to think that it represents a correct ABNF grammar.
However, I just removed tranquil.abnf. I hope that is a compromise that works for everyone.
Augmented Backus–Naur form ([ABNF][]) is a metalanguage used to specify language grammars.

[ABNF]: https://en.wikipedia.org/wiki/Augmented_Backus–Naur_form
Looks good enough to me then, I guess. @arfon? |
👍 looks great. Thanks @sanssecours. This will be live in the next release of Linguist (later this week) |
Thanks for your patience @sanssecours! |
Hi,
this commit adds support for Augmented Backus–Naur form, a metalanguage used to specify language grammars. The language does not seem to be that popular on GitHub, but important enough to be used in popular repositories. An RFC document, describing the current version of the language, is available here.
If I should change anything in this pull request, then please just comment below. Thank you.
Kind regards,
René