Add Support for ABNF #3311

sanssecours · 2016-10-31T10:00:59Z

Hi,

this commit adds support for Augmented Backus–Naur form (ABNF), a metalanguage used to specify language grammars. The language does not seem to be that popular on GitHub, but it is important enough to be used in popular repositories. An RFC document describing the current version of the language is available here.

If I should change anything in this pull request, then please just comment below. Thank you.

Kind regards,
René

Alhadis · 2016-11-01T12:32:16Z

👋 Hey @sanssecours, thanks for the pull requests. I'll address them both in this post, since they're so closely related.

I'd say there're enough results across GitHub for both languages for them to warrant addition:

Language	Files	Repos	Users
ABNF	2,308	828	706
EBNF	2,011	997	886

I wouldn't mind hearing @pchaigno and @arfon's thoughts too, though. :)

Also, a few other things:

You'll need to change the language-type of each entry to data, not programming. It's a notation for describing a language, as we both know, and not a real language per se.
You might find this grammar to be useful; it provides highlighting for both ABNF and EBNF.

arfon · 2016-11-01T12:34:31Z

> I wouldn't mind hearing @pchaigno and @arfon's thoughts too, though. :)

Agreed. These look like they meet the required usage guidelines.

> You'll need to change the language-type of each entry to data, not programming. It's a notation for describing a language, as we both know, and not a real language per se.
>
> You might find this grammar to be useful; it provides highlighting for both ABNF and EBNF.

👍

Alhadis · 2016-11-01T12:43:14Z

Great! That settles it then. =)

One more thing: you'll also need to show where you acquired those samples, as we require them to be released under a permissive license. Specifically, one of these.

Lastly, I'd also ask you to remove the color fields for each language. As they fall under the data category, they won't be listed in language-statistic bars in repositories... which means the colours won't ever be seen by users.
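To illustrate why a color is never shown for a data language, here is a minimal sketch of a statistics bar that only counts some language types. It is not Linguist's code: the `LANGUAGE_TYPES` table, the `COUNTED_TYPES` set and the `stats_bar` function are hypothetical names, and the assumption that only programming and markup languages are counted is stated in the code.

```python
# Hypothetical entries; the values mirror the "type" field discussed above.
LANGUAGE_TYPES = {
    "Ruby": "programming",
    "HTML": "markup",
    "ABNF": "data",
    "EBNF": "data",
}

# Assumption: only these types appear in the statistics bar.
COUNTED_TYPES = {"programming", "markup"}


def stats_bar(byte_counts):
    """Return the percentage share per language, skipping uncounted types."""
    counted = {
        lang: size
        for lang, size in byte_counts.items()
        if LANGUAGE_TYPES.get(lang) in COUNTED_TYPES
    }
    total = sum(counted.values())
    if total == 0:
        return {}
    return {lang: round(100 * size / total, 1) for lang, size in counted.items()}
```

With this sketch, `stats_bar({"Ruby": 750, "HTML": 250, "ABNF": 500})` reports only Ruby and HTML, so any color attached to ABNF would have nowhere to appear.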

sanssecours · 2016-11-01T16:34:28Z

> You'll need to change the language-type of each entry to data, not programming. It's a notation for describing a language, as we both know, and not a real language per se.

Thanks for the information. I just changed the group. May I ask why ANTLR files are listed under programming then? As far as I know, .g4 files also describe grammars. Here is an example of an ANTLR file/grammar from the official website (Expr.g4):

grammar Expr;       
prog:   (expr NEWLINE)* ;
expr:   expr ('*'|'/') expr
    |   expr ('+'|'-') expr
    |   INT
    |   '(' expr ')'
    ;
NEWLINE : [\r\n]+ ;
INT     : [0-9]+ ;

> You might find this grammar to be useful; it provides highlighting for both ABNF and EBNF.

Thanks for the info. I think I will incorporate the core rules in my version of the ABNF TextMate grammar too. The EBNF grammar seems to be a direct translation of Arne's grammar :o) so we should be covered there.

> One more thing: you'll also need to show where you acquired those samples, as we require them to be released under a permissive license. Specifically, one of these.

All of the examples are from the English Wikipedia pages for ABNF and EBNF. I added a link to the respective articles at the top of each sample file. The files are covered under CC BY-SA 4.0, which sounds permissive to me, although I think I also need to provide a link to the license and the names of the authors. Finding the authors of the code sounds like a complicated task, since so many people contribute to Wikipedia articles.

Should I just search for files that are covered by one of the licenses you linked to instead? If you think this option is more appropriate, is a comment that contains a link to the original source inside the files enough?

Alhadis · 2016-11-01T17:00:16Z

> May I ask why ANTLR files are listed under programming then?

I'm not sure, as I'm unfamiliar with ANTLR; @larsbrinkhoff might know more about it than I do. But if it's really just another *BNF-like notation, its classification should definitely be changed. =)

> All of the examples are from the English Wikipedia pages for ABNF and EBNF.

@pchaigno Do you know if the CC BY-SA license is appropriate? I've forgotten what the stance on Wikipedia-sourced samples is.

> If you think this option is more appropriate, is a comment that contains a link to the original source inside the files enough?

No, I'm afraid not. The samples aren't for human readers: they're for the classifier. Linguist uses them as a basis for Bayesian classification, helping it identify languages based on the frequency of keywords and what-have-you. The bigger the reference material, the better.
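As a rough illustration of classifying by token frequency, here is a minimal naive Bayes sketch. It is not Linguist's actual classifier: the tokenizer, the tiny `SAMPLES` strings and the function names are all made up for this example, and add-one smoothing is an assumption chosen for simplicity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): identifiers, else single non-space characters.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_-]*|\S", text)


def train(samples):
    """samples maps a language name to a list of sample texts."""
    return {
        lang: Counter(tok for text in texts for tok in tokenize(text))
        for lang, texts in samples.items()
    }


def classify(model, text):
    vocab = set().union(*model.values())
    best_lang, best_score = None, -math.inf
    for lang, counts in model.items():
        total = sum(counts.values())
        # Sum of log-probabilities with add-one (Laplace) smoothing.
        score = sum(
            math.log((counts[tok] + 1) / (total + len(vocab)))
            for tok in tokenize(text)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Made-up miniature samples; real training material would be far larger.
SAMPLES = {
    "ABNF": ['rule = 1*DIGIT / ALPHA\nname = ALPHA *(ALPHA / DIGIT / "-")\n'],
    "EBNF": ["rule = digit , { digit } | letter ;\nname = letter , { letter | digit } ;\n"],
}
MODEL = train(SAMPLES)
```

Tokens such as `/` and `DIGIT` pull a file towards ABNF here, while `{`, `|` and `;` pull it towards EBNF. More sample material gives each language more representative counts, which is the point being made about the size of the reference material.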

> Should I just search for files that are covered by one of the licenses you linked to instead?

You could, or you could write your own. Sometimes the latter is easier, particularly for data formats. :)

sanssecours · 2016-11-01T17:41:47Z

> No, I'm afraid not. The samples aren't for human readers: they're for the classifier. Linguist uses them as a basis for Bayesian classification, helping it identify languages based on the frequency of keywords and what-have-you. The bigger the reference material, the better.

Sorry for the confusing description above 😀. I meant, is it enough to just specify the source at the top of a sample file? For example, if I include the file toml.abnf in the sample directory should I add the comment

; Source: https://github.com/toml-lang/toml/blob/abnf/toml.abnf

at the top of the file? Where exactly should I add the information about the source/license of a sample file?

> You could, or you could write your own. Sometimes the latter is easier, particularly for data formats. :)

You are right. Writing a grammar that is valid according to the syntax of EBNF/ABNF is not that hard. Writing a grammar that also makes sense semantically is not trivial though, especially if it should include a lot of the features of the metalanguage.

Alhadis · 2016-11-01T17:53:23Z

> Sorry for the confusing description above 😀. I meant, is it enough to just specify the source at the top of a sample file?

Ah, I see what you mean. No, it doesn't work that way. The material will need to be explicitly released under a permissive license.

> Where exactly should I add the information about the source/license of a sample file?

Here will do. =) It doesn't have to be in the codebase; just as long as we have a record of where/when it was acquired.

> Writing a grammar that also makes sense semantically is not trivial though, especially if it should include a lot of the features of the metalanguage.

Well there's not a lot that can go wrong with *BNF, thankfully. ;)

pchaigno · 2016-11-01T17:56:55Z

> You'll need to change the language-type of each entry to data, not programming. It's a notation for describing a language, as we both know, and not a real language per se.

@Alhadis In the end, programming is just a label. IMHO, what matters is what it means for a language to be a programming language on github.com. In this case, it means it will be in the statistics bar and a project can be tagged as an ABNF project.

@sanssecours Would you expect to see git repositories with only ABNF files or are they always included with some other files (as some kind of documentation/specification for instance)?

sanssecours · 2016-11-01T20:02:38Z

> Here will do. =) It doesn't have to be in the codebase; just as long as we have a record of where/when it was acquired.

That sounds easy enough. Thank you for the information.

> @sanssecours Would you expect to see git repositories with only ABNF files or are they always included with some other files (as some kind of documentation/specification for instance)?

I would expect that most repositories use ABNF/EBNF very sparingly, maybe as a source of documentation for some programming/configuration language. There are, however, repositories like this one, where the main file seems to be an ABNF grammar.

larsbrinkhoff · 2016-11-02T06:22:02Z

> May I ask why ANTLR files are listed under programming then?

My impression is that ANTLR is similar to Yacc. They include both grammar rules and program code.

sanssecours · 2016-11-02T10:42:57Z

I replaced the sample files from Wikipedia with samples from GitHub. I added the source and the license at the top of the files. I hope that is okay.

The table below shows the sources and licenses of the files too.

Files	Source	License
toml.abnf	https://github.com/toml-lang/toml	MIT
tranquil.abnf	https://github.com/fjolnir/Tranquil	BSD 3 Clause License
grammar.ebnf	https://github.com/sunjay/lion	MIT
material.ebnf, object.ebnf, types.ebnf	https://github.com/io7m/jsom0	ISC

If there is anything left for me to do, then please just comment below. Thank you.

Alhadis

Just a few touch-ups needed, and then I think we're good to go. =)

Alhadis · 2016-11-02T10:46:12Z

lib/linguist/languages.yml

+  extensions:
+  - ".abnf"
+  interpreters:
+  - ex_abnf


This field is for languages that can be identified from interpreter directives. =)

For instance, a file with #!/usr/bin/env node in its first line can be identified as JavaScript by listing node as an interpreter.

Since that doesn't apply here, it's safe to leave this out.
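A minimal sketch of the idea, for readers unfamiliar with interpreter directives. The `INTERPRETERS` table and both function names are hypothetical and only cover the simple cases; they are not taken from Linguist.

```python
# Hypothetical mapping for illustration only.
INTERPRETERS = {"node": "JavaScript", "python3": "Python", "ruby": "Ruby"}


def interpreter_from_shebang(first_line):
    """Return the interpreter named in a '#!' line, or None."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # Keep only the program name, e.g. '/usr/bin/env' -> 'env'.
    name = parts[0].rsplit("/", 1)[-1]
    # '#!/usr/bin/env node' names the real interpreter as the next argument.
    if name == "env":
        if len(parts) < 2:
            return None
        name = parts[1]
    return name


def language_from_shebang(first_line):
    """Look up the language for a file's first line, or None if unknown."""
    return INTERPRETERS.get(interpreter_from_shebang(first_line))
```

An ABNF file starts with a rule or a `;` comment rather than a `#!` line, so a lookup like this never has anything to match, which is why the field is unnecessary for this entry.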

Thank you for the explanation. This should be fixed now.

Alhadis · 2016-11-02T10:50:13Z

lib/linguist/languages.yml

+  interpreters:
+  - ex_abnf
+  tm_scope: source.abnf
+  group: ABNF


The group field here is redundant; we only use this to let one language affect a parent language's usage statistics. For example, GAS falls under the category of Assembly, so we want to have it count in the usage stats of Assembly languages across GitHub.

The name is slightly misleading; a more accurate name for this field might be parent_language.

In short, it's logistically impossible for anything to be its own parent. ;)
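The effect of the field can be sketched as a roll-up of usage figures into the parent language. This is an illustration under assumptions, not Linguist's implementation: the trimmed-down `LANGUAGES` table and the `usage_by_group` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical, trimmed-down entries; only the "group" key matters here.
LANGUAGES = {
    "Assembly": {},
    "GAS": {"group": "Assembly"},
    "ABNF": {},
}


def usage_by_group(byte_counts):
    """Add each language's bytes to its group, or to itself if it has none."""
    totals = defaultdict(int)
    for lang, size in byte_counts.items():
        parent = LANGUAGES.get(lang, {}).get("group", lang)
        totals[parent] += size
    return dict(totals)
```

A language without a `group` key already counts towards itself, so writing `group: ABNF` on the ABNF entry would change nothing.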

Thank you for the comment. I just removed the group fields.

pchaigno · 2016-11-02T10:53:06Z

samples/ABNF/tranquil.abnf

@@ -0,0 +1,61 @@
+; Source:  https://github.com/fjolnir/Tranquil
+; License: BSD 3 Clause License
+; Modified for standard compatibility by René Schwaiger


What did you need to modify here? We generally try to keep the files as-is (including any mistakes they may contain), as they are used for tests and training.

The original file uses single quotes, which are not part of the standard as far as I can tell.

There was no need to have done that. This is the sort of in-the-wild discrepancy that's both natural and expected; and also part of the reason we need samples in the first place.

> This is the sort of in-the-wild discrepancy that's both natural and expected; and also part of the reason we need samples in the first place.

I do not think that it makes that much sense to add incorrect files. Otherwise we could just add the following “ABNF code” as (part of) a sample file 😀:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

The ABNF TextMate grammar will also (correctly) mark parts of the original tranquil.abnf grammar as incorrect:

> I do not think that it makes that much sense to add incorrect files.

It does when it is what people use in-the-wild. If you think this file is not representative of the usage of ABNF on github.com you can replace it ;)

Here's the relevant rule which I believe is applying the error highlighting:

[a-zA-Z][a-zA-Z0-9\-]*|(?<invalid>\S)

So it's basically saying "highlight any non-whitespace character that isn't part of a run of ASCII letters, digits or dashes beginning with an ASCII letter."
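The behaviour of that pattern can be tried out in isolation with the sketch below. Two assumptions apply: Python's `re` module spells the named group `(?P<invalid>...)` instead of `(?<invalid>...)`, and the pattern is applied on its own here, without the other rules that a full TextMate grammar would try first. The `invalid_chars` helper is a hypothetical name.

```python
import re

# Same pattern as above, with Python's spelling of the named group.
RULE = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*|(?P<invalid>\S)")


def invalid_chars(text):
    """Return the characters that the 'invalid' branch would capture."""
    return [
        match.group("invalid")
        for match in RULE.finditer(text)
        if match.group("invalid") is not None
    ]
```

Applied to a single-quoted string such as `'a'`, only the two quote characters land in the `invalid` group, which matches the observation that the single quotes in the original tranquil.abnf are what get marked.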

May I add a comment that tranquil.abnf is a non-standard-compliant file?

Recall that these files are for the classifier: 99% of the time, they remain unread by human beings.

Why?

Because it is a non-standard-compliant file 😀. I would not want someone to think that it represents a correct ABNF grammar.

However, I just removed tranquil.abnf. I hope that is a compromise that works for everyone.

Alhadis · 2016-11-02T12:09:20Z

Looks good enough to me then, I guess. @arfon?

arfon · 2016-11-02T12:34:11Z

👍 looks great. Thanks @sanssecours.

This will be live in the next release of Linguist (later this week)

pchaigno · 2016-11-02T12:45:55Z

Thanks for your patience @sanssecours!

Alhadis self-assigned this Nov 1, 2016

Alhadis requested changes Nov 2, 2016

pchaigno reviewed Nov 2, 2016

Add support for ABNF

6a1423d

Augmented Backus–Naur form ([ABNF][]) is a metalanguage used to specify language grammars.

[ABNF]: https://en.wikipedia.org/wiki/Augmented_Backus–Naur_form

Alhadis approved these changes Nov 2, 2016

arfon merged commit 4172390 into github-linguist:master Nov 2, 2016

Alhadis mentioned this pull request Jan 29, 2017

Add support for regular expression data #3441

Merged

Crissov mentioned this pull request Mar 15, 2017

Add support for EBNF #3075

Closed

github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
