Home
The grammars-v4 repository is a collection of ANTLR4 grammars contributed by authors around the world. Grammars-v4 uses trgen, antlr4test-maven-plugin, a number of scripts in the _scripts directory, and GitHub Actions to ensure that all grammars in the tree build and parse input files properly with ANTLR4.
Each grammar has a directory of examples, which contains input files and the expected output from the parse (parse errors in .errors files; the parse tree of the input in .tree files). Testing is performed across the Ubuntu, macOS, and Windows operating systems; the Cpp (C++), CSharp (C#), Dart (Dart2), Go, Java, JavaScript, PHP, and Python3 targets; and the Bash and PowerShell environments.
A core value of grammars-v4 is that any grammar downloaded from grammars-v4 compiles properly with ANTLR4 and has been validated against some example inputs.
There is no single license for the grammars; each grammar has its own license. Check inside the grammar files for licensing terms.
You are welcome to submit an issue ticket, and contributions to the grammars tree are also welcome.
If you add a grammar, you should add a desc.xml, an examples/ directory to test it, and a readme.md to document the grammar.
- You need to place the grammar in a directory that is appropriately named.
- In that directory (aka "the root directory for the grammar"), add the .g4's, a desc.xml, and examples in the examples/ directory. Please include a readme.md with notes on the source for the grammar, version information, copyrights, authorship, etc.
- You can make the grammar combined (one .g4) or split (two .g4's). If combined, the grammar and file name must be identical. Do not add "Parser" to the name for a combined grammar. If split, the name of the lexer must end in "Lexer" and the name of the parser must end in "Parser".
- Actions or semantic predicates are ok if necessary for defining syntax. It is best if you use "target agnostic format".
- Make sure you have tested the grammar for Java.
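The naming rules above can be checked mechanically before opening a PR. The following is a minimal sketch, not part of the repository's tooling: the function name `check_grammar_file` and the regular expression are assumptions made for illustration, and the check only looks at the first grammar declaration it finds at the start of a line.

```python
import re
from pathlib import PurePath

# Matches "grammar X;", "lexer grammar X;" or "parser grammar X;" at the
# start of a line. Hypothetical helper pattern, not taken from grammars-v4.
HEADER = re.compile(r"^\s*(?:(lexer|parser)\s+)?grammar\s+(\w+)\s*;", re.MULTILINE)


def check_grammar_file(filename: str, text: str) -> list[str]:
    """Return a list of naming problems for one .g4 file (empty if none)."""
    match = HEADER.search(text)
    if match is None:
        return [f"{filename}: no grammar declaration found"]

    kind, name = match.group(1), match.group(2)
    stem = PurePath(filename).stem
    problems = []

    # The grammar name and the file name must be identical.
    if name != stem:
        problems.append(f"{filename}: grammar is named {name!r}, expected {stem!r}")

    if kind == "lexer" and not name.endswith("Lexer"):
        problems.append(f"{filename}: lexer grammar name must end in 'Lexer'")
    elif kind == "parser" and not name.endswith("Parser"):
        problems.append(f"{filename}: parser grammar name must end in 'Parser'")
    elif kind is None and name.endswith("Parser"):
        problems.append(f"{filename}: combined grammar name must not end in 'Parser'")

    return problems
```

For example, `check_grammar_file("abbParser.g4", "parser grammar abbParser;")` returns an empty list, while a combined grammar declared as `grammar fooParser;` is reported.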
If your PR breaks the existing tests, it will be rejected. Additionally, we ask that any incremental changes made to grammar files come with examples contributed to the /examples directory for that grammar, so that future changes to the grammars don't introduce regressions.
Look here
All grammars in the repository are formatted according to common rules. Formatting is tested for each PR and if that fails the PR is rejected. The tool used to format an ANTLR4 grammar is antlr-format. You need Node.js installed to run it.
All grammars in the repository contain formatting options that mirror the common rules. These options must not be changed in a PR, unless the maintainers change these rules and reformat the entire repository again.
New grammars usually do not contain these formatting options. You can either copy them from an existing grammar or let the antlr-format tool add them for you. To prepare your new grammar for a PR, consult the readme of the antlr-format terminal tool for how to run it, and read its Configuration section for details of the config file to use. Existing grammars don't need a config file because they carry all options as comments, which look like:
// $antlr-format alignTrailingComments true, columnLimit 150, useTab false ...
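Because these option comments must not change in a PR, it can help to compare them before and after an edit. The sketch below reads the `// $antlr-format` comment lines of a grammar into a dictionary. It assumes each option is a name followed by a single value and that options are separated by commas, as in the line above. The function name `parse_format_options` is hypothetical and this is not how antlr-format itself parses the options.

```python
import re

# A comment line of the form: // $antlr-format name value, name value, ...
OPTION_LINE = re.compile(r"^\s*//\s*\$antlr-format\s+(.*)$", re.MULTILINE)


def parse_format_options(grammar_text: str) -> dict[str, str]:
    """Collect the name/value pairs from all $antlr-format comment lines."""
    options = {}
    for match in OPTION_LINE.finditer(grammar_text):
        for item in match.group(1).split(","):
            parts = item.split()
            # Assumption: one name and one value per option; skip anything else.
            if len(parts) == 2:
                options[parts[0]] = parts[1]
    return options
```

Comparing `parse_format_options(old_text)` with `parse_format_options(new_text)` then shows whether an edit touched the formatting options.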
There is an example at /tcpheader/
Use download-maven-plugin
<plugin>
  <groupId>com.googlecode.maven-download-plugin</groupId>
  <artifactId>download-maven-plugin</artifactId>
  <version>1.4.0</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>wget</goal>
      </goals>
      <configuration>
        <url>https://raw.githubusercontent.com/antlr/grammars-v4/master/arithmetic/arithmetic.g4</url>
        <outputFileName>arithmetic.g4</outputFileName>
        <outputDirectory>src/main/antlr4/com/khubla/antlr4example/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
This is the least desirable method to test a grammar because the ANTLR4 website does not give very good instructions on how to write a program and then build it. It is best that you use trgen to generate a complete, functioning program from templates.
However, if you insist, do the following:
- Clone this repo and cd to the directory of the grammar you want to use.
- Verify that the <targets> element in desc.xml lists your target, i.e., that the target works for this grammar.
- Copy any files from the directory named after the target to the root directory for the grammar. For example, for the cpp grammar and the CSharp target, copy the files in the CSharp directory to the directory that contains the .g4's.
- If a transformGrammars.py file exists, then run python3 transformGrammars.py. This performs modifications to the grammar that are specific to the target.
- Generate the parser and lexer recognizer files manually via antlr4 -Dlanguage=<target> *.g4, e.g., antlr4 -Dlanguage=CSharp CPP14Lexer.g4 CPP14Parser.g4.
- Follow the steps in the webpage Runtime Libraries and Code Generation Targets to write a driver program.
- Build your program.
It is very likely that you will have problems. You will need to resolve these issues yourself.
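The copy, transform, and generate steps of the manual procedure can be sketched as a small helper. This is only an illustration under assumptions: the function name `prepare_manual_build` is hypothetical, and it returns the commands to run instead of running them. It does copy the target-specific files into the grammar's root directory, so use it in a scratch clone.

```python
import shutil
from pathlib import Path


def prepare_manual_build(grammar_dir: str, target: str) -> list[list[str]]:
    """Copy target-specific files to the grammar root and list the commands to run."""
    root = Path(grammar_dir)
    target_dir = root / target

    # Copy files from the directory named after the target (e.g. CSharp/)
    # into the directory that contains the .g4 files.
    if target_dir.is_dir():
        for item in target_dir.iterdir():
            if item.is_file():
                shutil.copy(item, root / item.name)

    commands = []
    # Run the target-specific grammar transformation only if the script exists.
    if (root / "transformGrammars.py").is_file():
        commands.append(["python3", "transformGrammars.py"])

    # Generate the lexer and parser for the chosen target.
    grammars = sorted(p.name for p in root.glob("*.g4"))
    commands.append(["antlr4", f"-Dlanguage={target}", *grammars])
    return commands
```

Each returned command is meant to be run from the grammar's root directory, for instance with `subprocess.run(command, cwd=grammar_dir, check=True)`. Writing the driver program and building it remain manual steps.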
- Install dotnet version 8.
- Install "antlr4-tools": pip install antlr4-tools. See https://github.com/antlr/antlr4-tools
- Install target-specific support, e.g., G++, Dart, Go, etc.
- Install the Trash toolkit. See the documentation.
- git clone https://github.com/antlr/grammars-v4.git
- cd grammars-v4/<grammar-of-your-choice>. E.g., cd grammars-v4/java/java.
- trgen. This will create a driver for all implemented targets that work with the grammar. See the desc.xml file for this list.
- cd Generated-<target-of-your-choice>. E.g., cd Generated-CSharp.
- In a Bash prompt, type make; make test. Or, in a PowerShell prompt, type pwsh build.ps1; pwsh test.ps1. The scripts create temporary files used in the build. Use git clean -f to remove these files.
- Tests create .errors and .tree files automatically. If you want, you can check these in for testing across targets and OSes.
- Clone grammars-v4: git clone https://github.com/antlr/grammars-v4.git
- Make sure you have Maven installed. See the documentation.
- cd grammars-v4 (the root directory), or cd to a grammar, e.g., cd grammars-v4/java/java.
- Execute mvn clean test.
If you want to add a grammar to the repo, you will need to create two XML files, pom.xml and desc.xml, and place them in the directory containing your .g4's.
The pom.xml file is for the ANTLR Maven tester (antlr4test-maven-plugin). This tester only tests the Java target.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <artifactId>abb</artifactId>
  <packaging>jar</packaging>
  <name>abb grammar</name>
  <parent>
    <groupId>org.antlr.grammars</groupId>
    <artifactId>grammarsv4</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <build>
    <plugins>
      <plugin>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-maven-plugin</artifactId>
        <version>${antlr.version}</version>
        <configuration>
          <sourceDirectory>${basedir}</sourceDirectory>
          <includes>
            <include>abbParser.g4</include>
            <include>abbLexer.g4</include>
          </includes>
          <visitor>true</visitor>
          <listener>true</listener>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>antlr4</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>com.khubla.antlr</groupId>
        <artifactId>antlr4test-maven-plugin</artifactId>
        <version>${antlr4test-maven-plugin.version}</version>
        <configuration>
          <verbose>false</verbose>
          <showTree>false</showTree>
          <entryPoint>module_</entryPoint>
          <grammarName>abb</grammarName>
          <packageName></packageName>
          <exampleFiles>examples/</exampleFiles>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>test</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
Make sure the <includes> element only lists the top-level .g4's of the grammar. Do not include "import" grammars. The <entryPoint> must be the start rule, which should have EOF as the last symbol in the right-hand side. The <packageName> must be empty because the concept of a "package" does not have meaning across all targets.
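Some of these rules can be checked with a few lines of code before submitting. The sketch below uses Python's standard `xml.etree.ElementTree` to confirm that a Maven project file lists .g4 files in `<include>`, has a non-empty `<entryPoint>`, and leaves `<packageName>` empty. The function names are hypothetical. It does not verify that the included files are top-level grammars or that the start rule ends in EOF, both of which would require reading the grammars themselves.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def check_pom(pom_text: str) -> list[str]:
    """Return a list of problems found in the Maven project file text."""
    root = ET.fromstring(pom_text)

    # Group the stripped text of every element by its tag name.
    texts: dict[str, list[str]] = {}
    for element in root.iter():
        texts.setdefault(local_name(element.tag), []).append((element.text or "").strip())

    problems = []
    includes = texts.get("include", [])
    if not includes:
        problems.append("<includes> lists no grammar files")
    for name in includes:
        if not name.endswith(".g4"):
            problems.append(f"<include> entry is not a .g4 file: {name!r}")

    if not any(texts.get("entryPoint", [])):
        problems.append("<entryPoint> is missing or empty")

    if any(texts.get("packageName", [])):
        problems.append("<packageName> must be empty")

    return problems
```

An empty result means only that these three surface checks passed. The Maven build itself remains the real test.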
The desc.xml file is for trgen, which is now used to test all grammars across all targets and OSes.
<?xml version="1.0" encoding="UTF-8" ?>
<desc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../_scripts/desc.xsd">
  <targets>Antlr4ng;CSharp;Cpp;Dart;Go;Java;JavaScript;PHP;Python3;TypeScript</targets>
  <inputs>examples/**/*.sys</inputs>
</desc>
The <targets> element specifies all targets to test. If the grammar is target-specific, then you should try to write ports for as many targets as possible. Alternatively, limit the list of targets to what the grammar can work with. The <inputs> element indicates the path to the input files to test. Globbing and wildcards are optional. You may need to add the <targets> and <inputs> to a specific <test> element if the grammar performance is very poor for certain targets.
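To see which targets and input files a desc.xml selects, the file can be read with a short script. This is a minimal sketch under assumptions: the function names `read_desc` and `list_inputs` are hypothetical, targets are taken to be separated by semicolons as in the example above, and the pattern is expanded with Python's `pathlib` globbing, which may not match trgen's own globbing in every detail.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_desc(desc_text: str) -> tuple[list[str], str]:
    """Return the list of targets and the input pattern from desc.xml text."""
    root = ET.fromstring(desc_text)
    targets_text = root.findtext("targets", default="")
    inputs_text = root.findtext("inputs", default="")
    targets = [t.strip() for t in targets_text.split(";") if t.strip()]
    return targets, inputs_text.strip()


def list_inputs(grammar_dir: str, pattern: str) -> list[str]:
    """Expand the input pattern relative to the grammar's root directory."""
    root = Path(grammar_dir)
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )
```

Calling `read_desc` on the text of a desc.xml and then `list_inputs(".", pattern)` from the grammar's root directory lists the example files that the pattern would pick up.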
Other elements you may want to use are:
- <entry-point> to specify a specific entry point.
- <grammar-files> to specify top-level .g4 files.
- <grammar-name> to specify a specific grammar to avoid confusion about which grammar to test if there are multiple top-level grammars.
When you create a PR, the grammar that changed is tested thoroughly. Please check the "static checks" in the GitHub Actions run for the PR to see what the tools find, including:
- Useless parentheses.
- Improperly formatted grammars.
- Ambiguities.