Flint is a framework for building Fortran code analysis tools in Python. It includes a basic command line tool for linting and code checks, and a more general API for developing one's own custom tests.
More generally, flint provides an interface to a Fortran source code project within a Python environment.
Flint is in the early stages of development, but currently provides some basic functionality.
In most cases, flint can be installed with setup.py:
python setup.py install --user
If you want to install it in your system directories (and have permission):
python setup.py install
This may not work for some users, and more elegant installation methods ought to be available in the future, depending on how well-received this project becomes.
Current functionality is described below. During this very early stage of development, features may change, expand, or be dropped without any notice.
Use flint to invoke the CLI. The following tools have been implemented with limited features.
flint report
Apply a generic linter and static analysis to a Fortran project.
The following tests are included:
- Trailing whitespace
- Indents with mixed tabs and spaces
- Tabs within statements
- Excessive line length (both with and without comments)
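
The checks listed above can be illustrated with a minimal per-line sketch. This is not flint's implementation: the function name lint_line, the issue labels, and the default limit of 80 characters are all assumptions chosen for illustration.

```python
import re


def lint_line(line, max_length=80):
    """Return the issues found on one source line (newline already removed).

    Hypothetical helper: the name, labels, and max_length default are
    illustrative assumptions, not part of flint.
    """
    issues = []

    # Trailing whitespace: spaces or tabs at the end of the line.
    if line != line.rstrip(" \t"):
        issues.append("trailing whitespace")

    # Indent that mixes tabs and spaces.
    indent = re.match(r"[ \t]*", line).group()
    if " " in indent and "\t" in indent:
        issues.append("mixed indent")

    # Tab after the indent, i.e. inside the statement text itself.
    if "\t" in line[len(indent):].rstrip(" \t"):
        issues.append("tab in statement")

    # Line length, measured on the full line including any comment.
    if len(line) > max_length:
        issues.append("line too long")

    return issues
```

A real linter would also need to measure line length with comments excluded, which requires knowing where a comment starts and is therefore left out of this sketch.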
flint gendoc
- Generates a reST documentation file based on docstrings in the source code. Currently follows the Doxygen convention.
flint format
- In principle, this will provide a formatted version of an input source code. Currently, it returns each statement with whitespace and comments removed.
flint tag
Return the input files, with statements tagged based on type. For example, a module statement is tagged with the letter M. This is primarily a debugging tool, but might be of interest to users.
Flint provides an interface to the details of the source code, which can be used to develop tools which are customized to your project.
To parse the source code, call the parse() function with the top-level directory of your project.
import flint
project = flint.parse('path/to/source')
This function returns a Project object, which itself contains several objects representing the contents and attributes of the source code.
For example, the following code block prints the name of every derived type in each module of the project.
for mod in project.modules:
    for dtype in mod.derived_types:
        print(dtype.name)
For more examples, inspect the flint/tools directory, which describes the command line tools.
Flint is broken into three stages, which closely resemble compiler frontends.
The Scanner object takes an input stream and returns the "lexemes", the "words" of the grammar. No semantic meaning is attached to them at this stage.
One important feature of Scanner is that it also preserves the non-semantic lexemes. Examples include grouped whitespace, endlines, and comments.
Users would generally not use the Scanner since it is a component of the Lexer, which is described below.
The lexemes are passed to the Lexer, which is structured as an iterator. It has three major responsibilities:
- Lexemes are identified as either semantic or liminal, which is our term for non-semantic tokens such as whitespace, comments, or statement separators (;).
- Lexemes are converted from lines to Statements. A statement may span many lines (&), or a line may contain many statements (;). The Lexer will resolve these cases and return the next semantic Statement.
- Preprocessing is applied at this stage. Macro substitutions are applied, but the original macro name is preserved.
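
The line-to-statement conversion can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not flint's Lexer: the function name split_statements is hypothetical, and the sketch ignores comments, string literals, and preprocessing, and discards the liminals that flint preserves.

```python
def split_statements(lines):
    """Group physical source lines into logical statements.

    Hypothetical, simplified sketch: '&' and ';' inside comments or
    string literals are not handled, and whitespace is not preserved.
    """
    statements = []
    pending = ""

    for line in lines:
        text = line.strip()

        # A leading '&' marks the continued part of the previous line.
        if pending and text.startswith("&"):
            text = text[1:].lstrip()

        # A trailing '&' means the statement continues on the next line.
        if text.endswith("&"):
            pending += text[:-1].rstrip() + " "
            continue

        full = pending + text
        pending = ""

        # A ';' separates several statements on one line.
        for part in full.split(";"):
            part = part.strip()
            if part:
                statements.append(part)

    # Flush a dangling continuation at end of input.
    if pending.strip():
        statements.append(pending.strip())

    return statements
```

For instance, the two physical lines "x = 1 + &" and "2" are joined into one statement, while "a = 1; b = 2" yields two.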
Each iteration of the lexer returns a Statement, which is a list subclass containing the Token lexemes.
Each Token contains a head and tail, which point to lists of the "liminals" in between the semantic lexemes. This includes whitespace (including endlines), line breaks (&), statement terminators (;), and comments. Each Token preserves its original case, but uses lowercase for general operations, such as comparison tests or dictionary keys.
There is also a PToken subclass for preprocessed content. These tokens display as the original unprocessed lexemes, but are evaluated as the postprocessed value. For example, macros appear unchanged but use their substituted value in comparison tests. Values from an #include statement appear as empty strings but are returned as semantically valid statements.
Although we call these "tokens", they are not quite equivalent to the tokens produced by a compiler's parser, since we do not yet classify them into, for example, identifiers or operators. There is some advantage in deferring this, since most Fortran keywords can also be used as identifiers.
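
The case handling and the display-versus-value split described above can be sketched with str subclasses. The classes below are illustrative assumptions and not flint's actual Token or PToken: the constructor signatures and the original attribute are hypothetical, and only head and tail are names taken from the text.

```python
class Token(str):
    """String that keeps its original case but compares case-insensitively.

    Illustrative sketch only; flint's real Token may differ.
    """

    def __new__(cls, value):
        tok = super().__new__(cls, value)
        # Liminals (whitespace, comments, ...) before and after the token.
        tok.head = []
        tok.tail = []
        return tok

    def __eq__(self, other):
        if isinstance(other, str):
            return str.lower(self) == str.lower(other)
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Hash the lowercase form so equal tokens share a dictionary slot.
        return hash(str.lower(self))


class PToken(Token):
    """Token whose value is the substituted text but which displays the
    original lexeme (hypothetical constructor)."""

    def __new__(cls, value, original):
        tok = super().__new__(cls, value)
        tok.original = original
        return tok

    def __str__(self):
        return self.original
```

With these definitions, Token("Module") compares equal to "module" while still printing as Module. A PToken built from the value "8" and the original lexeme "NMAX" prints as NMAX but compares equal to "8". One caveat of this design is that a plain uppercase str will not find a Token key in a dictionary, because the plain string hashes on its own case; lookups should use a Token or a lowercase string.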
As with the Scanner, most users will never need to interact with the Lexer, which is a component of the Parser described below.
Finally, the Lexer output is passed to the Parser, which interprets the semantic contents to recreate an abstraction of the source code and its components.
This is where modules, subprograms, variables, and other content are organized into equivalent data structures which can be probed and traversed for further analysis.
The Parser is contained within the Source objects, which represent abstractions of the source code (aka "translation units" in compiler-talk).
If working as intended, this should be the only level at which the user is required to interact with the parser.
This is also the least developed part of flint, so at this point I will just say to watch this space for future work.
The "unknown unknowns" probably exceed the "known unknowns" at this stage, but we are aware of the following issues.
The Fortran expressions themselves remain unparsed beyond identification of their tokens. Further parsing such as AST generation is not yet attempted.
Expressions inside of an #if or #elif statement are not parsed, and for simplicity are currently assumed to always be false.
To fix this would require a full expression parser, which is not yet available.