CatLearn is a code base for performing Gaussian process machine learning on atomic systems. The code is modular, and each module has its own README that provides a more detailed description of what it does.
In general, there are modules for:
There are various fingerprint generators available. These typically take a list of ASE Atoms objects and return an array of features. The setup functions wrap around predefined or user-written generators for various systems. The predefined generators are:
- adsorbate_fingerprint.py
- particle_fingerprint.py
- neighborhood_matrix.py
- standard_fingerprint.py
- general_fingerprint.py
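The idea of a user-written generator can be sketched as follows. This is a minimal illustration, not CatLearn's actual API: `AtomsStandIn`, `composition_fingerprint`, and `build_feature_matrix` are hypothetical names, and the stand-in class only mimics the `get_atomic_numbers()` method of an ASE Atoms object so the example runs without ASE installed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AtomsStandIn:
    """Hypothetical stand-in for an ASE Atoms object (illustration only)."""
    numbers: Sequence[int]

    def get_atomic_numbers(self):
        return list(self.numbers)


def composition_fingerprint(atoms) -> List[float]:
    """Hypothetical generator: maps one structure to three simple features."""
    z = atoms.get_atomic_numbers()
    # Features: atom count, mean atomic number, largest atomic number.
    return [float(len(z)), sum(z) / len(z), float(max(z))]


def build_feature_matrix(images, generators) -> List[List[float]]:
    """Concatenate the output of each generator for every structure."""
    matrix = []
    for atoms in images:
        row = []
        for generate in generators:
            row.extend(generate(atoms))
        matrix.append(row)
    return matrix


# Two toy structures: H2O-like and CO-like compositions.
images = [AtomsStandIn([1, 1, 8]), AtomsStandIn([6, 8])]
features = build_feature_matrix(images, [composition_fingerprint])
```

Each row of `features` corresponds to one structure and each column to one feature, which is the shape the later preprocessing and regression steps expect.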
The module contains functions to scale and optimize the feature space. The optimization routines include functions that expand the space with various transforms, as well as functions that reduce it to a more compact representation through either feature elimination or feature extraction.
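As a rough illustration of scaling and elimination, the sketch below standardizes each feature column and drops columns that carry no variance. The function names `standardize` and `drop_constant_features` are assumptions made for this example and are not taken from CatLearn.

```python
from statistics import mean, pstdev


def standardize(matrix):
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*matrix))
    means = [mean(c) for c in columns]
    # Guard against constant columns, which have zero spread.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in matrix]


def drop_constant_features(matrix, tol=1e-12):
    """Elimination step: remove columns whose spread is not above tol."""
    columns = list(zip(*matrix))
    keep = [i for i, c in enumerate(columns) if pstdev(c) > tol]
    return [[row[i] for i in keep] for row in matrix], keep


# Toy feature matrix; the middle column is constant and carries no signal.
raw = [[1.0, 5.0, 10.0], [2.0, 5.0, 30.0], [3.0, 5.0, 20.0]]
reduced, kept = drop_constant_features(raw)
scaled = standardize(reduced)
```

Eliminating before scaling avoids dividing by a zero standard deviation. The same scaling parameters would normally be computed on the training data only and then reused for the test data.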
Ridge regression functions to generate reasonable linear models. These typically give a good baseline level of predictive accuracy against which to benchmark the more complex Gaussian process. The Gaussian process functions are also located in this module. Along with Gaussian process regression, there are functions for model optimization.
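To show what a ridge baseline looks like, here is a minimal single-feature sketch using the closed-form solution with an unpenalized intercept. The names `fit_ridge_1d` and `predict_linear` are hypothetical, and the single-feature restriction is a simplification for illustration, not how CatLearn implements it.

```python
def fit_ridge_1d(x, y, reg=1e-3):
    """Fit y ~ w * x + b with an L2 penalty `reg` on w (intercept unpenalized)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centering the data removes the intercept from the penalized problem.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    w = sxy / (sxx + reg)
    b = y_mean - w * x_mean
    return w, b


def predict_linear(x, w, b):
    """Evaluate the fitted line at each point in x."""
    return [w * xi + b for xi in x]


x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

# reg=0 reduces to ordinary least squares; a larger reg shrinks the slope.
w, b = fit_ridge_1d(x_train, y_train, reg=0.0)
w_reg, b_reg = fit_ridge_1d(x_train, y_train, reg=5.0)
```

With no regularization the fit recovers the generating line, while the penalized fit has a smaller slope. The error of such a baseline gives a reference point for judging whether the Gaussian process is worth its extra cost.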
Model testing functions to assess likely error in the predictions.
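One common way to assess likely prediction error is k-fold cross-validation, sketched below. The helpers `k_fold_indices` and `cross_validate` are hypothetical names for this example, and the mean-value predictor stands in for a real model; this is not necessarily the scheme CatLearn uses.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds


def cross_validate(targets, k, fit, predict):
    """Return the mean absolute error over all held-out points."""
    errors = []
    for test_idx in k_fold_indices(len(targets), k):
        held_out = set(test_idx)
        train = [t for i, t in enumerate(targets) if i not in held_out]
        model = fit(train)
        errors.extend(abs(predict(model) - targets[i]) for i in test_idx)
    return sum(errors) / len(errors)


# Hypothetical baseline model: always predict the mean of the training targets.
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mae = cross_validate(
    targets,
    k=3,
    fit=lambda train: sum(train) / len(train),
    predict=lambda model: model,
)
```

The folds here are contiguous, so the data should be shuffled beforehand if its order is not random. Every point is held out exactly once, which makes the averaged error an estimate of performance on unseen data.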
General utilities to help build and test the models.