forked from justinalsing/dlmmc

Dynamical linear modeling (DLM) regression code for analysis of atmospheric time-series data.


DLMMC

Dynamical Linear Modelling (DLM) regression code in Python for analysis of time-series data. The code is targeted at atmospheric time-series analysis, with a detailed worked example (and data) included for stratospheric ozone, but the underlying state-space model is fairly general and can be applied or extended to a wide range of problems.

The core of this package is a suite of DLM models implemented in Stan, using a combination of HMC sampling and Kalman filtering to infer the DLM model parameters (trend, seasonal cycle, auto-regressive processes, etc.) given some time-series data. To make the code as accessible as possible, I provide a step-by-step tutorial in Python for how to read in your data, run the DLM model(s), and process the outputs to make nice plots. Once you've worked through this tutorial you should have all the tools you need to apply DLM to your own data!
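To give a feel for the Kalman filtering ingredient, here is a minimal sketch for the simplest possible DLM: a single slowly drifting level observed with noise. It is an illustration only, not the Stan code in this package; the function name local_level_filter and its arguments are hypothetical, and the models here additionally include seasonal, regression and auto-regressive components.

```python
import math

def local_level_filter(y, obs_var, level_var, m0=0.0, p0=1e6):
    """Kalman filter for y_t = mu_t + noise, mu_t = mu_{t-1} + drift.

    Returns filtered means, filtered variances and the log-likelihood.
    Missing observations (None) are skipped in the update step.
    """
    m, p = m0, p0
    means, variances, loglik = [], [], 0.0
    for obs in y:
        # Predict: the level follows a random walk, so only the variance grows.
        p = p + level_var
        if obs is not None:
            innovation = obs - m
            s = p + obs_var          # innovation variance
            gain = p / s             # Kalman gain
            m = m + gain * innovation
            p = (1.0 - gain) * p
            loglik += -0.5 * (math.log(2.0 * math.pi * s) + innovation ** 2 / s)
        means.append(m)
        variances.append(p)
    return means, variances, loglik

if __name__ == "__main__":
    # Toy series with one missing value (values are made up for illustration).
    series = [1.0, 1.2, None, 0.9, 1.1, 1.0]
    means, variances, loglik = local_level_filter(series, obs_var=0.04, level_var=0.001)
    print(means[-1], variances[-1], loglik)
```

In a sampling-based scheme, a log-likelihood of this kind can be evaluated for each proposed set of variances, so that the sampler explores the variance parameters while the filter handles the latent level.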

Installation

The code is written for Python 3 and has the following dependencies (which can be installed using, e.g., pip install):

numpy
scipy
matplotlib
netCDF4
pystan

If you want to run multiple DLMs in parallel with MPI, you will also need openmpi and mpi4py (again easily done with pip).
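Before compiling the models it can be handy to confirm that the dependencies are visible to the interpreter you are using. The snippet below is a small sketch of such a check; it assumes the import names match the package names listed above (this may not hold for every pystan version), and the helper missing_modules is hypothetical rather than part of this package.

```python
import importlib.util

# Import names assumed to match the package names in the list above.
REQUIRED = ["numpy", "scipy", "matplotlib", "netCDF4", "pystan"]
# Only needed for the parallel MPI run.
OPTIONAL = ["mpi4py"]

def missing_modules(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    print("missing required:", missing_modules(REQUIRED))
    print("missing optional:", missing_modules(OPTIONAL))
```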

Once you have downloaded the code from this repository and installed the dependencies, run the following script (make sure you run it with Python 3):

python compile_stan_models.py

This pre-compiles all of the models on your machine, saves them in models/, and then you're ready to start DLMing!

Usage

Functionality

A detailed annotated tutorial walk-through of how to use the code is given in the jupyter notebook dlm_tutorial.ipynb -- this tutorial analyses stratospheric ozone time-series data as a case study. The notebook takes you step-by-step through the complete functionality of the code: loading in your own data, running the DLM model, and processing and plotting the results.

Running in parallel with MPI

It's often necessary to perform regression on a large number of time-series (e.g., over a grid of observations at different altitudes/latitudes/longitudes), and it is advantageous to be able to run these in parallel. The Python script dlm_lat_alt_mpi_run.py is a template for how to run the DLM code over a grid of time-series at different latitudes/altitudes in parallel using MPI, and save the results to a single netCDF file. This script has the additional dependency tqdm if you want it to work with a progress bar. Provided you have MPI working, you can run this script with the following command (using, e.g., 4 hyperthreaded processes; again, make sure you run with Python 3):

mpirun --use-hwthread-cpus -np 4 python dlm_lat_alt_mpi_run.py

I recommend you run this with a very small number of samples first (e.g., iter=3, warmup=1) to check that it runs through, before embarking on a long run.
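For orientation, one common way to share a latitude/altitude grid between MPI processes is a round-robin split, sketched below. This is an assumption about how such a split could look, not a description of what dlm_lat_alt_mpi_run.py actually does; the function tasks_for_rank and the grid sizes are hypothetical.

```python
def tasks_for_rank(n_lat, n_alt, rank, size):
    """Return the (lat_index, alt_index) pairs assigned to one MPI rank."""
    all_tasks = [(i, j) for i in range(n_lat) for j in range(n_alt)]
    # Round-robin: rank r takes tasks r, r + size, r + 2*size, ...
    return all_tasks[rank::size]

if __name__ == "__main__":
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    except ImportError:
        # Fall back to a single process if mpi4py is not available.
        rank, size = 0, 1

    for lat_index, alt_index in tasks_for_rank(n_lat=4, n_alt=3, rank=rank, size=size):
        # A real run would fit the DLM to this grid cell here.
        print(f"rank {rank} of {size}: lat {lat_index}, alt {alt_index}")
```

Each process then works through its own list of grid cells independently, and the results only need to be gathered once at the end.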

Model descriptions

Mathematical descriptions of each of the DLM models implemented in this package can be found in the file models/model_descriptions.pdf. This file contains a concise description of the parameters of each model, their physical meanings, and how to refer to them in the code: make sure you have read and understood the model description before running a new model!

Citing this code

There is a JOSS paper in preparation to accompany the code (appearing soon). Until then, please contact me if you intend to use the code for a major project ([email protected]).

A close description of the vanilla DLM model implemented here can be found in Laine et al 2014, and this model/code was used for analyzing ozone data in Ball et al 2017 and Ball et al 2018. Please consider citing these papers along with the paper accompanying this code (in prep - appearing soon) when you use this code.

Contributions, reporting issues and feature requests

If you want to contribute (e.g., extended models) to this package, please contact me at [email protected] (I welcome contributors). If you would like to request a feature to be included in the next update, or report an issue, please use the issues channel associated with this Git repository.
