initial commit
gregzanotti committed Nov 13, 2021
0 parents commit fd8209b
Showing 44 changed files with 3,115 additions and 0 deletions.
145 changes: 145 additions & 0 deletions .gitignore
@@ -0,0 +1,145 @@
# Project-specific files
daily-returns.csv
*esiduals*.npy
*.npz
credentials.yaml
.vscode/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/
42 changes: 42 additions & 0 deletions README.md
@@ -0,0 +1,42 @@
Deep Learning Statistical Arbitrage
===================================

This repo contains the official code for our paper *Deep Learning Statistical Arbitrage*, available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3862004 and https://arxiv.org/abs/2106.04028.

## Quickstart

To test a trading policy model on a residual time series, use `run_train_test.py`.
This file exports a function, `run()`, which can be imported and used in e.g. a
grid search, or run from the command line. Command line usage will suit most users.
To run from the command line, use
```
python3 run_train_test.py -c configs/config_name_here.yaml
```
where `config_name_here.yaml` is a configuration file from the `configs` folder.
You can write your own configuration file to edit hyperparameters and other
settings for the trading test. See `run_train_test.py` for other command line options.
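
As a rough sketch of running several configurations in sequence, the command line shown above can be assembled programmatically. The helper name `build_commands` and the list of configuration files are illustrative assumptions, not part of this repo; only the `python3 run_train_test.py -c ...` invocation comes from the text above.

```python
import shlex


def build_commands(config_names, script="run_train_test.py", config_dir="configs"):
    """Return one argv list per configuration file (hypothetical helper)."""
    return [["python3", script, "-c", f"{config_dir}/{name}"] for name in config_names]


# Assumed example selection of configuration files from the configs folder.
commands = build_commands(["cnntransformer-full.yaml", "ouffn-full.yaml"])
for argv in commands:
    # Print the command; to execute it, pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Building argv lists rather than shell strings avoids quoting problems if a configuration file name ever contains spaces.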

## Structure

This repo is organized as follows:
- `train_test.py` contains the code for training a trading policy model and simulating trading.
- `run_train_test.py` is a user interface to `train_test.py` which deals with configuration, logging, saving results, etc.
- `preprocess.py` contains functions for preprocessing residual time series data into a form usable by a trading policy model.
- `data.py` contains miscellaneous functions for altering residual time series data.
- `configs` contains configuration files which define various tests of trading policy models on residual time series.
- `data` should contain raw input data used to create residuals.
- `factor_models` contains code for creating residuals from raw input data.
- `residuals` stores residual time series data sets created by the code in `factor_models`.
- `models` contains code for trading policy models.
- `results` will contain the results of and plots for trading policy model tests conducted by `run_train_test.py`.
- `logs` will contain logs for runs of models and factor models.
- `tools` should contain miscellaneous code for interpreting and exploring results and saved models.
- `utils.py` contains helpful functions used throughout.

## Generating residuals

To create residuals, first ensure that input data is present in the `data` directory, then run `run_factor_model.py`, providing the name of a factor model in the `factor_models` directory, e.g.
```
python3 run_factor_model.py -m factor_model_name_here
```
Generated residuals for the factor model will be saved in the `residuals` folder.
51 changes: 51 additions & 0 deletions configs/cnntransformer-full.yaml
@@ -0,0 +1,51 @@
# Major parameters
mode: "test" # can be 'test' or 'estimate'
results_tag: "" # optional; try not to use underscores in this tag, use dashes instead
debug: False # set to True to turn on debug logging and file naming
# Model parameters
model_name: "CNNTransformer" # name of a class defined in the models folder and initialized in the models folder's __init__.py
model: { # contains parameter settings for __init__() function of class with name `model_name`
    lookback: 30, # number of days of preprocessed residual time series to feed into model
    dropout: 0.25,
    filter_numbers: [1,8],
    filter_size: 2,
    attention_heads: 4,
    hidden_units_factor: 2, # multiplier applied to the last item in `filter_numbers`; determines number of hidden units (e.g. 2*8 = 16)
    # hidden_units: 16, # use either hidden_units or hidden_units_factor, but not both
    normalization_conv: True, # normalize convolutions or not
    use_transformer: True,
    use_convolution: True,
}
# Data parameters
preprocess_func: "preprocess_cumsum" # name of a function defined in preprocess.py
use_residual_weights: True # use residual composition matrix to compute turnover, short proportion, etc.
cap_proportion: 0.01 # defines asset universe: 0.01 corresponds to a residual data set
factor_models: { # number of factors per residual time series to test, for each factor model
    "IPCA": [5],
    "PCA": [5],
    "FamaFrench": [5],
}
perturbation: { # perturbation of residual time series by noise is optional, leave empty or comment out entirely to disable
    # "noise_type" : "gaussian",
    # "noise_mean" : 0.0,
    # "noise_std_pct" : 2,
    # "noise_only" : False,
    # "per_residual" : True,
}
# Training parameters
num_epochs: 100
optimizer_name: "Adam" # see PyTorch docs for potential optimizers
optimizer_opts: { # see PyTorch docs for optimizer options
    lr: 0.001
}
batch_size: 125
retrain_freq: 125 # if mode=='estimate', this is the number of obs used to form a test set (chronologically after the training set)
rolling_retrain: True # set to False for no rolling retraining (i.e. train once, test for all data past training set)
force_retrain: True # force the model to be trained, even if existing weights for the model are saved on disk
length_training: 1000 # size of rolling training window in trading days
early_stopping: False # employ early stopping or not
objective: "sharpe" # objective function: 'sharpe' or 'meanvar' or 'sqrtMeanSharpe'
# Market frictions parameters
market_frictions: False # enable or disable
trans_cost: 0 # cost per txn side per equity, as a decimal fraction, e.g. 0.0005 (5 bps)
hold_cost: 0 # cost for short positions per equity per day, as a decimal fraction, e.g. 0.0001 (1 bp)
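
The comments on `hidden_units_factor` and `hidden_units` in the `model` block above describe a small rule: set exactly one of the two, and when the factor is used, multiply it by the last entry of `filter_numbers`. A minimal sketch of that rule follows; the function name `resolve_hidden_units` is hypothetical and this is not necessarily how the model class implements it.

```python
def resolve_hidden_units(model_cfg):
    """Derive the number of hidden units from a model config dict (illustrative only)."""
    has_units = "hidden_units" in model_cfg
    has_factor = "hidden_units_factor" in model_cfg
    # The config comment says to use one option or the other, but not both.
    if has_units == has_factor:
        raise ValueError("set exactly one of hidden_units or hidden_units_factor")
    if has_units:
        return model_cfg["hidden_units"]
    # Factor times the last item in filter_numbers, e.g. 2 * 8 = 16.
    return model_cfg["hidden_units_factor"] * model_cfg["filter_numbers"][-1]


model_cfg = {"filter_numbers": [1, 8], "hidden_units_factor": 2}
hidden = resolve_hidden_units(model_cfg)
print(hidden)  # 16
```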
44 changes: 44 additions & 0 deletions configs/fourierffn-full.yaml
@@ -0,0 +1,44 @@
# Major parameters
mode: "test" # can be 'test' or 'estimate'
results_tag: "" # optional; try not to use underscores in this tag, use dashes instead
debug: False # set to True to turn on debug logging and file naming
# Model parameters
model_name: "FourierFFN" # name of a class defined in the models folder and initialized in the models folder's __init__.py
model: { # contains parameter settings for __init__() function of class with name `model_name`
    lookback: 30, # number of days of preprocessed residual time series to feed into model
    dropout: 0.25,
    hidden_units: [30,16,8,4], # must start with 30
}
# Data parameters
preprocess_func: "preprocess_fourier" # name of a function defined in preprocess.py
use_residual_weights: False # use residual composition matrix to compute turnover, short proportion, etc.
cap_proportion: 0.01 # defines asset universe: 0.01 corresponds to a residual data set
factor_models: { # number of factors per residual time series to test, for each factor model
    "IPCA": [5],
    "PCA": [5],
    "FamaFrench": [5],
}
perturbation: { # perturbation of residual time series by noise is optional, leave empty or comment out entirely to disable
    # "noise_type" : "gaussian",
    # "noise_mean" : 0.0,
    # "noise_std_pct" : 2,
    # "noise_only" : False,
    # "per_residual" : True,
}
# Training parameters
num_epochs: 100
optimizer_name: "Adam" # see PyTorch docs for potential optimizers
optimizer_opts: { # see PyTorch docs for optimizer options
    lr: 0.001
}
batch_size: 125
retrain_freq: 125 # if mode=='estimate', this is the number of obs used to form a test set (chronologically after the training set)
rolling_retrain: True # set to False for no rolling retraining (i.e. train once, test for all data past training set)
force_retrain: True # force the model to be trained, even if existing weights for the model are saved on disk
length_training: 1000 # size of rolling training window in trading days
early_stopping: False # employ early stopping or not
objective: "sharpe" # objective function: 'sharpe' or 'meanvar' or 'sqrtMeanSharpe'
# Market frictions parameters
market_frictions: False # enable or disable
trans_cost: 0 # cost per txn side per equity, as a decimal fraction, e.g. 0.0005 (5 bps)
hold_cost: 0 # cost for short positions per equity per day, as a decimal fraction, e.g. 0.0001 (1 bp)
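
To illustrate how `length_training`, `retrain_freq`, and `rolling_retrain` could interact, here is a sketch that enumerates train/test index ranges. It assumes that each retraining step trains on the most recent `length_training` observations and tests on the next `retrain_freq` observations; the function `rolling_windows` is hypothetical and the actual logic in `train_test.py` may differ.

```python
def rolling_windows(num_obs, length_training=1000, retrain_freq=125, rolling_retrain=True):
    """Return ((train_start, train_end), (test_start, test_end)) half-open index pairs."""
    windows = []
    if not rolling_retrain:
        # Train once, then test on everything after the training set.
        if num_obs > length_training:
            windows.append(((0, length_training), (length_training, num_obs)))
        return windows
    test_start = length_training
    while test_start < num_obs:
        test_end = min(test_start + retrain_freq, num_obs)
        # The training window is the length_training observations just before the test set.
        windows.append(((test_start - length_training, test_start), (test_start, test_end)))
        test_start = test_end
    return windows


for train_range, test_range in rolling_windows(1300):
    print("train", train_range, "test", test_range)
```

With 1300 observations and the default settings, this yields three windows, the last of which has a shortened test set because the data runs out.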
44 changes: 44 additions & 0 deletions configs/ouffn-full.yaml
@@ -0,0 +1,44 @@
# Major parameters
mode: "test" # can be 'test' or 'estimate'
results_tag: "" # optional; try not to use underscores in this tag, use dashes instead
debug: False # set to True to turn on debug logging and file naming
# Model parameters
model_name: "OUFFN" # name of a class defined in the models folder and initialized in the models folder's __init__.py
model: { # contains parameter settings for __init__() function of class with name `model_name`
    lookback: 30, # number of days of preprocessed residual time series to feed into model
    dropout: 0.25,
    hidden_units: [4,4,4,4], # must start with 4, the OU signal length
}
# Data parameters
preprocess_func: "preprocess_ou" # name of a function defined in preprocess.py
use_residual_weights: False # use residual composition matrix to compute turnover, short proportion, etc.
cap_proportion: 0.01 # defines asset universe: 0.01 corresponds to a residual data set
factor_models: { # number of factors per residual time series to test, for each factor model
    "IPCA": [5],
    "PCA": [5],
    "FamaFrench": [5],
}
perturbation: { # perturbation of residual time series by noise is optional, leave empty or comment out entirely to disable
    # "noise_type" : "gaussian",
    # "noise_mean" : 0.0,
    # "noise_std_pct" : 2,
    # "noise_only" : False,
    # "per_residual" : True,
}
# Training parameters
num_epochs: 100
optimizer_name: "Adam" # see PyTorch docs for potential optimizers
optimizer_opts: { # see PyTorch docs for optimizer options
    lr: 0.001
}
batch_size: 125
retrain_freq: 125 # if mode=='estimate', this is the number of obs used to form a test set (chronologically after the training set)
rolling_retrain: True # set to False for no rolling retraining (i.e. train once, test for all data past training set)
force_retrain: True # force the model to be trained, even if existing weights for the model are saved on disk
length_training: 1000 # size of rolling training window in trading days
early_stopping: False # employ early stopping or not
objective: "sharpe" # objective function: 'sharpe' or 'meanvar' or 'sqrtMeanSharpe'
# Market frictions parameters
market_frictions: False # enable or disable
trans_cost: 0 # cost per txn side per equity, as a decimal fraction, e.g. 0.0005 (5 bps)
hold_cost: 0 # cost for short positions per equity per day, as a decimal fraction, e.g. 0.0001 (1 bp)
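
The `objective` setting selects what the training procedure maximizes. The sketch below shows textbook versions of two of the listed options in plain Python, using the standard `statistics` module rather than PyTorch. The `risk_aversion` parameter and the lookup table `OBJECTIVES` are assumptions made for illustration. `sqrtMeanSharpe` is omitted because its definition is not given in the config comments.

```python
import statistics


def sharpe(returns):
    """Mean return divided by the population standard deviation (not annualized)."""
    return statistics.mean(returns) / statistics.pstdev(returns)


def mean_variance(returns, risk_aversion=1.0):
    """Mean return penalized by variance; risk_aversion is an assumed parameter."""
    return statistics.mean(returns) - risk_aversion * statistics.pvariance(returns)


# Hypothetical mapping from the config's `objective` string to a function.
OBJECTIVES = {"sharpe": sharpe, "meanvar": mean_variance}

daily_returns = [0.01, 0.03, 0.02, 0.02]
score = OBJECTIVES["sharpe"](daily_returns)
print(score)
```

A constant return series has zero standard deviation, so a real implementation would need to guard the division in `sharpe`.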
44 changes: 44 additions & 0 deletions configs/rawffn-full.yaml
@@ -0,0 +1,44 @@
# Major parameters
mode: "test" # can be 'test' or 'estimate'
results_tag: "" # optional; try not to use underscores in this tag, use dashes instead
debug: False # set to True to turn on debug logging and file naming
# Model parameters
model_name: "RawFFN" # name of a class defined in the models folder and initialized in the models folder's __init__.py
model: { # contains parameter settings for __init__() function of class with name `model_name`
    lookback: 30, # number of days of preprocessed residual time series to feed into model
    dropout: 0.25,
    hidden_units: [30,16,8,4], # must start with lookback
}
# Data parameters
preprocess_func: "preprocess_cumsum" # name of a function defined in preprocess.py
use_residual_weights: False # use residual composition matrix to compute turnover, short proportion, etc.
cap_proportion: 0.01 # defines asset universe: 0.01 corresponds to a residual data set
factor_models: { # number of factors per residual time series to test, for each factor model
    "IPCA": [5],
    "PCA": [5],
    "FamaFrench": [5],
}
perturbation: { # perturbation of residual time series by noise is optional, leave empty or comment out entirely to disable
    # "noise_type" : "gaussian",
    # "noise_mean" : 0.0,
    # "noise_std_pct" : 2,
    # "noise_only" : False,
    # "per_residual" : True,
}
# Training parameters
num_epochs: 100
optimizer_name: "Adam" # see PyTorch docs for potential optimizers
optimizer_opts: { # see PyTorch docs for optimizer options
    lr: 0.001
}
batch_size: 125
retrain_freq: 125 # if mode=='estimate', this is the number of obs used to form a test set (chronologically after the training set)
rolling_retrain: True # set to False for no rolling retraining (i.e. train once, test for all data past training set)
force_retrain: True # force the model to be trained, even if existing weights for the model are saved on disk
length_training: 1000 # size of rolling training window in trading days
early_stopping: False # employ early stopping or not
objective: "sharpe" # objective function: 'sharpe' or 'meanvar' or 'sqrtMeanSharpe'
# Market frictions parameters
market_frictions: False # enable or disable
trans_cost: 0 # cost per txn side per equity, as a decimal fraction, e.g. 0.0005 (5 bps)
hold_cost: 0 # cost for short positions per equity per day, as a decimal fraction, e.g. 0.0001 (1 bp)
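
One way to read the market frictions parameters is as linear costs subtracted from each day's portfolio return: `trans_cost` charged on the absolute change in each position and `hold_cost` charged on the size of each short position. The sketch below works through that arithmetic using the example values from the comments. The function `net_return` and the exact cost formula are assumptions, not a description of the code in `train_test.py`.

```python
def net_return(gross_return, weights_prev, weights_now, trans_cost=0.0005, hold_cost=0.0001):
    """Subtract assumed linear trading and short-holding costs from one day's gross return."""
    # Turnover: total absolute change in position across equities.
    turnover = sum(abs(now - prev) for prev, now in zip(weights_prev, weights_now))
    # Short exposure: total size of the negative positions held today.
    short_exposure = sum(-w for w in weights_now if w < 0)
    return gross_return - trans_cost * turnover - hold_cost * short_exposure


# Assumed two-asset example: turnover is 0.5 and short exposure is 0.75.
result = net_return(0.01, weights_prev=[0.5, -0.5], weights_now=[0.25, -0.75])
print(result)
```

In this example the costs are 0.0005 * 0.5 + 0.0001 * 0.75 = 0.000325, so the 1% gross return falls to roughly 0.9675%.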