forked from gregzanotti/dlsa-public

Commit fd8209b (0 parents). This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository. Showing 44 changed files with 3,115 additions and 0 deletions.

@@ -0,0 +1,145 @@ .gitignore
# Project-specific files
daily-returns.csv
*esiduals*.npy
*.npz
credentials.yaml
.vscode/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/
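The project-specific patterns at the top of the file are worth a sanity check: `*esiduals*.npy` deliberately drops the leading letter so it matches both `residuals` and `Residuals` naming. A quick illustrative check with `fnmatch` (which uses the same glob syntax as these simple `.gitignore` patterns; the filenames below are hypothetical):

```python
# Illustrative check that the project-specific ignore patterns match the
# artifact filenames they are meant to exclude. The filenames are made up.
from fnmatch import fnmatch

patterns = ["daily-returns.csv", "*esiduals*.npy", "*.npz"]

def is_ignored(name: str) -> bool:
    """Return True if `name` matches any of the project-specific patterns."""
    return any(fnmatch(name, p) for p in patterns)

print(is_ignored("PCA_residuals_5.npy"))   # True: '*esiduals*.npy' matches
print(is_ignored("Residuals-IPCA.npy"))    # True: leading '*' covers 'R' or 'r'
print(is_ignored("weights.pt"))            # False
```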

@@ -0,0 +1,42 @@ README.md
Deep Learning Statistical Arbitrage
===================================

This repo contains the official code for our paper *Deep Learning Statistical Arbitrage*, available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3862004 and https://arxiv.org/abs/2106.04028.

## Quickstart

To test a trading policy model on a residual time series, use `run_train_test.py`.
This file exports a function, `run()`, which can be imported and used in, e.g., a
grid search, or run from the command line. Command-line usage will suit most users.
To run from the command line, use
```
python3 run_train_test.py -c configs/config_name_here.yaml
```
where `config_name_here.yaml` is a configuration file from the `configs` folder.
You can write your own configuration file to edit hyperparameters and other
settings for the trading test. See `run_train_test.py` for other command-line options.
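The grid-search use of `run()` mentioned above can be sketched as follows. This is only a sketch: `run()`'s actual signature is defined in `run_train_test.py`, and the override structure and config filename here are assumptions.

```python
# Build a hyperparameter grid of config overrides for run(). The exact
# calling convention of run() is defined in run_train_test.py; the call
# shown in the comment below is an assumed interface, not the real one.
from itertools import product

lookbacks = [20, 30, 60]
dropouts = [0.1, 0.25]

grid = [{"model": {"lookback": lb, "dropout": do}}
        for lb, do in product(lookbacks, dropouts)]

for overrides in grid:
    # from run_train_test import run
    # run("configs/config_cnn_transformer.yaml", overrides=overrides)  # assumed
    pass

print(len(grid))  # 6 configurations
```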

## Structure

This repo is organized as follows:
- `train_test.py` contains the code for training a trading policy model and simulating trading.
- `run_train_test.py` is a user interface to `train_test.py` which handles configuration, logging, saving results, etc.
- `preprocess.py` contains functions for preprocessing residual time series data into a form usable by a trading policy model.
- `data.py` contains miscellaneous functions for altering residual time series data.
- `configs` contains configuration files which define various tests of trading policy models on residual time series.
- `data` should contain the raw input data used to create residuals.
- `factor_models` contains code for creating residuals from raw input data.
- `residuals` stores residual time series data sets created by the code in `factor_models`.
- `models` contains code for trading policy models.
- `results` will contain the results of, and plots for, trading policy model tests conducted by `run_train_test.py`.
- `logs` will contain logs for runs of models and factor models.
- `tools` should contain miscellaneous code for interpreting and exploring results and saved models.
- `utils.py` contains helper functions used throughout.

## Generating residuals

To create residuals, first ensure that the input data is present in the `data` directory, then run `run_factor_model.py`, providing the name of a factor model in the `factor_models` directory, e.g.
```
python3 run_factor_model.py -m factor_model_name_here
```
Generated residuals for the factor model will be saved in the `residuals` folder.

@@ -0,0 +1,51 @@ CNNTransformer config (YAML)
# Major parameters
mode: "test" # can be 'test' or 'estimate'
results_tag: "" # optional; avoid underscores in this tag, use dashes instead
debug: False # set to True to turn on debug logging and file naming
# Model parameters
model_name: "CNNTransformer" # name of a class defined in the models folder and initialized in its __init__.py
model: { # parameter settings for the __init__() of the class named by `model_name`
  lookback: 30, # number of days of preprocessed residual time series to feed into the model
  dropout: 0.25,
  filter_numbers: [1, 8],
  filter_size: 2,
  attention_heads: 4,
  hidden_units_factor: 2, # multiplier of the last item in `filter_numbers`; determines the number of hidden units (e.g. 2*8 = 16)
  # hidden_units: 16, # use either hidden_units or hidden_units_factor, but not both
  normalization_conv: True, # whether to normalize convolutions
  use_transformer: True,
  use_convolution: True,
}
# Data parameters
preprocess_func: "preprocess_cumsum" # name of a function defined in preprocess.py
use_residual_weights: True # use the residual composition matrix to compute turnover, short proportion, etc.
cap_proportion: 0.01 # defines the asset universe; 0.01 corresponds to a residual data set
factor_models: { # number of factors per residual time series to test, for each factor model
  "IPCA": [5],
  "PCA": [5],
  "FamaFrench": [5],
}
perturbation: { # perturbing the residual time series with noise is optional; leave empty or comment out entirely to disable
  # "noise_type": "gaussian",
  # "noise_mean": 0.0,
  # "noise_std_pct": 2,
  # "noise_only": False,
  # "per_residual": True,
}
# Training parameters
num_epochs: 100
optimizer_name: "Adam" # see the PyTorch docs for available optimizers
optimizer_opts: { # see the PyTorch docs for optimizer options
  lr: 0.001
}
batch_size: 125
retrain_freq: 125 # if mode == 'estimate', the number of observations used to form a test set (chronologically after the training set)
rolling_retrain: True # set to False for no rolling retraining (i.e. train once, test on all data past the training set)
force_retrain: True # force the model to be trained, even if existing weights for the model are saved on disk
length_training: 1000 # size of the rolling training window in trading days
early_stopping: False # whether to employ early stopping
objective: "sharpe" # objective function: 'sharpe', 'meanvar', or 'sqrtMeanSharpe'
# Market frictions parameters
market_frictions: False # enable or disable market frictions
trans_cost: 0 # cost per transaction side per equity, e.g. 0.0005 (5 bps)
hold_cost: 0 # cost for short positions per equity per day, e.g. 0.0001 (1 bp)
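The interaction between `hidden_units` and `hidden_units_factor` in the `model` block above can be made concrete. The helper below is hypothetical (the real resolution happens inside the model class), but it shows the assumed rule: the factor scales the last entry of `filter_numbers`, and exactly one of the two keys may be set.

```python
# Hypothetical helper illustrating the assumed hidden-unit resolution rule
# described in the config comments; the real logic lives in the model class.
def resolve_hidden_units(model_cfg: dict) -> int:
    """Return the hidden-unit count implied by a `model` config block."""
    has_units = "hidden_units" in model_cfg
    has_factor = "hidden_units_factor" in model_cfg
    if has_units == has_factor:
        raise ValueError("set exactly one of hidden_units / hidden_units_factor")
    if has_units:
        return model_cfg["hidden_units"]
    return model_cfg["hidden_units_factor"] * model_cfg["filter_numbers"][-1]

cfg = {"filter_numbers": [1, 8], "hidden_units_factor": 2}
print(resolve_hidden_units(cfg))  # 16, matching the '2*8 = 16' comment above
```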

@@ -0,0 +1,44 @@ FourierFFN config (YAML)
# Major parameters
mode: "test" # can be 'test' or 'estimate'
results_tag: "" # optional; avoid underscores in this tag, use dashes instead
debug: False # set to True to turn on debug logging and file naming
# Model parameters
model_name: "FourierFFN" # name of a class defined in the models folder and initialized in its __init__.py
model: { # parameter settings for the __init__() of the class named by `model_name`
  lookback: 30, # number of days of preprocessed residual time series to feed into the model
  dropout: 0.25,
  hidden_units: [30, 16, 8, 4], # first entry must be 30, the length of the preprocessed input
}
# Data parameters
preprocess_func: "preprocess_fourier" # name of a function defined in preprocess.py
use_residual_weights: False # use the residual composition matrix to compute turnover, short proportion, etc.
cap_proportion: 0.01 # defines the asset universe; 0.01 corresponds to a residual data set
factor_models: { # number of factors per residual time series to test, for each factor model
  "IPCA": [5],
  "PCA": [5],
  "FamaFrench": [5],
}
perturbation: { # perturbing the residual time series with noise is optional; leave empty or comment out entirely to disable
  # "noise_type": "gaussian",
  # "noise_mean": 0.0,
  # "noise_std_pct": 2,
  # "noise_only": False,
  # "per_residual": True,
}
# Training parameters
num_epochs: 100
optimizer_name: "Adam" # see the PyTorch docs for available optimizers
optimizer_opts: { # see the PyTorch docs for optimizer options
  lr: 0.001
}
batch_size: 125
retrain_freq: 125 # if mode == 'estimate', the number of observations used to form a test set (chronologically after the training set)
rolling_retrain: True # set to False for no rolling retraining (i.e. train once, test on all data past the training set)
force_retrain: True # force the model to be trained, even if existing weights for the model are saved on disk
length_training: 1000 # size of the rolling training window in trading days
early_stopping: False # whether to employ early stopping
objective: "sharpe" # objective function: 'sharpe', 'meanvar', or 'sqrtMeanSharpe'
# Market frictions parameters
market_frictions: False # enable or disable market frictions
trans_cost: 0 # cost per transaction side per equity, e.g. 0.0005 (5 bps)
hold_cost: 0 # cost for short positions per equity per day, e.g. 0.0001 (1 bp)
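`preprocess_fourier` is defined in the repo's `preprocess.py` and is not shown here; the sketch below is only a plausible featurization consistent with the constraint that `hidden_units` starts at 30: a 30-day window is mapped to 30 real Fourier features (16 real parts plus the 14 non-trivial imaginary parts of the real FFT of a length-30 path).

```python
# Plausible sketch of a Fourier featurization (NOT the repo's actual
# preprocess_fourier): rfft of a length-30 cumulative path yields 16 complex
# coefficients; their real parts plus the 14 non-trivially-zero imaginary
# parts give exactly 30 real features, matching hidden_units[0] above.
import numpy as np

def fourier_features(window: np.ndarray) -> np.ndarray:
    """Map a lookback-length (30) window of residual returns to 30 features."""
    path = np.cumsum(window)                  # price-like cumulative path
    coeffs = np.fft.rfft(path)                # 16 complex coefficients
    # imag[0] and imag[15] are always zero for real even-length input
    return np.concatenate([coeffs.real, coeffs.imag[1:-1]])  # 16 + 14 = 30

window = np.random.default_rng(0).normal(size=30)
print(fourier_features(window).shape)  # (30,)
```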

@@ -0,0 +1,44 @@ OUFFN config (YAML)
# Major parameters
mode: "test" # can be 'test' or 'estimate'
results_tag: "" # optional; avoid underscores in this tag, use dashes instead
debug: False # set to True to turn on debug logging and file naming
# Model parameters
model_name: "OUFFN" # name of a class defined in the models folder and initialized in its __init__.py
model: { # parameter settings for the __init__() of the class named by `model_name`
  lookback: 30, # number of days of preprocessed residual time series to feed into the model
  dropout: 0.25,
  hidden_units: [4, 4, 4, 4], # first entry must be 4, the OU signal length
}
# Data parameters
preprocess_func: "preprocess_ou" # name of a function defined in preprocess.py
use_residual_weights: False # use the residual composition matrix to compute turnover, short proportion, etc.
cap_proportion: 0.01 # defines the asset universe; 0.01 corresponds to a residual data set
factor_models: { # number of factors per residual time series to test, for each factor model
  "IPCA": [5],
  "PCA": [5],
  "FamaFrench": [5],
}
perturbation: { # perturbing the residual time series with noise is optional; leave empty or comment out entirely to disable
  # "noise_type": "gaussian",
  # "noise_mean": 0.0,
  # "noise_std_pct": 2,
  # "noise_only": False,
  # "per_residual": True,
}
# Training parameters
num_epochs: 100
optimizer_name: "Adam" # see the PyTorch docs for available optimizers
optimizer_opts: { # see the PyTorch docs for optimizer options
  lr: 0.001
}
batch_size: 125
retrain_freq: 125 # if mode == 'estimate', the number of observations used to form a test set (chronologically after the training set)
rolling_retrain: True # set to False for no rolling retraining (i.e. train once, test on all data past the training set)
force_retrain: True # force the model to be trained, even if existing weights for the model are saved on disk
length_training: 1000 # size of the rolling training window in trading days
early_stopping: False # whether to employ early stopping
objective: "sharpe" # objective function: 'sharpe', 'meanvar', or 'sqrtMeanSharpe'
# Market frictions parameters
market_frictions: False # enable or disable market frictions
trans_cost: 0 # cost per transaction side per equity, e.g. 0.0005 (5 bps)
hold_cost: 0 # cost for short positions per equity per day, e.g. 0.0001 (1 bp)
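The exact four OU signals produced by the repo's `preprocess_ou` are not shown here. A plausible construction, in the spirit of the classic Ornstein-Uhlenbeck treatment of residuals, fits an AR(1) to the cumulative residual path and reports the mean-reversion speed, long-run mean, equilibrium standard deviation, and current s-score; everything below is an illustrative assumption, not the repo's code.

```python
# Illustrative OU signal extraction (NOT the repo's preprocess_ou): fit the
# AR(1) X_{t+1} = a + b*X_t + eps to the cumulative residual path, then map
# (a, b, var(eps)) to OU parameters. Four outputs match hidden_units[0] == 4.
import numpy as np

def ou_signals(returns: np.ndarray, dt: float = 1.0 / 252) -> np.ndarray:
    """Four illustrative OU signals for one residual return series."""
    x = np.cumsum(returns)                   # price-like cumulative path
    b, a = np.polyfit(x[:-1], x[1:], 1)      # AR(1) slope and intercept
    resid = x[1:] - (a + b * x[:-1])
    kappa = -np.log(b) / dt                  # mean-reversion speed
    m = a / (1.0 - b)                        # long-run mean
    sigma_eq = np.sqrt(resid.var() / (1.0 - b * b))  # equilibrium std dev
    s_score = (x[-1] - m) / sigma_eq         # current deviation in std units
    return np.array([kappa, m, sigma_eq, s_score])

# Synthetic mean-reverting path for illustration
rng = np.random.default_rng(1)
path = np.zeros(500)
for t in range(1, 500):
    path[t] = 0.9 * path[t - 1] + 0.1 * rng.normal()
rets = np.diff(path, prepend=0.0)
print(ou_signals(rets).shape)  # (4,)
```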

@@ -0,0 +1,44 @@ RawFFN config (YAML)
# Major parameters
mode: "test" # can be 'test' or 'estimate'
results_tag: "" # optional; avoid underscores in this tag, use dashes instead
debug: False # set to True to turn on debug logging and file naming
# Model parameters
model_name: "RawFFN" # name of a class defined in the models folder and initialized in its __init__.py
model: { # parameter settings for the __init__() of the class named by `model_name`
  lookback: 30, # number of days of preprocessed residual time series to feed into the model
  dropout: 0.25,
  hidden_units: [30, 16, 8, 4], # first entry must equal `lookback`
}
# Data parameters
preprocess_func: "preprocess_cumsum" # name of a function defined in preprocess.py
use_residual_weights: False # use the residual composition matrix to compute turnover, short proportion, etc.
cap_proportion: 0.01 # defines the asset universe; 0.01 corresponds to a residual data set
factor_models: { # number of factors per residual time series to test, for each factor model
  "IPCA": [5],
  "PCA": [5],
  "FamaFrench": [5],
}
perturbation: { # perturbing the residual time series with noise is optional; leave empty or comment out entirely to disable
  # "noise_type": "gaussian",
  # "noise_mean": 0.0,
  # "noise_std_pct": 2,
  # "noise_only": False,
  # "per_residual": True,
}
# Training parameters
num_epochs: 100
optimizer_name: "Adam" # see the PyTorch docs for available optimizers
optimizer_opts: { # see the PyTorch docs for optimizer options
  lr: 0.001
}
batch_size: 125
retrain_freq: 125 # if mode == 'estimate', the number of observations used to form a test set (chronologically after the training set)
rolling_retrain: True # set to False for no rolling retraining (i.e. train once, test on all data past the training set)
force_retrain: True # force the model to be trained, even if existing weights for the model are saved on disk
length_training: 1000 # size of the rolling training window in trading days
early_stopping: False # whether to employ early stopping
objective: "sharpe" # objective function: 'sharpe', 'meanvar', or 'sqrtMeanSharpe'
# Market frictions parameters
market_frictions: False # enable or disable market frictions
trans_cost: 0 # cost per transaction side per equity, e.g. 0.0005 (5 bps)
hold_cost: 0 # cost for short positions per equity per day, e.g. 0.0001 (1 bp)
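The `objective: "sharpe"` setting shared by all four configs can be made concrete. The actual objective functions live in `train_test.py`; this sketch only shows what a Sharpe training objective conventionally computes, namely the annualized Sharpe ratio of the policy's daily returns, negated so an optimizer can minimize it.

```python
# Sketch of a conventional 'sharpe' training objective (the repo's actual
# implementation is in train_test.py and may differ): negated annualized
# Sharpe ratio of the daily portfolio return series.
import numpy as np

def sharpe_loss(daily_returns: np.ndarray) -> float:
    """Negative annualized Sharpe ratio (252 trading days per year)."""
    return -np.sqrt(252.0) * daily_returns.mean() / daily_returns.std()

rets = np.array([0.010, -0.004, 0.007, 0.002, -0.001])
print(sharpe_loss(rets) < 0)  # True: positive mean return gives negative loss
```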