This repo contains the code for reproducing the results of the following papers (done as part of my Master's thesis at St Andrews):
- Benchmarking Continual Learning in Sensor-based Human Activity Recognition: an Empirical Analysis [Accepted in Information Sciences (April 2021)]
- Continual Learning in Human Activity Recognition (HAR): An Empirical Analysis of Regularization [ICML Workshop on Continual Learning (July 2020)]
A total of 11 recent continual learning techniques have been implemented on a component-wise basis:
- Maintaining Discrimination and Fairness in Class Incremental Learning (WA-MDF) [Paper]
- Adjusting Decision Boundary for Class Imbalanced Learning (WA-ADB) [Paper]
- Large Scale Incremental Learning (BiC) [Paper]
- Learning a Unified Classifier Incrementally via Rebalancing (LUCIR) [Paper]
- Incremental Learning in Online Scenario (ILOS) [Paper]
- Gradient Episodic Memory for Continual Learning (GEM) [Paper]
- Efficient Lifelong Learning with A-GEM [Paper]
- Elastic Weight Consolidation (EWC) [Paper]
- Rotated Elastic Weight Consolidation (R-EWC) [Paper]
- Learning without Forgetting (LwF) [Paper]
- Memory Aware Synapses (MAS) [Paper]
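As an illustration of the regularization family above, the following is a minimal NumPy sketch of the EWC idea: estimate a diagonal Fisher from squared gradients and penalise drift away from the parameters learned on the previous task. The function names (`fisher_diagonal`, `ewc_penalty`) are illustrative only and do not come from this repo:

```python
import numpy as np

def fisher_diagonal(grads):
    """Diagonal Fisher estimate: average of squared per-batch
    gradients of the log-likelihood for each named parameter."""
    return {name: np.mean([g[name] ** 2 for g in grads], axis=0)
            for name in grads[0]}

def ewc_penalty(params, anchor, fisher, lam=1000.0):
    """EWC regulariser 0.5 * lam * sum_i F_i (theta_i - theta*_i)^2,
    anchoring parameters to their post-task values `anchor`."""
    return 0.5 * lam * sum(
        float(np.sum(fisher[n] * (params[n] - anchor[n]) ** 2))
        for n in params
    )
```

In training, this penalty would simply be added to the task loss; the importance weights `fisher` make movement expensive only along directions that mattered for previous tasks.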
Additionally, the following six exemplar-selection techniques are available (for memory-rehearsal):
- Herding from iCaRL [Paper]
- Frank-Wolfe Sparse Regression (FWSR) [Paper]
- K-means sampling
- DPP sampling
- Boundary-based sampling [Paper]
- Sensitivity-based sampling [Paper]
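To give a flavour of exemplar selection, here is a minimal sketch of iCaRL-style herding: samples are picked greedily so that the running mean of the selected features stays as close as possible to the class mean. This is a hypothetical standalone helper, not code from this repo:

```python
import numpy as np

def herding_select(features, m):
    """Greedy herding (iCaRL-style): pick m row indices of
    `features` whose running mean best approximates the class mean."""
    mu = features.mean(axis=0)
    selected, running_sum = [], np.zeros_like(mu)
    for k in range(1, m + 1):
        # distance of each candidate's running mean to the class mean
        dists = np.linalg.norm(mu - (running_sum + features) / k, axis=1)
        dists[selected] = np.inf  # never pick the same sample twice
        i = int(np.argmin(dists))
        selected.append(i)
        running_sum += features[i]
    return selected
```

The returned indices are ordered by importance, so a smaller memory budget can simply keep a prefix of the list.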
For training, please execute the `runner.sh` script, which creates all the directories required for logging the outputs. Further similar commands can be added to run additional experiments.
For instance, to train on the ARUBA dataset with FWSR-styled exemplar selection:

```shell
python runner.py --dataset 'aruba' --total_classes 11 --base_classes 2 --new_classes 2 --epochs 160 --method 'kd_kldiv_wa1' --exemplar 'fwsr'
```
The existing forgetting measure [1] suffers from self-relativeness: if the model learned little about a class to begin with, the forgetting score for that class stays low throughout training. Class-imbalance scenarios (as in our case) further amplify this effect [2]. Code for our correction to the forgetting score can be found here.
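For reference, the standard (uncorrected) forgetting measure of Chaudhry et al. [1] can be sketched as below: for each earlier task, it is the drop from the best accuracy ever achieved on that task to its accuracy after the final task. The helper name `forgetting_scores` is illustrative, not from this repo:

```python
import numpy as np

def forgetting_scores(acc):
    """Forgetting measure of Chaudhry et al. [1].

    acc[l, j] = accuracy on task j after training on task l.
    For each task j < k (k = number of tasks), return
    max_{l < k} acc[l, j] - acc[k-1, j]."""
    k = acc.shape[0]
    return np.array([acc[:k - 1, j].max() - acc[k - 1, j]
                     for j in range(k - 1)])
```

Note how the measure is relative to the model's own best accuracy on each task, which is exactly the self-relativeness discussed above.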
The experiments were performed on 8 publicly available HAR datasets. These can be downloaded from the drive link in `datasets/`.
The experiments for each dataset and for each train set / exemplar size were performed on 30 random sequences of tasks. The logs in `output_reports/[dataname]` (created after executing the bash script) contain the performances of each individual task sequence as the incremental learning progresses. The final accuracy is then reported as the average over the 30 runs (see instructions below for evaluation).
For evaluation, please uncomment the lines per the instructions in `runner.py`. This can be used to measure forgetting scores [2], base-new-old accuracies, and average reports by holdout size.
The component-wise implementation also makes it easy to combine two or more techniques. This can be done by tweaking the `--method` argument. The table below details some of these combinations:
| Technique | Argument for `--method` |
|---|---|
| Knowledge distillation with margin ranking loss (KD_MR) | `kd_kldiv_mr` |
| KD_MR with WA-MDF | `kd_kldiv_mr_wa1` |
| KD_MR with WA-ADB | `kd_kldiv_mr_wa2` |
| KD_MR with less-forget constraint loss (KD_LFC_MR) | `kd_kldiv_lfc_mr` |
| KD_LFC_MR with WA-MDF | `kd_kldiv_lfc_mr_wa1` |
| KD_LFC_MR with WA-ADB | `kd_kldiv_lfc_mr_wa2` |
| Cosine normalisation with knowledge distillation | `cn_kd_kldiv` |
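To illustrate the `kd_kldiv` component shared by these methods, here is a minimal NumPy sketch of KL-divergence knowledge distillation with temperature softening (the `T * T` scaling follows the standard formulation by Hinton et al.). The function names are illustrative, not repo code:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the class axis."""
    z = z / T
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def kd_kldiv_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between softened teacher and student
    distributions, averaged over the batch and scaled by T^2."""
    p = softmax(teacher_logits, T)  # teacher (old model) targets
    q = softmax(student_logits, T)  # student (current model) predictions
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)) * T * T)
```

When the student matches the teacher exactly the loss is zero; any divergence on the old classes is penalised, which is what preserves old-task knowledge during incremental updates.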
Furthermore, the logits-replacement tweak of ILOS and the weight initialisation from LUCIR can be used with any of the above methods by simply setting the following arguments:
| Technique | Argument |
|---|---|
| ILOS (with any of the above) | `--replace_new_logits = True` |
| LUCIR-styled weight initialisation (with any of the above) | `--wt_init = True` |
Please feel free to play around with these. We would be interested in knowing if the combinations deliver better results for you!
- All the experiments in our papers used 2 base classes and 2 incremental classes. To replicate this, set `--base_classes = 2` and `--new_classes = 2`.
- For offline learning (i.e., without incremental training), set `--base_classes` to the total number of classes in the dataset and `--new_classes = 0`.
- For experiments with permuted datasets, set `--base_classes = --new_classes` = the total number of classes in the dataset.
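How these arguments carve the class set into tasks can be sketched as follows; `class_increments` is a hypothetical helper written for illustration, not a function from this repo:

```python
def class_increments(total_classes, base_classes, new_classes, order=None):
    """Split a (possibly shuffled) class order into one base task
    followed by fixed-size incremental tasks.

    new_classes == 0 reproduces the offline setting: a single task
    containing the base classes only."""
    order = list(order) if order is not None else list(range(total_classes))
    tasks = [order[:base_classes]]
    if new_classes > 0:
        for i in range(base_classes, total_classes, new_classes):
            tasks.append(order[i:i + new_classes])
    return tasks
```

For example, ARUBA with `--total_classes 11 --base_classes 2 --new_classes 2` yields five 2-class increments after the base task, with a final 1-class remainder.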
The implementations have been verified through runs on Split-MNIST and Permuted-MNIST, also available for download in `datasets/`.
Special thanks to sairin1202's implementation of BiC and Electronic Tomato's implementation of GEM/AGEM/EWC/MAS.
[1] Chaudhry, A., Dokania, P.K., Ajanthan, T., & Torr, P.H. (2018). Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence. ECCV.
[2] Kim, C. D., Jeong, J., & Kim, G. (2020). Imbalanced continual learning with partitioning reservoir sampling. ECCV.
If you found this repo useful in your work, please feel free to cite us:
@article{jha2021continual,
title={Continual Learning in Sensor-based Human Activity Recognition: an Empirical Benchmark Analysis},
author={Jha, Saurav and Schiemer, Martin and Zambonelli, Franco and Ye, Juan},
journal={Information Sciences},
year={2021},
publisher={Elsevier}
}
@article{jha2020continual,
title={Continual learning in human activity recognition: an empirical analysis of regularization},
author={Jha, Saurav and Schiemer, Martin and Ye, Juan},
journal={Proceedings of Machine Learning Research},
year={2020}
}