.. centered:: A Python Toolbox for Data Mining on Partially-Observed Time Series
⦿ Motivation: Missing values are common in real-world time series because of sensor failures, communication errors, and unexpected malfunctions. This makes partially-observed time series (POTS) a pervasive problem in open-world modeling and an obstacle to advanced data analysis. Despite the importance of this problem, the area of data mining on POTS still lacks a dedicated toolkit. PyPOTS was created to fill this gap.
⦿ Mission: PyPOTS aims to be a handy toolbox that makes data mining on POTS easy rather than tedious, helping engineers and researchers focus on the core problems at hand rather than on how to deal with the missing parts of their data. PyPOTS will keep integrating both classical and state-of-the-art data mining algorithms for partially-observed multivariate time series. Besides the algorithms themselves, PyPOTS will provide unified APIs, detailed documentation, and interactive examples across algorithms as tutorials.
To make various open-source time-series datasets readily available to our users, PyPOTS is supported by the TSDB (Time-Series Data Base) project, a toolbox that makes loading time-series datasets easy.
Visit TSDB to learn more about this handy tool 🛠! It currently supports a total of 119 open-source datasets.
The rest of this readme file is organized as follows: ❖ Installation, ❖ Usage, ❖ Available Algorithms, ❖ Citing PyPOTS, ❖ Community, ❖ Contribution.
PyPOTS is available on both PyPI and Anaconda ❗️
Refer to the page Installation to see how to install PyPOTS.
![BrewedPOTS logo](https://raw.githubusercontent.com/WenjieDu/BrewedPOTS/main/figs/BrewedPOTS_logo.jpg)
PyPOTS tutorials have been released. Considering the future workload, I have moved the tutorials into a separate repo, which you can find at BrewedPOTS. Take a look at it now, and brew your POTS dataset into a cup of coffee! 🤓
If you have further questions, please refer to the PyPOTS documentation 📑 at docs.pypots.com. You can also raise an issue or ask in our community.
PyPOTS supports imputation, classification, clustering, and forecasting tasks on multivariate time series with missing values. The currently available algorithms for these four tasks are cataloged in the following table, grouped by task. All paper references are listed at the bottom of this readme file; please refer to them for more details.
Task | Type | Algorithm | Year | Reference |
---|---|---|---|---|
Imputation | Neural Network | SAITS (Self-Attention-based Imputation for Time Series) | 2022 | :cite:`du2023SAITS` |
Imputation | Neural Network | Transformer | 2017 | :cite:`vaswani2017Transformer`, :cite:`du2023SAITS` |
Imputation, Classification | Neural Network | BRITS (Bidirectional Recurrent Imputation for Time Series) | 2018 | :cite:`cao2018BRITS` |
Imputation | Naive | LOCF (Last Observation Carried Forward) | / | / |
Classification | Neural Network | GRU-D | 2018 | :cite:`che2018GRUD` |
Classification | Neural Network | Raindrop | 2022 | :cite:`zhang2022Raindrop` |
Clustering | Neural Network | CRLI (Clustering Representation Learning on Incomplete time-series data) | 2021 | :cite:`ma2021CRLI` |
Clustering | Neural Network | VaDER (Variational Deep Embedding with Recurrence) | 2019 | :cite:`dejong2019VaDER` |
Forecasting | Probabilistic | BTTF (Bayesian Temporal Tensor Factorization) | 2021 | :cite:`chen2021BTMF` |
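Of the algorithms in the table, LOCF is simple enough to illustrate in a few lines. The following is a minimal sketch of the idea in plain Python, not PyPOTS's own implementation: the function name ``locf`` and the ``fill_value`` parameter are hypothetical, and filling leading gaps with a constant is an assumption made here for illustration.

```python
import math

NaN = float("nan")


def locf(series, fill_value=0.0):
    """Impute one multivariate series by carrying the last observation forward.

    `series` is a list of time steps, each a list of feature values, with
    missing entries encoded as NaN. Leading gaps that have no earlier
    observation are set to `fill_value` (an assumption of this sketch).
    The input is left unmodified; a new nested list is returned.
    """
    if not series:
        return []
    n_features = len(series[0])
    last_seen = [None] * n_features  # most recent observed value per feature
    imputed = []
    for step in series:
        row = []
        for j, value in enumerate(step):
            if math.isnan(value):
                # Missing: reuse the last observation, or the fallback value.
                row.append(fill_value if last_seen[j] is None else last_seen[j])
            else:
                # Observed: keep it and remember it for later gaps.
                last_seen[j] = value
                row.append(value)
        imputed.append(row)
    return imputed


# A toy series with 4 time steps and 2 features.
series = [
    [NaN, 10.0],
    [1.0, NaN],
    [NaN, NaN],
    [4.0, 13.0],
]
imputed = locf(series)
print(imputed)
```

Each feature is filled independently along the time axis, so a gap in one feature never borrows values from another feature.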
We are working to publish a short paper introducing PyPOTS in a prestigious academic venue, e.g. JMLR (track for Machine Learning Open Source Software). Until then, PyPOTS uses its DOI from Zenodo for reference. If you use PyPOTS in your research, please cite it as below and 🌟star this repository so that others notice this work. 🤗
@misc{du2022PyPOTS,
author = {Wenjie Du},
title = {{PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series}},
howpublished = {\url{https://github.com/wenjiedu/pypots}},
year = {2022},
doi = {10.5281/zenodo.6823221},
}
or
Wenjie Du. (2022). PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series. Zenodo. https://doi.org/10.5281/zenodo.6823221
We care about feedback from our users, so we're building the PyPOTS community on:
- Slack. General discussion, Q&A, and our development team are here;
- LinkedIn. Official announcements and news are here;
- WeChat (微信公众号). We also run a group chat on WeChat, and you can get the QR code from the official account after following it.
If you have any suggestions, want to contribute ideas, or would like to share time-series related papers, join us and tell us. The PyPOTS community is open, transparent, and friendly. Let's work together to build and improve PyPOTS 💪!
You're very welcome to contribute to this exciting project!
By committing your code, you'll
- make your well-established model available out of the box for PyPOTS users to run. Take a look at our inclusion criteria;
- be listed as one of the PyPOTS contributors;
- get mentioned in our release notes.
You can also contribute to PyPOTS by simply starring🌟 this repo to help more people notice it. Your star is your recognition of PyPOTS, and it matters!
The lists of PyPOTS stargazers and forkers are shown below, and we're so proud to have more and more awesome users, as well as more bright ✨stars:
PyPOTS is currently under development. If you like it and look forward to its growth, please give PyPOTS a star and watch the repository to stay posted on its progress and to let me know that its development is meaningful. If you have any additional questions or are interested in collaboration, please take a look at my GitHub profile and feel free to contact me 🤝.
Thank you all for your attention! 😃
.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Getting Started

   install
   examples

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Code Documentation

   pypots

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: Additional Information

   faq
   about_us
   references