Merge pull request #434 from WenjieDu/dev
Add Reformer, add option `verbose` to control training logs, and add benchpots as a dependency
WenjieDu committed Jun 18, 2024
2 parents e6d7c9f + ba116eb commit b073c9e
Showing 67 changed files with 2,021 additions and 27 deletions.
6 changes: 5 additions & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -1,6 +1,10 @@
blank_issues_enabled: true
version: 2.1
contact_links:
- name: PyPOTS Community on Slack
- name: 🤗 PyPOTS Community on Slack
url: https://join.slack.com/t/pypots-org/shared_invite/zt-1gq6ufwsi-p0OZdW~e9UW_IA4_f1OfxA
about: General usage questions, community discussions, and the development team are here.

- name: 🇨🇳 PyPOTS微信社区 (Community on WeChat)
url: https://mp.weixin.qq.com/s/CGkirx7ODkngk6tGGANksw
about: 中文社区讨论群,欢迎加入!
7 changes: 5 additions & 2 deletions .github/workflows/greetings.yml
@@ -22,7 +22,10 @@ jobs:
issue-message: |
Hi there 👋,
Thank you so much for your attention to PyPOTS! You can follow me on GitHub to receive the latest news of PyPOTS. If you find PyPOTS helpful to your work, please star⭐️ this repository. Your star is your recognition, which can help more people notice PyPOTS and grow PyPOTS community. It matters and is definitely a kind of contribution to the community.
Thank you so much for your attention to PyPOTS! You can [follow me](https://github.com/WenjieDu) on GitHub
to receive the latest news of PyPOTS. If you find PyPOTS helpful to your work, please star⭐️ this repository.
Your star is your recognition, which can help more people notice PyPOTS and grow PyPOTS community.
It matters and is definitely a kind of contribution to the community.
I have received your message and will respond ASAP. Thank you for your patience! 😃
@@ -31,7 +34,7 @@ jobs:
pr-message: |
Hi there 👋,
We really really appreciate that you have taken the time to make this PR on PyPOTS!
We really appreciate that you have taken the time to make this PR on PyPOTS!
If you are trying to fix a bug, please reference the issue number in the description or give your details about the bug.
If you are implementing a feature request, please check with the maintainers that the feature will be accepted first.
12 changes: 10 additions & 2 deletions README.md
@@ -173,11 +173,19 @@ MCAR (missing completely at random), MAR (missing at random), and MNAR (missing
PyGrinder supports all of them and additional functionalities related to missingness.
With PyGrinder, you can introduce synthetic missing values into your datasets with a single line of code.

<a href="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/WenjieDu/BenchPOTS">
<img src="https://pypots.com/figs/pypots_logos/BenchPOTS/logo_FFBG.svg" align="left" width="140" alt="BenchPOTS logo"/>
</a>

👈 To fairly evaluate the performance of PyPOTS algorithms, the benchmarking suite [BenchPOTS](https://github.com/WenjieDu/BenchPOTS) is created,
which provides standard and unified data-preprocessing pipelines to prepare datasets for measuring the performance of different
POTS algorithms on various tasks.

<a href="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/WenjieDu/BrewPOTS">
<img src="https://pypots.com/figs/pypots_logos/BrewPOTS/logo_FFBG.svg" align="left" width="140" alt="BrewPOTS logo"/>
<img src="https://pypots.com/figs/pypots_logos/BrewPOTS/logo_FFBG.svg" align="right" width="140" alt="BrewPOTS logo"/>
</a>

👈 Now we have the beans, the grinder, and the pot, how to brew us a cup of coffee? Tutorials are necessary!
👉 Now we have the beans, the grinder, and the pot, how to brew us a cup of coffee? Tutorials are necessary!
Considering the future workload, PyPOTS tutorials are released in a single repo,
and you can find them in [BrewPOTS](https://github.com/WenjieDu/BrewPOTS).
Take a look at it now, and learn how to brew your POTS datasets!
11 changes: 9 additions & 2 deletions README_zh.md
@@ -156,11 +156,18 @@ TSDB让加载开源时序数据集变得超级简单！访问 [TSDB](https://git
完全随机缺失(missing completely at random,简称为MCAR)、随机缺失(missing at random,简称为MAR)和非随机缺失(missing not at random,简称为MNAR )。
PyGrinder支持以上所有模式并提供与缺失相关的其他功能函数。通过PyGrinder,你可以仅仅通过一行代码就将模拟缺失引入你的数据集中。

<a href="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/WenjieDu/BenchPOTS">
<img src="https://pypots.com/figs/pypots_logos/BenchPOTS/logo_FFBG.svg" align="left" width="140" alt="BenchPOTS logo"/>
</a>

👈 为了评估机器学习算法在POTS数据上的性能,我们创建了生态系统中的另一个仓库[BenchPOTS](https://github.com/WenjieDu/BenchPOTS),
其提供了标准且统一的数据预处理管道来帮助你在多种任务上衡量不同POTS算法的性能。

<a href="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/WenjieDu/BrewPOTS">
<img src="https://pypots.com/figs/pypots_logos/BrewPOTS/logo_FFBG.svg" align="left" width="140" alt="BrewPOTS logo"/>
<img src="https://pypots.com/figs/pypots_logos/BrewPOTS/logo_FFBG.svg" align="right" width="140" alt="BrewPOTS logo"/>
</a>

👈 现在我们有了咖啡豆、磨豆机和咖啡壶,那么如何萃取一杯咖啡呢?冲泡教程是必不可少的!
👉 现在我们有了咖啡豆、磨豆机和咖啡壶,那么如何萃取一杯咖啡呢?冲泡教程是必不可少的!
考虑到未来的工作量,PyPOTS的相关教程将发布在一个独立的仓库[BrewPOTS](https://github.com/WenjieDu/BrewPOTS)中。
点击访问查看教程,学习如何萃取你的POTS数据!

11 changes: 11 additions & 0 deletions docs/benchpots.rst
@@ -0,0 +1,11 @@
All APIs of BenchPOTS
=======================

benchpots.datasets
------------------

.. automodule:: benchpots.datasets
:members:
:undoc-members:
:show-inheritance:
:inherited-members:
15 changes: 13 additions & 2 deletions docs/index.rst
@@ -234,13 +234,23 @@ MCAR (missing completely at random), MAR (missing at random), and MNAR (missing
PyGrinder supports all of them and additional functionalities related to missingness.
With PyGrinder, you can introduce synthetic missing values into your datasets with a single line of code.

.. image:: https://pypots.com/figs/pypots_logos/BrewPOTS/logo_FFBG.svg
.. image:: https://pypots.com/figs/pypots_logos/BenchPOTS/logo_FFBG.svg
:width: 150
:alt: BrewPOTS logo
:align: left
:target: https://github.com/WenjieDu/BenchPOTS

👈 To fairly evaluate the performance of PyPOTS algorithms, the benchmarking suite [BenchPOTS](https://github.com/WenjieDu/BenchPOTS) is created,
which provides standard and unified data-preprocessing pipelines to prepare datasets for measuring the performance of different
POTS algorithms on various tasks.

.. image:: https://pypots.com/figs/pypots_logos/BrewPOTS/logo_FFBG.svg
:width: 150
:alt: BrewPOTS logo
:align: right
:target: https://github.com/WenjieDu/BrewPOTS

👈 Now we have the beans, the grinder, and the pot, how to brew us a cup of coffee? Tutorials are necessary!
👉 Now we have the beans, the grinder, and the pot, how to brew us a cup of coffee? Tutorials are necessary!
Considering the future workload, PyPOTS tutorials is released in a single repo,
and you can find them in `BrewPOTS <https://github.com/WenjieDu/BrewPOTS>`_.
Take a look at it now, and learn how to brew your POTS datasets!
@@ -365,6 +375,7 @@ PyPOTS community is open, transparent, and surely friendly. Let's work together
pypots
tsdb
pygrinder
benchpots

.. toctree::
:maxdepth: 2
23 changes: 20 additions & 3 deletions pypots/base.py
Expand Up @@ -15,7 +15,7 @@
from torch.utils.tensorboard import SummaryWriter

from .utils.file import create_dir_if_not_exist
from .utils.logging import logger
from .utils.logging import logger, logger_creator


class BaseModel(ABC):
@@ -43,6 +43,9 @@ class BaseModel(ABC):
better than in previous epochs.
The "all" strategy will save every model after each epoch training.
verbose :
Whether to print out the training logs during the training process.
Attributes
----------
model : object, default = None
@@ -64,6 +67,7 @@ def __init__(
device: Optional[Union[str, torch.device, list]] = None,
saving_path: str = None,
model_saving_strategy: Optional[str] = "best",
verbose: bool = True,
):
saving_strategies = [None, "best", "better", "all"]
assert (
@@ -73,6 +77,10 @@
self.device = None # set up with _setup_device() below
self.saving_path = None # set up with _setup_path() below
self.model_saving_strategy = model_saving_strategy
self.verbose = verbose

if not self.verbose:
logger_creator.set_level("warning")

self.model = None
self.summary_writer = None
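In effect, the new `verbose` flag raises the shared PyPOTS logger to WARNING level when disabled, so routine INFO-level training messages are muted. A minimal sketch of that behaviour, assuming the logger objects are used directly (only `logger_creator.set_level` appears in this diff):

```python
from pypots.utils.logging import logger, logger_creator

logger.info("visible: the default level prints INFO training logs")
logger_creator.set_level("warning")  # what BaseModel now does when verbose=False
logger.info("hidden: INFO is below the active WARNING threshold")
logger.warning("still visible: warnings and errors are never muted")
```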
@@ -273,6 +281,8 @@ def save(
"""
# split the saving dir and file name from the given path
saving_dir, file_name = os.path.split(saving_path)
# if parent dir is not given, save in the current dir
saving_dir = "." if saving_dir == "" else saving_dir
# add the suffix ".pypots" if not given
if file_name.split(".")[-1] != "pypots":
file_name += ".pypots"
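For context, `os.path.split` returns an empty directory component when the argument is a bare file name, which is exactly what the new fallback guards against. A quick illustration of the standard-library behaviour (not PyPOTS-specific):

```python
import os

saving_dir, file_name = os.path.split("my_model.pypots")
print(repr(saving_dir))  # '' -> replaced with "." so the file lands in the current directory
print(repr(file_name))   # 'my_model.pypots'

saving_dir, file_name = os.path.split("checkpoints/my_model")
print(repr(saving_dir))  # 'checkpoints'
print(repr(file_name))   # 'my_model' -> the ".pypots" suffix is appended afterwards
```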
@@ -442,6 +452,8 @@ class BaseNNModel(BaseModel):
better than in previous epochs.
The "all" strategy will save every model after each epoch training.
verbose :
Whether to print out the training logs during the training process.
Attributes
---------
@@ -475,11 +487,13 @@ def __init__(
device: Optional[Union[str, torch.device, list]] = None,
saving_path: str = None,
model_saving_strategy: Optional[str] = "best",
verbose: bool = True,
):
super().__init__(
device,
saving_path,
model_saving_strategy,
verbose,
)

if patience is None:
@@ -497,17 +511,20 @@ def __init__(
self.num_workers = num_workers

self.model = None
self.num_params = None
self.optimizer = None
self.best_model_dict = None
self.best_loss = float("inf")
self.best_epoch = -1

def _print_model_size(self) -> None:
"""Print the number of trainable parameters in the initialized NN model."""
num_params = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
self.num_params = sum(
p.numel() for p in self.model.parameters() if p.requires_grad
)
logger.info(
f"{self.__class__.__name__} initialized with the given hyperparameters, "
f"the number of trainable parameters: {num_params:,}"
f"the number of trainable parameters: {self.num_params:,}"
)

@abstractmethod
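The stored `num_params` is the usual PyTorch count of trainable parameters. A tiny sketch with a toy module (any `torch.nn.Module` behaves the same way; the Linear layer here is only an example):

```python
import torch

toy_model = torch.nn.Linear(10, 5)  # 10 * 5 weights + 5 biases = 55 trainable parameters
num_params = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
print(f"{num_params:,}")  # -> 55
```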
8 changes: 8 additions & 0 deletions pypots/classification/base.py
@@ -51,6 +51,8 @@ class BaseClassifier(BaseModel):
better than in previous epochs.
The "all" strategy will save every model after each epoch training.
verbose :
Whether to print out the training logs during the training process.
"""

def __init__(
@@ -59,11 +61,13 @@ def __init__(
device: Optional[Union[str, torch.device, list]] = None,
saving_path: str = None,
model_saving_strategy: Optional[str] = "best",
verbose: bool = True,
):
super().__init__(
device,
saving_path,
model_saving_strategy,
verbose,
)
self.n_classes = n_classes

@@ -179,6 +183,8 @@ class BaseNNClassifier(BaseNNModel):
better than in previous epochs.
The "all" strategy will save every model after each epoch training.
verbose :
Whether to print out the training logs during the training process.
Notes
-----
@@ -200,6 +206,7 @@ def __init__(
device: Optional[Union[str, torch.device, list]] = None,
saving_path: str = None,
model_saving_strategy: Optional[str] = "best",
verbose: bool = True,
):
super().__init__(
batch_size,
@@ -209,6 +216,7 @@
device,
saving_path,
model_saving_strategy,
verbose,
)
self.n_classes = n_classes

4 changes: 4 additions & 0 deletions pypots/classification/brits/model.py
@@ -81,6 +81,8 @@ class BRITS(BaseNNClassifier):
better than in previous epochs.
The "all" strategy will save every model after each epoch training.
verbose :
Whether to print out the training logs during the training process.
"""

def __init__(
@@ -99,6 +101,7 @@ def __init__(
device: Optional[Union[str, torch.device, list]] = None,
saving_path: str = None,
model_saving_strategy: Optional[str] = "best",
verbose: bool = True,
):
super().__init__(
n_classes,
@@ -109,6 +112,7 @@
device,
saving_path,
model_saving_strategy,
verbose,
)

self.n_steps = n_steps
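With the plumbing above, the flag can be passed straight through a model constructor. A hedged usage sketch — the hyperparameter names and values below are illustrative assumptions (check the class docstring for the authoritative list); only `verbose` is the argument added in this commit:

```python
from pypots.classification import BRITS

# Assumed example hyperparameters; verbose=False silences the per-epoch INFO training logs.
classifier = BRITS(
    n_steps=48,
    n_features=37,
    n_classes=2,
    rnn_hidden_size=128,
    epochs=10,
    verbose=False,
)
```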
4 changes: 4 additions & 0 deletions pypots/classification/grud/model.py
@@ -76,6 +76,8 @@ class GRUD(BaseNNClassifier):
better than in previous epochs.
The "all" strategy will save every model after each epoch training.
verbose :
Whether to print out the training logs during the training process.
"""

def __init__(
@@ -92,6 +94,7 @@ def __init__(
device: Optional[Union[str, torch.device, list]] = None,
saving_path: str = None,
model_saving_strategy: Optional[str] = "best",
verbose: bool = True,
):
super().__init__(
n_classes,
@@ -102,6 +105,7 @@
device,
saving_path,
model_saving_strategy,
verbose,
)

self.n_steps = n_steps
4 changes: 4 additions & 0 deletions pypots/classification/raindrop/model.py
@@ -102,6 +102,8 @@ class Raindrop(BaseNNClassifier):
better than in previous epochs.
The "all" strategy will save every model after each epoch training.
verbose :
Whether to print out the training logs during the training process.
"""

def __init__(
@@ -126,6 +128,7 @@
device: Optional[Union[str, torch.device, list]] = None,
saving_path: str = None,
model_saving_strategy: Optional[str] = "best",
verbose: bool = True,
):
super().__init__(
n_classes,
@@ -136,6 +139,7 @@
device,
saving_path,
model_saving_strategy,
verbose,
)

self.n_features = n_features
2 changes: 2 additions & 0 deletions pypots/classification/template/model.py
@@ -40,6 +40,7 @@ def __init__(
device: Optional[Union[str, torch.device, list]] = None,
saving_path: Optional[str] = None,
model_saving_strategy: Optional[str] = "best",
verbose: bool = True,
):
super().__init__(
n_classes,
@@ -50,6 +51,7 @@
device,
saving_path,
model_saving_strategy,
verbose,
)
# set up the hyper-parameters
# TODO: set up your model's hyper-parameters here
4 changes: 2 additions & 2 deletions pypots/cli/tuning.py
@@ -210,7 +210,7 @@ def run(self):
if self._model not in NN_MODELS:
logger.info(
f"The specified model {self._model} is not in PyPOTS. Available models are {NN_MODELS.keys()}. "
f"Trying to fetch it from the given model package {self._model_package_path}."
f"Trying to fetch it from the given model package {self._model_package_path}"
)
assert self._model_package_path is not None, (
f"The given model {self._model} is not in PyPOTS. "
@@ -220,7 +220,7 @@
)
model_package = load_package_from_path(self._model_package_path)
assert self._model in model_package.__all__, (
f"{self._model} is not in the given model package {self._model_package_path}."
f"{self._model} is not in the given model package {self._model_package_path}"
f"Please ensure that the model class is in the __all__ list of the model package."
)
model_class = getattr(model_package, self._model)
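The surrounding context shows how the tuning CLI resolves a custom model: it loads the package at the given path, checks that the class is exported via `__all__`, and then fetches it by name. A toy sketch of that lookup (the module below is made up and not part of PyPOTS; it stands in for the object returned by `load_package_from_path`):

```python
import types

# Stand-in for the package object returned by load_package_from_path(...)
model_package = types.ModuleType("my_models")

class MyClassifier:
    pass

model_package.MyClassifier = MyClassifier
model_package.__all__ = ["MyClassifier"]

model_name = "MyClassifier"
assert model_name in model_package.__all__, f"{model_name} is not in the given model package"
model_class = getattr(model_package, model_name)  # ready to be instantiated for tuning
```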