Ninghui Feng*, Songning Lai*, Fobao Zhou, Zhenxiao Yin, Hang Zhao†
Time series forecasting has become an increasingly popular research area due to its critical applications in real-world domains such as traffic management, weather prediction, and financial analysis. Despite significant advances, existing models face notable challenges, including the need for manual hyperparameter tuning on each dataset and difficulty distinguishing signal from redundant features in data with strong seasonality. These issues hinder the generalization and practical application of time series forecasting models. To address them, we propose TimeSieve, a new time series forecasting model. Our approach employs wavelet transforms to preprocess time series data, capturing multi-scale features without additional parameters or manual hyperparameter tuning. Additionally, we apply information bottleneck theory to filter redundant features out of both the detail and approximation coefficients, retaining only the most predictive information. This combination significantly improves the model's accuracy. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods on 70% of the datasets, achieving higher predictive accuracy and better generalization across diverse datasets. Our results validate the effectiveness of our approach in addressing the key challenges in time series forecasting, paving the way for more reliable and efficient predictive models in practical applications.
Existing models often introduce additional parameters to capture multi-scale features, which increases computational complexity. These models also require manual hyperparameter tuning for each dataset, further complicating their application and limiting their generalization across diverse scenarios. More importantly, many models, including Autoformer, struggle to distinguish signal from redundant features in data with strong seasonality. Redundant features not only interfere with the learning process, leading to increased errors, but also obscure the true patterns within the data, making predictions unreliable. This issue is particularly pronounced when traditional autocorrelation mechanisms break down under interference from redundant features, significantly diminishing the practical value of these models.
- We introduce the Wavelet Decomposition Block (WDB) and the Wavelet Reconstruction Block (WRB), which efficiently capture multi-scale features in time series data without manual hyperparameter tuning or additional parameters.
- We take an information-theoretic approach, incorporating an Information Filtering and Compression Block (IFCB) to filter redundant features out of the data. This ensures that only the most predictive features are retained, enhancing the model's accuracy.
- By integrating wavelet transforms for comprehensive feature extraction with the information bottleneck block for filtering redundant features, our model achieves state-of-the-art performance on 70% of the datasets tested, demonstrating superior predictive accuracy and generalization across diverse scenarios.
In our study, we propose a novel time series forecasting model named TimeSieve. This model integrates the Information Filtering and Compression Block (IFCB) with wavelet transform technology to enhance the accuracy of time series predictions.
To effectively capture comprehensive feature information, we employ the Wavelet Decomposition Block (WDB). The WDB decomposes time series data into different frequency components, effectively extracting multi-scale information. Specifically, the wavelet transform can be represented by the following equations:
where the approximation coefficients
However, the extracted multi-scale information may contain redundant features, which can adversely affect the model's learning performance and predictive accuracy. To address this issue, we introduce the Information Filtering and Compression Block (IFCB) to filter the information. Additionally, we employ residual connections to ensure that valuable information is not lost during the filtering process. The IFCB optimizes the information flow, retaining critical information while filtering out irrelevant or redundant information. The equations for the Information Bottleneck module are as follows:
where
After filtering with the IFCB, we apply the Wavelet Reconstruction Block (WRB) to reconstruct the processed data back into the time domain. This step ensures that the features at different scales are fully utilized. Finally, we use a Multi-Layer Perceptron (MLP) as the predictor to make the final forecast on the processed time series data. The equation for the prediction step is as follows:
The overall architecture of the TimeSieve model is illustrated in Figure 1. By combining the wavelet transform with the IFCB, TimeSieve handles the multi-scale characteristics and noise present in time series data, improving the model's predictive performance and robustness. This approach is motivated by the need to optimize information flow and enhance feature extraction, so that the model can make accurate and reliable predictions across various applications.
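The data flow described in this overview can be summarized in a few lines. This is only a sketch of the described pipeline: the symbol names (x for the input series, c_A and c_D for the approximation and detail coefficients, hats for filtered quantities) and the exact placement of the residual connection around the IFCB are assumptions made here for illustration, and the text does not say whether the two coefficient streams share one IFCB.

```latex
\begin{aligned}
(c_A,\, c_D) &= \mathrm{WDB}(x) \\
\hat{c}_A &= \mathrm{IFCB}(c_A) + c_A \\
\hat{c}_D &= \mathrm{IFCB}(c_D) + c_D \\
\hat{y} &= \mathrm{MLP}\bigl(\mathrm{WRB}(\hat{c}_A,\, \hat{c}_D)\bigr)
\end{aligned}
```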
The above is an overall description of TimeSieve. We will provide a detailed introduction to WDB, WRB, and IFCB in the following sections.
To capture comprehensive feature information, we employ wavelet transformation to decompose time series data into various frequency components, which lets us analyze the data and extract features across multiple temporal scales. This multi-scale analysis is crucial for identifying patterns and trends that occur at different time resolutions.
WDB. For the input time series
where
In the case of db1 wavelet transform,
This decomposition allows the model to capture different characteristics of the data at various levels, thereby optimizing the understanding and prediction of time series dynamics, particularly in identifying local features within the series.
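As a concrete illustration of a db1 (Haar) decomposition, the following minimal sketch computes one level of approximation and detail coefficients in plain Python, together with the matching inverse. It is not the repository's implementation: the function names `haar_dwt` and `haar_idwt` are hypothetical, the input is assumed to have even length, and the sign convention for the detail coefficients can differ between libraries.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x):
    """One level of the db1 (Haar) transform for an even-length sequence."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch expects an even-length input")
    # Approximation: scaled pairwise sums (low-pass, coarse trend).
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    # Detail: scaled pairwise differences (high-pass, local changes).
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the sequence from both coefficient sets."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


# Hypothetical toy series, used only to exercise the two functions.
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
c_a, c_d = haar_dwt(series)
restored = haar_idwt(c_a, c_d)
```

Each coefficient list has half the length of the input, and applying the inverse to unmodified coefficients returns the original series. The transform itself has no trainable parameters, which fits the claim that multi-scale features are captured without adding parameters.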
WRB. After the wavelet transformation, the detail and approximation coefficients are processed through IFCB to preserve the most predictive information. Subsequently, the inverse wavelet transform is applied to reconstruct the time series from the processed coefficients:
where
Motivated by the need to effectively filter out redundant features while retaining critical information, we introduce the Information Filtering and Compression Block (IFCB), inspired by the information bottleneck principle. In this section, we process the detail coefficients
In our model, both
We define the intermediate hidden layer random variable
This allows us to define the objective of IFCB, which is to maximize the mutual information between
where
The optimization is carried out using a deep neural network, which is designed to minimize the above objective by adjusting the network weights and biases through backpropagation and suitable regularization techniques.
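For reference, the information bottleneck principle is commonly written as a Lagrangian over the encoder distribution. The form below is the standard one from the information bottleneck literature and is not necessarily the exact weighting or sign convention used by the IFCB; the symbols X (input), Y (target), Z (hidden representation), and beta (trade-off weight) are assumed here for illustration.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

Minimizing the first term compresses the input, while the second term rewards representations that remain predictive of the target.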
Assuming
where
Parameter updates pose a challenge due to the stochastic nature of the gradient calculations. To address this, the reparameterization trick is introduced. The formula is given as:
where
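To illustrate the reparameterization trick in its usual Gaussian form, the sketch below draws a latent sample as a deterministic function of the distribution parameters plus independent standard normal noise. The names `reparameterize`, `mu`, and `log_var` are hypothetical, and parameterizing the variance through its logarithm is an assumption made here, not something stated in the text.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps elementwise, with eps ~ N(0, 1)."""
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)  # standard deviation from log-variance
        eps = rng.gauss(0.0, 1.0)   # the only source of randomness
        z.append(m + sigma * eps)
    return z


# Hypothetical parameters, used only to check the sampling behavior.
rng = random.Random(0)
mu = [0.5, -1.0, 2.0]
log_var = [0.0, -2.0, 1.0]
samples = [reparameterize(mu, log_var, rng) for _ in range(20000)]
sample_mean = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
```

Because the noise is sampled independently of the parameters, the sample is a smooth function of the mean and variance, so in an automatic differentiation framework gradients can flow through them during backpropagation.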
Given that the study focuses on a regression task, we define the conditional probability
where
We now specify the model's loss function. This function is crucial for training the model to minimize prediction errors and optimize the distribution parameters effectively. The loss is composed of the original loss component and the IFCB loss, as shown below:
where
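One way to picture a loss made of an original prediction term plus an information bottleneck term is sketched below. The exact form of the IFCB loss is not shown in this text, so the sketch rests on explicit assumptions: mean squared error stands in for the original loss, the bottleneck term is the closed-form KL divergence between a diagonal Gaussian and a standard normal prior (as in variational information bottleneck methods), and `beta` is a hypothetical weighting parameter.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def mse(pred, target):
    """Mean squared error, used here as the assumed original loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(pred, target, mu, log_var, beta=1e-3):
    """Prediction loss plus a beta-weighted bottleneck penalty (assumed form)."""
    return mse(pred, target) + beta * gaussian_kl_to_standard_normal(mu, log_var)


# Hypothetical values, used only to show how the two terms combine.
example = total_loss([1.0, 2.0], [1.0, 4.0], mu=[1.0], log_var=[0.0], beta=0.1)
```

A larger `beta` pushes the latent distribution toward the prior and so compresses more aggressively, while a smaller value lets the prediction error dominate.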
If you find this document useful for your research, please consider citing the following repository and paper:
@misc{TimeSieve_repo,
author = "{Ninghui Feng and Songning Lai and Zhenxiao Yin and Fobao Zhou and Hang Zhao}",
title = "{TimeSieve: Extracting Temporal Dynamics through Information Bottlenecks}",
howpublished = "{GitHub repository}",
note = "{URL: \url{https://github.com/xll0328/TimeSieve/}}",
year = {2024},
}
@misc{feng2024timesieve,
title={TimeSieve: Extracting Temporal Dynamics through Information Bottlenecks},
author={Ninghui Feng and Songning Lai and Fobao Zhou and Zhenxiao Yin and Hang Zhao},
year={2024},
eprint={2406.05036},
archivePrefix={arXiv},
primaryClass={cs.LG}
}