Autocorrelation #145

Closed
firmai opened this issue Jun 14, 2023 · 4 comments
Labels: question (Further information is requested), stale

Comments

@firmai commented Jun 14, 2023

Issue description

Which parameters in SAITS help to improve autocorrelation modelling? Thanks :)

firmai added the question label Jun 14, 2023

@WenjieDu (Owner)

Hi there 👋,

Thank you so much for your attention to PyPOTS! If you find PyPOTS helpful to your work, please star ⭐️ this repository. Your star is a form of recognition that helps more people notice PyPOTS and grows the PyPOTS community. It matters and is definitely a kind of contribution to the community.

I have received your message and will respond ASAP. Thank you for your patience! 😃

Best,
Wenjie

@firmai (Author) commented Jun 14, 2023

The reason I ask is that my dataframe looks like this (2D), not panel: (12218, 12).

But I want to make sure that the autocorrelation structure gets modelled. Currently, I am using:

from sklearn.preprocessing import StandardScaler
from pypots.data import mcar, masked_fill
from pypots.imputation import SAITS
from pypots.utils.metrics import cal_mae
import numpy as np
import pandas as pd

def impute_missing_data(df, holdout_fraction=0.1, n_layers=2, d_model=32, d_inner=16, n_heads=1, d_k=4, d_v=4, dropout=0.1, epochs=14):
    # Preprocessing
    X = df.values  # extract feature columns as a numpy array
    scaler = StandardScaler()
    X = scaler.fit_transform(X)  # standardize features
    num_samples = X.shape[0]
    num_features = X.shape[1]
    X = X.reshape(num_samples, 1, num_features)  # reshape array to 3D

    X_intact, X, missing_mask, indicating_mask = mcar(X, holdout_fraction)  # hold out a fraction of observed values as ground truth
    X = masked_fill(X, 1 - missing_mask, np.nan)

    # Model training
    n_steps = 1  # only one time step per sample in this case
    saits = SAITS(n_steps=n_steps, n_features=num_features, n_layers=n_layers, d_model=d_model, d_inner=d_inner, n_heads=n_heads, d_k=d_k, d_v=d_v, dropout=dropout, epochs=epochs)
    dataset = {"X": X}
    saits.fit(dataset)  # train the model
    imputation = saits.impute(dataset)  # impute the originally-missing values and artificially-missing values

    # Inverse Transform Imputed Data
    imputation_rescaled = scaler.inverse_transform(imputation.reshape(-1, num_features))

    # Create a DataFrame from the imputed and rescaled data, with the original column names
    df_imputed = pd.DataFrame(imputation_rescaled, columns=df.columns, index=df.index)

    df_imputed = df.fillna(df_imputed)
    return df_imputed

@WenjieDu (Owner)

Hi Derek, thanks for raising this discussion!

The input to all models in PyPOTS should be 3D rather than 2D, so you need to generate a 3D dataset from your pandas dataframe before training a model.

Regarding your question about autocorrelation modeling in SAITS, there is no hyper-parameter dedicated to enhancing autocorrelation. We do have a boolean hyper-parameter diagonal_attention_mask that controls whether a diagonal mask is applied to the self-attention map, which improves SAITS' ability to capture correlations across time steps.
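
For illustration, a minimal sketch of setting this flag when constructing the model (the keyword name is the one described above; whether your installed PyPOTS version accepts it, and its default value, should be checked against the docs):

from pypots.imputation import SAITS

# diagonal_attention_mask=True masks the diagonal of the self-attention map,
# so each time step is estimated from the other time steps rather than from itself
saits = SAITS(
    n_steps=100, n_features=12,
    n_layers=2, d_model=32, d_inner=16,
    n_heads=1, d_k=4, d_v=4, dropout=0.1, epochs=14,
    diagonal_attention_mask=True,  # verify this keyword against your PyPOTS version
)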

BTW, your dataframe has a length of 12218. In your data processing, I'd suggest making each sample from your dataframe have around 100 steps, because an overly long sample length makes the attention map very large, which can cause out-of-memory and slow-training problems when running a self-attention model on your machine.
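
A minimal sketch of this kind of windowing (the helper name frame_to_3d, the 100-step window length, and dropping the trailing remainder are illustrative choices, not a prescribed recipe):

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def frame_to_3d(df: pd.DataFrame, n_steps: int = 100) -> np.ndarray:
    # Standardize features, then slice the (n_rows, n_features) frame into
    # non-overlapping windows of length n_steps, giving a
    # (n_samples, n_steps, n_features) array.
    X = StandardScaler().fit_transform(df.values)  # NaNs are ignored in fit and kept in transform
    n_samples = X.shape[0] // n_steps              # keep only full windows
    X = X[: n_samples * n_steps]                   # drop the trailing remainder
    return X.reshape(n_samples, n_steps, df.shape[1])

# e.g. a (12218, 12) dataframe becomes (122, 100, 12), which can be passed
# as the "X" entry of the dict given to SAITS.fit / SAITS.impute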

@github-actions bot

This issue had no activity for 30 days. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?

github-actions bot added the stale label Jul 16, 2023
github-actions bot closed this as not planned Jul 24, 2023