Autocorrelation #145

Closed
firmai opened this issue Jun 14, 2023 · 4 comments
Labels: question (Further information is requested), stale

Comments

@firmai commented Jun 14, 2023

Issue description

Which parameters in SAITS help to improve autocorrelation modelling? Thanks :)

firmai added the question label Jun 14, 2023

@WenjieDu (Owner)

Hi there 👋,

Thank you so much for your attention to PyPOTS! If you find PyPOTS helpful to your work, please star ⭐️ this repository. Your star is a form of recognition that helps more people notice PyPOTS and grows the PyPOTS community. It matters and is definitely a kind of contribution to the community.

I have received your message and will respond ASAP. Thank you for your patience! 😃

Best,
Wenjie

@firmai (Author) commented Jun 14, 2023

The reason I ask is that my dataframe looks like this (2D), not panel: (12218, 12).

But I want to make sure that the autocorrelation structure gets modelled. Currently, I am using:

from sklearn.preprocessing import StandardScaler
from pypots.data import mcar, masked_fill
from pypots.imputation import SAITS
from pypots.utils.metrics import cal_mae
import numpy as np
import pandas as pd

def impute_missing_data(df, holdout_fraction=0.1, n_layers=2, d_model=32, d_inner=16, n_heads=1, d_k=4, d_v=4, dropout=0.1, epochs=14):
    # Preprocessing
    X = df.values  # extract feature columns as a numpy array
    scaler = StandardScaler()
    X = scaler.fit_transform(X)  # standardize features
    num_samples = X.shape[0]
    num_features = X.shape[1]
    X = X.reshape(num_samples, 1, num_features)  # reshape array to 3D

    X_intact, X, missing_mask, indicating_mask = mcar(X, holdout_fraction)  # hold out a fraction of observed values as ground truth
    X = masked_fill(X, 1 - missing_mask, np.nan)

    # Model training
    n_steps = 1  # only one time step per sample in this case
    saits = SAITS(n_steps=n_steps, n_features=num_features, n_layers=n_layers, d_model=d_model, d_inner=d_inner, n_heads=n_heads, d_k=d_k, d_v=d_v, dropout=dropout, epochs=epochs)
    dataset = {"X": X}
    saits.fit(dataset)  # train the model
    imputation = saits.impute(dataset)  # impute the originally-missing values and artificially-missing values

    # Inverse Transform Imputed Data
    imputation_rescaled = scaler.inverse_transform(imputation.reshape(-1, num_features))

    # Create a DataFrame from the imputed and rescaled data, with the original column names
    df_imputed = pd.DataFrame(imputation_rescaled, columns=df.columns, index=df.index)

    df_imputed = df.fillna(df_imputed)
    return df_imputed

@WenjieDu (Owner)

Hi Derek, thanks for raising this discussion!

The input to all models in PyPOTS should be 3D rather than 2D, so you need to generate a 3D dataset from your pandas dataframe before training a model.

Regarding your question about autocorrelation modeling in SAITS, there is no hyper-parameter dedicated to enhancing autocorrelation. We do have a boolean hyper-parameter diagonal_attention_mask that controls whether a diagonal mask is applied to the self-attention map, which improves SAITS' ability to capture correlations across time steps.
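
For illustration, a minimal sketch of setting this flag when constructing the model (the keyword name is the one described above; whether your installed PyPOTS version accepts it, and its default value, should be checked against the docs):

from pypots.imputation import SAITS

# diagonal_attention_mask=True masks the diagonal of the self-attention map,
# so each time step is estimated from the other time steps rather than from itself
saits = SAITS(
    n_steps=100, n_features=12,
    n_layers=2, d_model=32, d_inner=16,
    n_heads=1, d_k=4, d_v=4, dropout=0.1, epochs=14,
    diagonal_attention_mask=True,  # verify this keyword against your PyPOTS version
)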

BTW, your dataframe has a length of 12218. In your data processing, I'd suggest making each sample from your dataframe have around 100 steps, because an overly long sample length makes the attention map very large, which can cause out-of-memory and slow-training problems when running a self-attention model on your machine.
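
A minimal sketch of this kind of windowing (the helper name frame_to_3d, the 100-step window length, and dropping the trailing remainder are illustrative choices, not a prescribed recipe):

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def frame_to_3d(df: pd.DataFrame, n_steps: int = 100) -> np.ndarray:
    # Standardize features, then slice the (n_rows, n_features) frame into
    # non-overlapping windows of length n_steps, giving a
    # (n_samples, n_steps, n_features) array.
    X = StandardScaler().fit_transform(df.values)  # NaNs are ignored in fit and kept in transform
    n_samples = X.shape[0] // n_steps              # keep only full windows
    X = X[: n_samples * n_steps]                   # drop the trailing remainder
    return X.reshape(n_samples, n_steps, df.shape[1])

# e.g. a (12218, 12) dataframe becomes (122, 100, 12), which can be passed
# as the "X" entry of the dict given to SAITS.fit / SAITS.impute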

@github-actions bot

This issue had no activity for 30 days. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?

github-actions bot added the stale label Jul 16, 2023
github-actions bot closed this as not planned Jul 24, 2023