
Make the framework use a consistent strategy when lazy-loading val_set #270

Closed
WenjieDu opened this issue Dec 18, 2023 · 0 comments
Labels: enhancement (New feature or request)


@WenjieDu (Owner)

```python
if val_set is not None:
    if isinstance(val_set, str):
        with h5py.File(val_set, "r") as hf:
            # Here we read the whole validation set from the file to mask a
            # portion for validation. In PyPOTS, a file is usually used because
            # the data is too big. However, the validation set generally
            # shouldn't be too large. For example, if we have 1 billion samples
            # for model training, we won't take 20% of them as the validation
            # set, because we want as much data as possible for the training
            # stage to enhance the model's generalization ability. Therefore,
            # 100,000 representative samples will be enough to validate the model.
            val_set = {
                "X": hf["X"][:],
                "X_intact": hf["X_intact"][:],
                "indicating_mask": hf["indicating_mask"][:],
            }
    # check if X_intact contains missing values
    if np.isnan(val_set["X_intact"]).any():
        val_set["X_intact"] = np.nan_to_num(val_set["X_intact"], nan=0)
        logger.warning(
            "X_intact shouldn't contain missing data but has NaN values. "
            "PyPOTS has imputed them with zeros by default to start the training for now. "
            "Please double-check your data if you have concerns over this operation."
        )
    val_set = BaseDataset(val_set, return_labels=False, file_type=file_type)
```

E.g., in imputation models, when val_set is given as an h5 file path to enable lazy loading, the framework still loads all the data from the file. Although people usually don't have a large validation set, this may increase memory pressure if they do. We also expect the framework to behave consistently with train_set. Therefore, we need to make PyPOTS apply the same lazy-loading strategy to val_set.
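For illustration, the eager load above could be replaced by a dataset that reads one sample per `__getitem__` call, so only the accessed samples ever reach memory. This is a minimal sketch, not PyPOTS' actual `BaseDataset`: the class name `LazyH5Dataset` is hypothetical, and only the h5 keys (`X`, `X_intact`, `indicating_mask`) are taken from the snippet above.

```python
import os
import tempfile

import h5py
import numpy as np


class LazyH5Dataset:
    """Hypothetical sketch: read samples from an HDF5 file on demand
    instead of slurping hf["..."][:] into memory up front."""

    def __init__(self, file_path):
        self.file_path = file_path
        self.file_handle = None
        # Only the sample count is read eagerly; the arrays stay on disk.
        with h5py.File(file_path, "r") as hf:
            self.n_samples = hf["X"].shape[0]

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # Open the handle lazily on first access, so the dataset object can
        # also be pickled and sent to DataLoader worker processes first.
        if self.file_handle is None:
            self.file_handle = h5py.File(self.file_path, "r")
        X = self.file_handle["X"][idx]
        # X_intact shouldn't contain NaNs; impute with zeros just in case,
        # mirroring the eager branch above.
        X_intact = np.nan_to_num(self.file_handle["X_intact"][idx], nan=0)
        indicating_mask = self.file_handle["indicating_mask"][idx]
        return X, X_intact, indicating_mask


# Tiny demo file: 4 samples, 5 time steps, 3 features.
path = os.path.join(tempfile.mkdtemp(), "val_set.h5")
rng = np.random.default_rng(0)
with h5py.File(path, "w") as hf:
    hf.create_dataset("X", data=rng.normal(size=(4, 5, 3)))
    hf.create_dataset("X_intact", data=rng.normal(size=(4, 5, 3)))
    hf.create_dataset("indicating_mask", data=rng.integers(0, 2, size=(4, 5, 3)))

val_set = LazyH5Dataset(path)
X, X_intact, indicating_mask = val_set[0]  # only sample 0 is read from disk
```

With this shape, the val_set branch could reuse whatever lazy dataset class the framework already applies to train_set, which is the consistency this issue asks for.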
