
Continuously increasing RAM with Pre-training #77

Open
abhisheksgumadi opened this issue Jul 5, 2022 · 12 comments

@abhisheksgumadi

Dear Team,

I am using the pre-training script to pre-train BLIP on a custom dataset (containing around 1M image/text pairs).

I see that the machine's RAM utilization continuously increases until it reaches 100%. The machine has 120 GB of RAM!

Any idea where the problem could be?

[Screenshot: RAM utilization]

@woctezuma

Do you have custom code which could have a memory leak?

@abhisheksgumadi
Author

We have a custom dataloader that loads images and text from a Parquet file.
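
A minimal sketch of what such a loader might look like (the class name and column names below are assumptions for illustration, not the actual code):

```python
# Hypothetical sketch of a Parquet-backed dataset; the column names
# ("image_path", "caption") are assumptions, not the actual schema.
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class ParquetCaptionDataset(Dataset):
    def __init__(self, parquet_file, transform=None):
        # The whole table is read into memory here and stays resident
        # for the lifetime of the dataset (and of every DataLoader worker).
        self.df = pd.read_parquet(parquet_file)
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(row["image_path"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, row["caption"]
```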

@abhisheksgumadi
Author

We have 1 million images stored on disk, and we have prepared the JSON file as described in the GitHub README. Our Dataloader loads the JSON file into memory in the __init__ method, and in the __getitem__ method it loads the image from the corresponding path in the JSON file and also returns the text.

Not sure why the RAM utilization is so high. Any ideas, please? Thanks.
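
For reference, a minimal sketch of the setup described above (a hypothetical stand-in, assuming the annotations are a list of dicts with "image" and "caption" keys):

```python
# Hypothetical sketch of the dataset described above, not the actual code.
import json
from PIL import Image
from torch.utils.data import Dataset

class ImageTextJsonDataset(Dataset):
    def __init__(self, ann_file, transform=None):
        # ~1M annotation dicts are loaded into a Python list here and
        # stay resident in RAM, once per DataLoader worker process.
        with open(ann_file) as f:
            self.annotations = json.load(f)
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        ann = self.annotations[idx]
        image = Image.open(ann["image"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, ann["caption"]
```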

@LiJunnan1992
Contributor

Hi, it could be related to the dataloader.
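
One way to check this (a hedged suggestion, not an official debugging script) is to iterate over the dataloader by itself, without the model, and watch the process's resident memory, for example:

```python
# Hypothetical check: drive the dataloader alone and log resident memory.
# `train_loader` is assumed to be the DataLoader built for pre-training.
import os
import psutil

process = psutil.Process(os.getpid())
for step, batch in enumerate(train_loader):
    if step % 100 == 0:
        rss_gb = process.memory_info().rss / 1024**3
        # Note: this reports the parent process only; sum over
        # process.children(recursive=True) to include worker processes.
        print(f"step {step}: RSS = {rss_gb:.2f} GB")
```

If the resident size keeps growing here as well, the leak is in the data pipeline rather than in the model or optimizer.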

@abhisheksgumadi
Author

We ended up using the pretrain_dataset.py file and formatted the data as a JSON file exactly as described in the README. Even then, we see the RAM utilization go to 100%. So now we have just formatted the dataset as required, with no changes to the code; we don't even have any custom code of our own.
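
For anyone reproducing this, the annotation layout we followed is (to my understanding of the README) a JSON list of entries with "image" and "caption" keys; an illustrative example, with made-up paths and captions:

```python
# Illustrative only: assumed annotation layout for pretrain_dataset.py
# (a JSON list of dicts with "image" and "caption" keys).
import json

annotations = [
    {"image": "images/0000001.jpg", "caption": "a dog running on the beach"},
    {"image": "images/0000002.jpg", "caption": "two people riding bicycles"},
]

with open("pretrain_ann.json", "w") as f:  # placeholder file name
    json.dump(annotations, f)
```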

@abhisheksgumadi
Author

We are happy to follow any other debugging steps to make this a success. Thanks!
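
One concrete step we could try (a sketch under the assumption that the annotation file is the culprit) is to measure how much resident memory the annotation list alone occupies, independent of training:

```python
# Hypothetical check: how much RSS does loading the annotation JSON cost?
# "pretrain_ann.json" is a placeholder path for the ~1M-entry file.
import json
import os
import psutil

process = psutil.Process(os.getpid())
before = process.memory_info().rss

with open("pretrain_ann.json") as f:
    annotations = json.load(f)

after = process.memory_info().rss
print(f"{len(annotations)} annotations -> {(after - before) / 1024**3:.2f} GB of RSS")
```

If that number is large, multiplying it by the number of DataLoader workers gives a rough idea of how much RAM the annotations alone can consume.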

@asgsaeid

Was wondering if there has been any update on this. We ran pretrain.py and saw the same issue: RAM usage increases while the JSON files are being read, and at some point RAM explodes. For pre-training, which Python version did you use, and how much RAM did the machine have?

@LiJunnan1992
Contributor

@abhisheksgumadi @asgsaeid
You may want to try out our new library which supports BLIP and see if the issue still remains:
https://github.com/salesforce/LAVIS

@abhisheksgumadi
Author

Thanks, will take a look

@dyashuni

@aries-young

> Was wondering if there has been any update on this. We ran pretrain.py and saw the same issue: RAM usage increases while the JSON files are being read, and at some point RAM explodes. For pre-training, which Python version did you use, and how much RAM did the machine have?

Have you solved this problem? Could you kindly provide some suggestions?

@aries-young

> Thanks, will take a look

Have you solved this problem? Could you kindly provide some suggestions?
