
How much system RAM is required per GPU for the InterHand3D dataset? #672

pablovela5620 opened this issue May 25, 2021 · 6 comments

@pablovela5620

Looking at the provided log, it looks like 8 Titan X GPUs were used to train on the InterHand dataset with a batch size of 16 and 2 workers per GPU.

The full InterHand dataset is pretty massive (over 1 million images), and my understanding is that each worker on each GPU loads the entire dataset into system RAM (not GPU VRAM). So even with, say, 128 GB of RAM, 8 GPUs * 2 workers would add up to a huge amount of system RAM. Am I understanding this correctly? I haven't had a chance to test yet.

How much system RAM did the machine used for training have? It seems very difficult to retrain on a multi-GPU system without a really significant amount of system RAM (>256 GB?).
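
For context, here is a rough, self-contained sketch (not mmpose code; the dummy dataset and its size are made up) of the kind of check one could run to see what each DataLoader worker actually holds in resident memory:

import os

import psutil
import torch
from torch.utils.data import DataLoader, Dataset


class DummyAnnotations(Dataset):
    """Stand-in for a dataset whose annotations sit in one big Python list."""

    def __init__(self, num_samples=100000):
        self.db = [{'joints': torch.zeros(42, 3)} for _ in range(num_samples)]

    def __len__(self):
        return len(self.db)

    def __getitem__(self, idx):
        return self.db[idx]['joints']


if __name__ == '__main__':
    loader = DataLoader(DummyAnnotations(), batch_size=16, num_workers=2)
    for i, _batch in enumerate(loader):
        if i == 1:
            # On Linux the workers are forked from the main process, so they
            # start out sharing its pages (copy-on-write); their RSS grows as
            # pages get touched, not because each worker reloads the dataset.
            main = psutil.Process(os.getpid())
            print('main RSS (GB):', main.memory_info().rss / 1e9)
            for child in main.children(recursive=True):
                print('worker RSS (GB):', child.memory_info().rss / 1e9)
            break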

@innerlee added the "question (Further information is requested)" label on May 25, 2021
@innerlee
Contributor

"one loads up the entire dataset into system ram"

This is not the case.

@pablovela5620
Author

Understood. I have since had a chance to try training the model using the provided config, on a machine with 128 GB of RAM and 2 A6000 GPUs.

When I run on a single GPU using
python tools/train.py configs/hand3d/InterNet/interhand3d/res50_interhand3d_all_256x256.py
it uses about 30 GB of RAM to load and train the network. The reason I assumed the entire dataset was being loaded into system RAM is the large amount of RAM used during distributed training.

After running tools/dist_train.sh, I ran into the following problem.

Using the provided config with dist_train and changing only the number of GPUs and workers:

  • 1 GPU, 2 workers: ~68 GB of RAM
  • 2 GPUs, 1 worker: ~90 GB of RAM
  • 2 GPUs, 2 workers: process killed with an out-of-memory error (I have 128 GB of RAM in total)

With this testing in mind, I have the following questions:

  1. How do I manage the amount of RAM used without sacrificing the number of workers?
  2. Is this a typical amount of RAM for this dataset?
  3. What if I want to use the 30 fps version of the dataset (13 million images vs. 1.3 million, so around 10 times larger)? My guess is this would increase the amount of RAM needed enormously.

I really appreciate the help!

@ly015
Member

ly015 commented May 26, 2021

@zengwang430521 Could you please check this issue?

@zengwang430521
Collaborator

zengwang430521 commented May 27, 2021

Hi @pablovela5620,
We load all the annotations into memory before training, and this costs a lot of memory. So if you find memory insufficient, you can use fewer workers.
I'm afraid our implementation may not be suitable for the 30-fps version at the moment, because it is simply too massive.
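
As a minimal sketch, assuming the standard data dict used across mmpose configs, only workers_per_gpu needs to be lowered (the other entries stand in for what is already in the provided config):

data = dict(
    samples_per_gpu=16,  # batch size per GPU, unchanged from the provided config
    workers_per_gpu=1,   # fewer dataloader workers per GPU -> less host RAM used
    # train=dict(...), val=dict(...), test=dict(...) stay as in the provided config
)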

@innerlee added the "enhancement (New feature or request)" label and removed the "question (Further information is requested)" label on May 27, 2021
@innerlee assigned zengwang430521 and unassigned ly015 on May 27, 2021
@innerlee
Contributor

@zengwang430521 The implementation could be improved.

@pablovela5620
Author

pablovela5620 commented May 27, 2021

@zengwang430521 So with the current implementation, it seems like there are basically two solutions when using distributed single-node training:

  1. Reduce the number of workers (in my case I can only use 1)
  2. Buy more RAM

I did notice that distributed training with 1 GPU results in much higher RAM usage than normal training with 1 GPU (~68 GB vs. ~30 GB). I'm not totally sure why; some clarity here would be appreciated.

Also, how much RAM did the 8-GPU, 2-worker machine use when training on the InterHand3D dataset?

If I were to modify the dataset implementation (so that I could get it working with the 30 fps version), it seems like more of a design decision spanning all of the mmpose hand datasets. I may be completely wrong here (please correct me if I am), but the use of xtcocotools in HandBaseDataset

from xtcocotools.coco import COCO

self.coco = COCO(ann_file)
self.img_ids = self.coco.getImgIds()

basically loads the entire annotation file into memory for any dataset that depends on it. Also, looking at Interhand2D/Interhand3D and others, the code run when _get_db() is called,

with open(self.camera_file, 'r') as f:
    cameras = json.load(f)
with open(self.joint_file, 'r') as f:
    joints = json.load(f)

is what eats up all the system memory inside the gt_db object. This seems consistent with all the other datasets as well: the entire dataset is loaded first, and then the augmentation/preprocessing pipelines are run.

So rather than loading the entire dataset up front, I would have to override __getitem__(self, idx) to load each sample on demand rather than all at once? Does this make sense, or are there other considerations I should be looking at, and downsides to not loading everything at once? (A rough sketch of what I mean is below.)
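
The sketch is purely illustrative and not mmpose's actual API: it assumes the annotations have first been split offline into one small JSON file per sample (the directory layout and class name here are hypothetical).

import json
import os

from torch.utils.data import Dataset


class LazyHandDataset(Dataset):
    """Hypothetical dataset that keeps only sample ids in memory."""

    def __init__(self, ann_dir, pipeline=None):
        self.ann_dir = ann_dir
        # Only the list of sample ids lives in RAM, not the annotations.
        self.sample_ids = sorted(
            name[:-5] for name in os.listdir(ann_dir) if name.endswith('.json'))
        self.pipeline = pipeline

    def __len__(self):
        return len(self.sample_ids)

    def __getitem__(self, idx):
        # Read a single sample's annotation from disk on demand.
        ann_path = os.path.join(self.ann_dir, f'{self.sample_ids[idx]}.json')
        with open(ann_path, 'r') as f:
            results = json.load(f)
        # Hand off to the usual augmentation/preprocessing pipeline.
        return self.pipeline(results) if self.pipeline else results

The obvious trade-off is that every __getitem__ call now does a small random disk read instead of an in-memory lookup, so data loading gets slower unless the annotation files sit on fast storage or in the OS page cache.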
