How much system RAM is required per GPU for the InterHand3D dataset? #672
[…] this is not the case.
Understood. So I had a chance to try training the model using the provided config, on a machine with 128 GB of RAM and 2 A6000 GPUs. When I run on a single GPU using the provided config with […]
So with this testing, I had the following questions: […]
I really appreciate the help!
@zengwang430521 Could you please check this issue?
Hi @pablovela5620.
@zengwang430521 The implementation could be improved.
@zengwang430521 So with the current implementation, it seems like there are basically two solutions when using distributed single-node training: […]
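The two levers in play here are the `samples_per_gpu` and `workers_per_gpu` fields in the config's `data` dict. A minimal sketch using the standard mmcv/MMPose field names (values taken from the training log mentioned in this thread; everything else omitted):

```python
# Sketch of the relevant fragment of an MMPose config. Only the two
# memory-related fields are shown; the dataset entries are omitted.
data = dict(
    samples_per_gpu=16,   # batch size per GPU (from the provided log)
    workers_per_gpu=2,    # each worker is a separate forked process that
                          # ends up holding its own copy of the dataset
    # train=dict(...), val=dict(...), test=dict(...)
)
```

Dropping `workers_per_gpu` removes one full copy of the annotations per worker per GPU, at the cost of data-loading throughput, which is why it comes up as a stopgap here.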
I did notice that using distributed training with 1 GPU vs. normal training with 1 GPU results in higher RAM usage (68 GB vs. ~30 GB). I'm not totally sure why; some clarity here would be appreciated. Also, how much RAM did the 8-GPU, 2-worker machine use when training on the InterHand3D dataset?

If I were to modify the dataset implementation (so that I could get it working with the 30 FPS version), it seems like it's more of a design decision over the whole of the MMPose hand datasets. I may be completely wrong here, and please correct me if I am, but the use of xtcocotools in

```python
from xtcocotools.coco import COCO

self.coco = COCO(ann_file)
self.img_ids = self.coco.getImgIds()
```

basically loads the entire annotation file into memory for any dataset that depends on it. Also, looking at Interhand2D/Interhand3D and others, the calls

```python
with open(self.camera_file, 'r') as f:
    cameras = json.load(f)
with open(self.joint_file, 'r') as f:
    joints = json.load(f)
```

are what is eating up all the system memory inside the […]. So rather than loading the entire dataset, I would have to overload […]
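As an illustration of the kind of override I mean, here is a minimal sketch that defers the big `json.load` calls until first use instead of doing them eagerly; `LazyAnnotations` and the usage shown are hypothetical, not part of the MMPose API:

```python
import json
from functools import cached_property


class LazyAnnotations:
    """Hypothetical helper: load a large JSON file on first access
    instead of in the dataset's __init__, so a process that never
    touches the data pays nothing. Note this alone does not fix
    per-worker duplication: every forked worker that reads the data
    still materializes its own private copy."""

    def __init__(self, path):
        self._path = path

    @cached_property
    def data(self):
        with open(self._path, 'r') as f:
            return json.load(f)


# Hypothetical usage inside a dataset __init__:
#   self.cameras = LazyAnnotations(self.camera_file)
#   self.joints = LazyAnnotations(self.joint_file)
# and later, e.g., self.cameras.data[str(capture_id)]
```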
Looking at the log provided, it looks like 8 Titan X GPUs were used to train on the InterHand dataset with a batch size of 16 and 2 workers per GPU.
The full InterHand dataset is pretty massive (>1 million images), and my understanding is that each worker and each GPU process loads the entire dataset into system RAM (not GPU VRAM), so even with, let's say, 128 GB, 8 GPUs × 2 workers = a HUGE amount of system RAM. Am I understanding this correctly? I haven't had a chance to test yet.
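To make that arithmetic concrete, here is a back-of-the-envelope estimate using the ~30 GB single-process figure reported in this thread; the worst-case assumption (no sharing at all between processes) is exactly the point being questioned:

```python
# Worst-case RAM estimate, assuming every process (one main process per
# GPU plus its DataLoader workers) holds a full private copy of the
# in-memory annotations. Copy-on-write after fork can reduce the real
# number, but CPython refcounting tends to dirty shared pages over time.
per_process_gb = 30        # observed single-GPU, non-distributed figure
n_gpus = 8
workers_per_gpu = 2

processes = n_gpus * (1 + workers_per_gpu)
print(f"~{processes} processes -> ~{processes * per_process_gb} GB worst case")
# ~24 processes -> ~720 GB worst case
```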
How much system RAM did the machine used for training have? It seems super difficult to retrain on a multi-GPU system without a really significant amount of system RAM (>256 GB?).