Multi GPU Training #42
Hi @tharinduk90, thanks for your interest in the project! Unfortunately, we have not thoroughly tested multi-GPU training, as we never used it - we always did single-GPU training. As you already mentioned, this issue might be interesting for achieving results faster with less memory consumption. Good luck with your research!
@m-niemeyer, thank you very much for your reply. In the multi-view experiment, if I set batch_size = 2 and batch_size_val = 2, it gives the following error (since batch size refers to the number of images sampled, I expected this to work):
Traceback (most recent call last):
I want to do multi-view reconstruction for real images. So far I have got good results, but now I want to speed up training using multiple GPUs, because for complex models training takes more than 10 hours on my local PC (6 GB GPU memory).
For multi-GPU training I added "multi_gpu: true" to the config file (ours_depth_mvs.yaml). I used a p3.8xlarge instance (4 GPUs, each with 16 GB memory) from AWS for the multi-GPU test. The config file is as follows:
```yaml
data:
  path: data/DTU
  ignore_image_idx: []
  classes: ['scan244']
  dataset_name: DTU
  n_views: 51
  input_type: null
  train_split: null
  val_split: null
  test_split: null
  cache_fields: True
  split_model_for_images: true
  depth_range: [0., 1400.]
  img_extension: png
  img_extension_input: jpg
  depth_extension: png
  mask_extension: png
model:
  c_dim: 0
  encoder: null
  patch_size: 2
  lambda_image_gradients: 1.
  lambda_depth: 1.
  lambda_normal: 0.1
training:
  out_dir: out/multi_view_reconstruction/angel/ours_depth_mvs
  n_training_points: 2048
  n_eval_points: 8000
  model_selection_metric: mask_intersection
  model_selection_mode: maximize
  batch_size: 1
  batch_size_val: 1
  scheduler_milestones: [3000, 5000]
  scheduler_gamma: 0.5
  depth_loss_on_world_points: True
  validate_every: 5000
  visualize_every: 10000
  multi_gpu: true
generation:
  upsampling_steps: 4
  refinement_step: 30
```
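For reference, a `multi_gpu` flag like this is typically honored in PyTorch training code by wrapping the model in `torch.nn.DataParallel`. Below is a minimal sketch of that pattern; the function name `maybe_parallelize` and the config layout are assumptions for illustration, not the project's actual code:

```python
import torch
import torch.nn as nn

def maybe_parallelize(model, cfg):
    """Wrap the model in DataParallel when the config asks for it and
    more than one GPU is visible; otherwise return it unchanged.
    NOTE: hypothetical helper, not part of the project's codebase."""
    if cfg['training'].get('multi_gpu', False) and torch.cuda.device_count() > 1:
        # DataParallel replicates the model on each GPU and splits
        # the input batch along dimension 0 across the replicas.
        model = nn.DataParallel(model)
    return model

# On a machine without multiple GPUs this is a no-op:
model = maybe_parallelize(nn.Linear(4, 2), {'training': {'multi_gpu': True}})
```

If the project's training loop never wraps the model like this, setting `multi_gpu: true` in the YAML alone has no effect, so checking how the flag is consumed in the training script is a good first step.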
But when I check the usage of the GPUs, only gpu:0 is used.
I have checked #9.
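One likely cause, independent of the codebase: `nn.DataParallel` distributes work by splitting the batch dimension across GPUs, so with `batch_size: 1` there is only one sample to hand out and the remaining three GPUs stay idle. A pure-Python sketch of that scatter behavior (a simplification of the real chunking logic, for illustration only):

```python
def scatter(batch, n_devices):
    """Split a batch into per-device chunks, DataParallel-style:
    chunk along the batch dimension, dropping devices that get nothing."""
    chunk = -(-len(batch) // n_devices)  # ceiling division
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]

print(scatter([0, 1, 2, 3], 4))  # 4 samples -> each of 4 GPUs gets one
print(scatter([0], 4))           # batch_size=1 -> only gpu:0 gets work
```

So even if the `multi_gpu` flag is wired up correctly, a batch size of 1 would still leave gpu:1 through gpu:3 unused; the batch size needs to be at least the number of GPUs for all of them to receive work.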