
Optimizing bert_cos_score_idf #69

Merged
2 commits merged into Tiiiger:master on Jul 4, 2020
Conversation

@ethanjperez (Contributor) commented on Jul 3, 2020

  1. Pad BERT embeddings on the GPU instead of the CPU. Padding on the CPU is the bottleneck in computing the greedy matching, so padding on the GPU speeds up the matching by ~3x for me. Moving tensors to the GPU then becomes the bottleneck, but moving the un-padded tensors also takes ~2x less time, I think because you don't have to transfer a bunch of padding values. So overall I get a ~6x speed-up on the sequences I'm evaluating.
  2. Use `torch.no_grad()` when computing the greedy matching to save memory. I was able to increase the batch size for the greedy matching by 2x after doing this. I'm not sure whether increasing the batch size here will cause OOMs for others, though, so it might be worth someone else checking/trying it out (or just removing the batch size increase). (A sketch of both changes is below.)

Edit: I ran into some OOMs with the batch size increase, so I removed that part.
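
For illustration, here is a minimal PyTorch sketch of both ideas. The tensor names, shapes, and example inputs are assumptions for this sketch, not the actual `bert_cos_score_idf` code, and a real implementation would also mask padded positions (and apply IDF weights) when scoring:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical per-sentence BERT embeddings: a list of (seq_len_i, hidden) CPU tensors.
cand_embs = [torch.randn(5, 768), torch.randn(9, 768)]
ref_embs = [torch.randn(7, 768), torch.randn(6, 768)]

# (1) Move the un-padded tensors to the GPU first, then pad there.
#     Transferring the smaller un-padded tensors is cheaper than transferring
#     padded ones, and the padding itself runs faster on the GPU.
cand = pad_sequence([e.to(device) for e in cand_embs], batch_first=True)
ref = pad_sequence([e.to(device) for e in ref_embs], batch_first=True)

# (2) Wrap the greedy matching in torch.no_grad() so no autograd graph is kept,
#     which saves memory during scoring.
with torch.no_grad():
    cand = cand / cand.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    ref = ref / ref.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    sim = torch.bmm(cand, ref.transpose(1, 2))      # (batch, cand_len, ref_len)
    precision = sim.max(dim=2).values.mean(dim=1)   # greedy match: candidate -> reference
    recall = sim.max(dim=1).values.mean(dim=1)      # greedy match: reference -> candidate
```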

@Tiiiger (Owner) commented on Jul 3, 2020

hi @ethanjperez, thank you for the contribution! I'll test this and merge afterwards.

@Tiiiger (Owner) commented on Jul 3, 2020

@ethanjperez actually I tested on the 3003 reference-candidate pairs (examples/hyps_long.txt and examples/refs_long.txt) and I didn't observe any significant speedup.

What are your test sequences like?

I wonder if you are testing on many more pairs?

@ethanjperez (Contributor, Author) commented on Jul 4, 2020

I'm using long sequences (many sentences), and I'm also doing leave-one-out reference evaluation. E.g., I have 10 references, and I want to evaluate each reference against all the others (10 x 9 = 90 pairs). So in my situation I need many more pairwise evaluations than BERT forward passes, which made the matching the slowest part. (The ~6x speed-up I found was only for the matching step specifically; I think that part is already pretty fast for normal MT evaluation.)
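
For concreteness, a toy sketch of that leave-one-out pairing (the reference strings here are placeholders, not real data):

```python
from itertools import permutations

# 10 hypothetical references; score each one against every other reference.
refs = [f"reference_{i}" for i in range(10)]
pairs = list(permutations(refs, 2))   # ordered (candidate, reference) pairs
assert len(pairs) == 10 * 9           # 90 pairwise evaluations
```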

This change is just a suggestion that helped me (it's mostly useful when you have a lot of pairs, which isn't the standard case), so feel free to ignore the PR too :)

Tiiiger merged commit 4c10f36 into Tiiiger:master on Jul 4, 2020
@Tiiiger (Owner) commented on Jul 4, 2020

I see. I think these are reasonable changes. I am going to merge it.
