
Fix: Preserve input dtype in Dropout layer output #1323

Merged: 3 commits into ml-explore:main from the dropout_bug branch on Aug 13, 2024

Conversation

bhargavyagnik
Contributor

Proposed changes

I re-ordered the expression that computes the Dropout output so that it preserves the input dtype. The underlying problem is described in issue #1321 (see #1321 (comment)).
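A minimal sketch of the dtype behavior in question (assuming MLX's promotion rules, where combining a Python float scalar with a boolean mask array yields float32, while dividing or multiplying a bfloat16 array by a Python scalar keeps bfloat16):

import mlx.core as mx

x = mx.random.uniform(shape=(4, 4)).astype(mx.bfloat16)
p_1 = 0.5  # keep probability (1 - p)
mask = mx.random.bernoulli(p_1, x.shape)  # boolean mask

# Old ordering: the Python scalar (1 / p_1) combines with the bool mask first,
# producing a float32 array that then promotes the bfloat16 input to float32.
old = (1 / p_1) * mask * x
print(old.dtype)  # expected: float32

# Re-ordered: mask * x stays bfloat16, and dividing by a Python scalar
# preserves the array's dtype.
new = (mask * x) / p_1
print(new.dtype)  # expected: bfloat16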

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

- Modified Dropout implementation to ensure that the output dtype matches the input dtype.
- This resolves the issue ml-explore#1321
@awni (Member) commented Aug 13, 2024

Thanks for the fix! Two comments:

@awni (Member) commented Aug 13, 2024

> I think the correct function to use is assertEqual, or we can use assertTrue(x == y).

Those are indeed bugs. The right function is assertEqual. It would be great if you could send a PR to fix it.
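For context on the pitfall: with two arguments, unittest's assertTrue treats the second argument as the failure message, so assertTrue(a, b) passes whenever a is truthy and never compares a with b. A hypothetical illustration (not the actual test_nn.py code):

import unittest
import mlx.core as mx

class DtypeChecks(unittest.TestCase):
    def test_dtype(self):
        y = mx.zeros((2, 2), dtype=mx.float32)
        # Passes silently: the second argument is only the failure message,
        # so no comparison with mx.bfloat16 happens at all.
        self.assertTrue(y.dtype, mx.bfloat16)
        # Actually compares the two values.
        self.assertEqual(y.dtype, mx.float32)

if __name__ == "__main__":
    unittest.main()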

@bhargavyagnik
Contributor Author

Yeah, I'll add them in this PR itself. There were fewer than I anticipated.

- Revised test cases to align with updated dropout code
- Fixed assertion method: replaced self.assertTrue with self.assertEqual for accurate comparisons in test_nn.py -> test_rope, test_alibi, and test_dropout
@bhargavyagnik
Contributor Author

I added the changes; let me know if you have any suggestions. Thanks for the review!

@@ -32,7 +32,7 @@ def __call__(self, x):

         mask = mx.random.bernoulli(self._p_1, x.shape)

-        return (1 / self._p_1) * mask * x
+        return (mask * x) / self._p_1
Member

Sorry I missed this. You should still do the scaling with a multiplication by (1 / self._p_1), which will be a single divide followed by a bunch of multiplies, as opposed to a bunch of divides. Divide is more expensive than multiply.
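A sketch of the ordering being suggested (an illustration of the idea, not necessarily the exact merged code): compute the reciprocal of the keep probability once as a Python scalar, then scale with multiplication, which still preserves the input dtype while replacing per-element divides with multiplies.

import mlx.core as mx

def dropout_sketch(x, p_1):
    # p_1 is the keep probability (1 - p), a Python float as in the layer.
    mask = mx.random.bernoulli(p_1, x.shape)
    # One scalar divide up front, then only elementwise multiplies.
    # Multiplying (mask * x) by a Python scalar keeps x's dtype.
    return (mask * x) * (1 / p_1)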

Contributor Author

Oh, okay, I'll make that change.

Contributor Author

Hi @awni
Sorry, I'm still learning, so please excuse me if this is off topic and a very minor question, but...

import mlx.core as mx
import timeit

x = mx.random.uniform(shape=(100000, 1000))
y = mx.random.uniform(shape=(100000, 1000))

x = x * 124.215**6
y = y * 135.353**6
def direct_division():
    return mx.eval(x/y)

def multiply_reciprocal():
    return mx.eval(x * (1/y))

print("Direct division:", timeit.timeit(direct_division, number=1000))
print("Multiply reciprocal:", timeit.timeit(multiply_reciprocal, number=1000))

The output is:
Direct division: 6.718534375000672
Multiply reciprocal: 10.876147124999989

I read some articles on this, e.g. https://stackoverflow.com/questions/1117674/why-is-multiplying-cheaper-than-dividing?rq=3 and https://stackoverflow.com/questions/15745819/why-is-division-more-expensive-than-multiplication.

I understand that division is more expensive than multiplication, but when I observe the performance, direct_division is faster than multiply_reciprocal.
I also experimented with multiplying x and y by 1.7976931348623157e+308, but direct_division still seems faster. What could be the underlying reason?

Member

No problem. In your example, y is an array, so you are doing a whole array of divides in both cases. The second one is strictly more work than the first (in fact it includes the first).

In the Dropout example, p is a scalar, so a closer benchmark would be:

import mlx.core as mx
import timeit

x = mx.random.uniform(shape=(100000, 1000))
y = 5.0

def direct_division():
    return mx.eval(x/y)

def multiply_reciprocal():
    return mx.eval(x * (1/y))

print("Direct division:", timeit.timeit(direct_division, number=1000))
print("Multiply reciprocal:", timeit.timeit(multiply_reciprocal, number=1000))

Also, you probably still won't see a time difference because both operations are memory-bandwidth bound. But if you measure power use (which isn't so easy to do reliably), you might see lower power consumption for the second. It could be marginal, but in this case why not, since it's trivial to use the slightly more efficient version.

Contributor Author

Yeah, I see. I was just exploring, so I thought I'd test it out. Thank you so much for the quick reply!

Comment on lines +1002 to +1003
self.assertEqual(y.shape, x.shape)
self.assertEqual(y.dtype, mx.bfloat16)
Member

Thank you for fixing those!!

@awni (Member) left a review

Looks great. Will merge when the tests clear.

@awni awni merged commit a098bc9 into ml-explore:main Aug 13, 2024
5 checks passed
@bhargavyagnik bhargavyagnik deleted the dropout_bug branch August 13, 2024 18:57