Support gradient accumulation using a forward fn decorator with lax.scan #614

Merged: 1 commit merged into apple:main from cemkoc/grad-accum-scan on Jul 31, 2024

Conversation

cemkoc (Contributor) commented on Jul 30, 2024

This PR adds a new forward-function decorator option to the learner config, which allows users to define their own decorators that transform the forward function's behaviour. One such use case is gradient accumulation, which is what this PR implements.

Specifically, this PR enables gradient accumulation via a new forward function decorator called with_minibatch_steps, which internally runs a lax.scan loop over minibatches obtained by dynamically slicing the input batch. The decorator wraps the forward function and accumulates gradients with jax.lax.scan for the user-specified number of gradient accumulation steps. To use it, the learner config is extended with an optional forward_fn_decorator parameter, which the user sets as follows to enable gradient accumulation:

Example:

gradient_accumulation_steps = 4

learner.forward_fn_decorator = config.config_for_function(with_minibatch_steps).set(
    steps=gradient_accumulation_steps,
    metric_accumulator=MetricAccumulator.default_config(),
)

Since we scan over the forward function (instead of the value_and_grad function), we need a way to compute the gradients in a minibatched manner. We therefore wrap the forward function with a custom_vjp implementation that computes and accumulates the gradients during the forward phase. Because the gradients are already computed and accumulated in the forward phase, we simply pass them to the backward phase and use them as is. This lets us compute and accumulate the gradients in a memory-efficient (minibatched) manner; a rough sketch of the idea follows below.
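The following is a minimal sketch of this pattern, not the actual implementation in this PR: a jax.custom_vjp-wrapped function whose forward pass scans over dynamically sliced minibatches and accumulates per-minibatch gradients, and whose backward pass simply returns the accumulated gradients. The decorator name with_minibatch_grad_accum and the forward_fn(params, batch) -> loss signature are assumptions made for illustration.

import jax
import jax.numpy as jnp


def with_minibatch_grad_accum(forward_fn, steps: int):
    """Wraps forward_fn(params, batch) -> loss so gradients are computed per
    minibatch and accumulated during the forward pass (sketch only)."""

    def _forward_and_grads(params, batch):
        batch_size = jax.tree_util.tree_leaves(batch)[0].shape[0]
        if batch_size % steps != 0:
            raise ValueError(f"Batch size {batch_size} is not divisible by steps={steps}.")
        minibatch_size = batch_size // steps

        def scan_body(carry, step):
            loss_sum, grad_sum = carry
            # Dynamically slice the step-th minibatch out of the full input batch.
            minibatch = jax.tree_util.tree_map(
                lambda x: jax.lax.dynamic_slice_in_dim(
                    x, step * minibatch_size, minibatch_size, axis=0
                ),
                batch,
            )
            loss, grads = jax.value_and_grad(forward_fn)(params, minibatch)
            grad_sum = jax.tree_util.tree_map(jnp.add, grad_sum, grads)
            return (loss_sum + loss, grad_sum), None

        init = (jnp.zeros(()), jax.tree_util.tree_map(jnp.zeros_like, params))
        (loss_sum, grad_sum), _ = jax.lax.scan(scan_body, init, jnp.arange(steps))
        # Average over minibatches (assumes the loss is a mean over examples).
        return loss_sum / steps, jax.tree_util.tree_map(lambda g: g / steps, grad_sum)

    @jax.custom_vjp
    def wrapped(params, batch):
        loss, _ = _forward_and_grads(params, batch)
        return loss

    def fwd(params, batch):
        loss, grads = _forward_and_grads(params, batch)
        # Save the already-accumulated gradients (and zero cotangents for the batch)
        # as residuals for the backward pass.
        return loss, (grads, jax.tree_util.tree_map(jnp.zeros_like, batch))

    def bwd(residuals, g):
        grads, batch_zeros = residuals
        # Scale the saved gradients by the incoming cotangent; the batch gets zeros.
        return jax.tree_util.tree_map(lambda x: x * g, grads), batch_zeros

    wrapped.defvjp(fwd, bwd)
    return wrapped

Differentiating the wrapped function then yields the gradients that were accumulated inside the scan, e.g. jax.grad(with_minibatch_grad_accum(forward_fn, steps=4))(params, batch).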

A similar effort to implement gradient accumulation using a scan was mentioned by @apoorvtintin.

An integer representing the minibatch size.

Raises:
    ValueError: If the input batch is not divisible by steps.
Contributor commented:

Sorry for missing this earlier, but we should document the other raise conditions too.

cemkoc (Contributor Author) commented:

sounds good, let me update it now

cemkoc (Contributor Author) commented:

done.

cemkoc added this pull request to the merge queue on Jul 31, 2024
Merged via the queue into apple:main with commit 5ef4825 on Jul 31, 2024
4 checks passed
cemkoc deleted the cemkoc/grad-accum-scan branch on July 31, 2024 17:51