
Reproducing the validation accuracy vs. learning rate curve on ResNet #67

Open
liulei277 opened this issue Dec 21, 2023 · 1 comment

liulei277 commented Dec 21, 2023

Hello!
We tried to reproduce the experiment in your paper (Figure 16: ResNet on CIFAR-10 for different widths, compared to a base network).
We made some modifications to examples/ResNet/main.py:

for width_mult in [0.5, 1.0, 2.0]:
    for log2lr in np.linspace(-3, 0, 7):
        net = getattr(resnet, args.arch)(wm=width_mult)
        ...
        if args.optimizer == 'musgd':
            optimizer = MuSGD(net.parameters(), lr=2**log2lr,
                              momentum=args.momentum,
                              weight_decay=args.weight_decay)
        ...
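
(For completeness, here is a minimal sketch of the full sweep as we understand it, assuming the standard mup API; `resnet` and `args` come from the surrounding main.py as in the snippet above. One detail that matters in mup is that set_base_shapes has to be called on every freshly constructed model before the optimizer is built, since MuSGD reads the shape metadata it attaches to the parameters.)

import numpy as np
from mup import MuSGD, set_base_shapes

for width_mult in [0.5, 1.0, 2.0]:
    for log2lr in np.linspace(-3, 0, 7):
        net = getattr(resnet, args.arch)(wm=width_mult)
        # set_base_shapes must run on each fresh model before the
        # optimizer is created: MuSGD relies on the infshape metadata
        # it attaches to every parameter
        set_base_shapes(net, args.load_base_shapes)  # e.g. 'resnet18.bsh'
        if args.optimizer == 'musgd':
            optimizer = MuSGD(net.parameters(), lr=2**log2lr,
                              momentum=args.momentum,
                              weight_decay=args.weight_decay)
        # ... training loop: record validation accuracy for this
        # (width_mult, lr) pair ...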

And we ran the following commands:

# mup
python main.py --load_base_shapes resnet18.bsh
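
(Alternatively, the sweep could presumably be driven entirely from the shell without editing the script, assuming the --optimizer, --lr, and --width_mult flags behave as used elsewhere in this thread; a hypothetical sketch:)

# hypothetical shell-level sweep; lr values approximate 2**linspace(-3, 0, 7)
for wm in 0.5 1.0 2.0; do
  for lr in 0.125 0.177 0.25 0.354 0.5 0.707 1.0; do
    python main.py --load_base_shapes resnet18.bsh --optimizer musgd \
        --lr $lr --width_mult $wm
  done
done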

Then we got the following picture:
[attached image: resulting validation accuracy vs. learning rate curves]

Is there anything wrong with our implementation? Thanks.


liulei277 commented Dec 21, 2023

In addition, we ran the following command with the default examples/ResNet/main.py:

# mup
python main.py --load_base_shapes resnet18.bsh --lr 0.5 --width_mult 0.5

After running 10 epochs, the validation accuracy we obtained is 82.14%. This differs from the accuracy (92.78%) reported in Table 12 (ResNet on CIFAR10) of your paper.
