Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

Naganuma, Hiroki; Ahuja, Kartik; Takagi, Shiro; Motokawa, Tetsuya; Yokota, Rio; Ishikawa, Kohta; Sato, Ikuro; Mitliagkas, Ioannis

arXiv:2211.08583 (cs)

[Submitted on 15 Nov 2022 (v1), last revised 5 Jun 2023 (this version, v3)]

Abstract: Modern deep learning systems do not generalize well when the test data distribution differs slightly from the training data distribution. While much promising work has been done to address this fragility, a systematic study of the role of optimizers in out-of-distribution generalization has not been undertaken. In this study, we examine the performance of popular first-order optimizers under different classes of distributional shift, using empirical risk minimization and invariant risk minimization. We address this question for image and text classification, using DomainBed, WILDS, and Backgrounds Challenge as testbeds for studying different types of shift -- namely correlation shift and diversity shift. We search over a wide range of hyperparameters and examine classification accuracy (in-distribution and out-of-distribution) for over 20,000 models. We arrive at the following findings, which we expect to be helpful for practitioners: i) adaptive optimizers (e.g., Adam) achieve worse out-of-distribution performance than non-adaptive optimizers (e.g., SGD, momentum SGD). In particular, even though there is no significant difference in in-distribution performance, we show a measurable difference in out-of-distribution performance. ii) in-distribution performance and out-of-distribution performance exhibit three types of behavior depending on the dataset -- linear returns, increasing returns, and diminishing returns. For example, when training on natural language data using Adam, further improving in-distribution performance does not significantly contribute to out-of-distribution generalization performance.
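The distinction between adaptive and non-adaptive optimizers in finding (i) can be illustrated with a minimal sketch. This is not the paper's experimental code; it shows one common textbook formulation of a momentum SGD step and an Adam step on plain Python lists. The function names, the toy gradient, and the hyperparameter values are assumptions chosen only for illustration.

```python
import math

def momentum_sgd_step(w, g, v, lr=0.1, beta=0.9):
    """One momentum SGD step (non-adaptive): a single step size for all coordinates."""
    v = [beta * vi + gi for vi, gi in zip(v, g)]       # accumulate velocity
    w = [wi - lr * vi for wi, vi in zip(w, v)]         # move along the velocity
    return w, v

def adam_step(w, g, m, s, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step (adaptive): each coordinate is rescaled by its gradient history."""
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]        # first moment
    s = [b2 * si + (1 - b2) * gi * gi for si, gi in zip(s, g)]   # second moment
    m_hat = [mi / (1 - b1 ** t) for mi in m]                     # bias correction
    s_hat = [si / (1 - b2 ** t) for si in s]
    w = [wi - lr * mh / (math.sqrt(sh) + eps)
         for wi, mh, sh in zip(w, m_hat, s_hat)]
    return w, m, s

# Hypothetical gradient whose coordinates differ by two orders of magnitude.
g = [10.0, 0.1]
w_sgd, _ = momentum_sgd_step([0.0, 0.0], g, [0.0, 0.0])
w_adam, _, _ = adam_step([0.0, 0.0], g, [0.0, 0.0], [0.0, 0.0], t=1)
print("momentum SGD:", w_sgd)
print("Adam:        ", w_adam)
```

On this first step, momentum SGD moves each coordinate in proportion to its gradient, while Adam moves both coordinates by roughly the learning rate regardless of gradient magnitude. The sketch only shows what "adaptive" means mechanically; it does not by itself explain the out-of-distribution gap reported in the abstract.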

Comments:	Accepted to TMLR
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2211.08583 [cs.LG]
	(or arXiv:2211.08583v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2211.08583

Submission history

From: Hiroki Naganuma
[v1] Tue, 15 Nov 2022 23:56:30 UTC (2,907 KB)
[v2] Fri, 18 Nov 2022 21:29:02 UTC (2,907 KB)
[v3] Mon, 5 Jun 2023 22:23:52 UTC (3,486 KB)
