A comparison of implementations of several gradient-based optimization algorithms (Gradient Descent, Adam, Adamax, Nadam, AMSGrad), evaluated on some of the most common benchmark functions used for testing optimization algorithms.
https://www.sfu.ca/~ssurjano/optimization.html
https://ruder.io/optimizing-gradient-descent/index.html
https://towardsdatascience.com/adam-latest-trends-in-deep-learning-optimization-6be9a291375c
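As a minimal sketch (not the repository's actual code), the snippet below compares plain Gradient Descent against Adam on the 2-D Rosenbrock function, one of the standard test functions listed on the SFU page above. All hyperparameters and function choices here are illustrative assumptions.

```python
import numpy as np

def rosenbrock_grad(x, a=1.0, b=100.0):
    """Gradient of the 2-D Rosenbrock function f(x, y) = (a - x)^2 + b (y - x^2)^2."""
    dx = -2.0 * (a - x[0]) - 4.0 * b * x[0] * (x[1] - x[0] ** 2)
    dy = 2.0 * b * (x[1] - x[0] ** 2)
    return np.array([dx, dy])

def gradient_descent(grad, x0, lr=1e-4, steps=20000):
    """Vanilla gradient descent; needs a small step size on this ill-conditioned function."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def adam(grad, x0, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, steps=20000):
    """Adam update with bias-corrected first and second moment estimates."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment (mean) estimate
    v = np.zeros_like(x)  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

if __name__ == "__main__":
    x0 = [-1.5, 2.0]  # illustrative starting point; the minimum is at (1, 1)
    print("GD  :", gradient_descent(rosenbrock_grad, x0))
    print("Adam:", adam(rosenbrock_grad, x0))
```

The other optimizers in the comparison (Adamax, Nadam, AMSGrad) differ from Adam only in how the second-moment term or the momentum term enters the update, so they can be written as small variations of the `adam` function above.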