The Phase-Switch Hybrid Adam–SGD Optimizer for Non-Convex Deep Learning: Evaluation on the MNIST and CIFAR-10 Dataset

Harish Kunder; Manjunath Kotari

doi:10.48084/etasr.18573

Authors

Harish Kunder, Department of Artificial Intelligence and Machine Learning, Alva's Institute of Engineering and Technology, Moodbidri, India | Visvesvaraya Technological University, Belagavi, India
Manjunath Kotari, Department of Computer Science and Engineering, Alva's Institute of Engineering and Technology, Moodbidri, India | Visvesvaraya Technological University, Belagavi, India

Volume: 16 | Issue: 3 | Pages: 36530-36539 | June 2026 | https://doi.org/10.48084/etasr.18573

Received: 6 March 2026 | Revised: 25 March 2026 and 17 April 2026 | Accepted: 30 April 2026 | Online: 26 May 2026

Corresponding author: Harish Kunder

Abstract

Training deep neural networks is a non-convex optimization problem: the loss surface is highly complex and contains local minima and saddle points. Existing optimizers involve a trade-off. Adam typically converges faster, whereas Stochastic Gradient Descent (SGD) typically generalizes better, and neither achieves both properties effectively on non-convex problems. This study proposes a phase-switch hybrid optimization method to improve the training of deep neural networks. The method uses Adam during the initial phase for faster convergence and SGD with momentum during the later phase for better generalization, thereby combining the advantages of both optimizers. The method is evaluated on the well-known MNIST and CIFAR-10 datasets. Its results are better than or comparable to those of existing methods in terms of accuracy and loss minimization. Compared with baseline optimizers, it achieves accuracy gains of up to 0.4–0.6% on MNIST and up to 1–2% on CIFAR-10, along with lower test loss in several learning rate settings. The study demonstrates that the proposed method is an effective solution for non-convex optimization problems and improves the robustness of training across different datasets.
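The two-phase idea described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the switch point is chosen, so a fixed step count (`switch_step`) is assumed here, and the function names, the toy non-convex objective, and all hyperparameter values are hypothetical choices made for illustration only.

```python
import math


def toy_loss(x):
    """Hypothetical non-convex test objective: sum of x^4 - 3x^2 + x."""
    return sum(xi ** 4 - 3 * xi ** 2 + xi for xi in x)


def toy_grad(x):
    """Analytic gradient of toy_loss."""
    return [4 * xi ** 3 - 6 * xi + 1 for xi in x]


def phase_switch_minimize(grad_fn, x0, total_steps=400, switch_step=200,
                          adam_lr=0.05, sgd_lr=0.01, momentum=0.9,
                          beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam for the first `switch_step` steps, then SGD with momentum.

    The fixed-step switch rule and every default value are assumptions.
    """
    x = list(x0)              # copy so the caller's list is not modified
    m = [0.0] * len(x)        # Adam first-moment estimate
    v = [0.0] * len(x)        # Adam second-moment estimate
    buf = [0.0] * len(x)      # SGD momentum buffer, starts empty at the switch
    for t in range(1, total_steps + 1):
        g = grad_fn(x)
        if t <= switch_step:
            # Phase 1: Adam update with bias-corrected moments.
            for i, gi in enumerate(g):
                m[i] = beta1 * m[i] + (1 - beta1) * gi
                v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
                m_hat = m[i] / (1 - beta1 ** t)
                v_hat = v[i] / (1 - beta2 ** t)
                x[i] -= adam_lr * m_hat / (math.sqrt(v_hat) + eps)
        else:
            # Phase 2: SGD with (heavy-ball) momentum.
            for i, gi in enumerate(g):
                buf[i] = momentum * buf[i] + gi
                x[i] -= sgd_lr * buf[i]
    return x


if __name__ == "__main__":
    start = [2.0, -2.0]
    end = phase_switch_minimize(toy_grad, start)
    print("loss before:", toy_loss(start))
    print("loss after: ", toy_loss(end))
```

In this sketch the Adam state is simply discarded at the switch and the momentum buffer starts from zero. Whether the published method carries state across phases or uses a different switching criterion is not stated in the abstract.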

Keywords:

deep learning, Adam, SGD, MNIST, CIFAR-10, loss function, accuracy
