Khan, W, Ali, S, Muhammad, USK, Jawad, M, Ali, M and Nawaz, R ORCID: https://orcid.org/0000-0001-9588-0052 (2020) AdaDiffGrad: An Adaptive Batch Size Implementation Technique for DiffGrad Optimization Method. In: 14th International Conference on Innovations in Information Technology (IIT), 17 November 2020 - 18 November 2020, Al Ain, United Arab Emirates.
Accepted Version. Available under License In Copyright.
Abstract
Stochastic Gradient Descent (SGD) is a major contributor to the success of deep neural networks. The gradient provides basic knowledge about the direction of the function and its rate of change. However, SGD changes the step size equally for all parameters, irrespective of their gradient behavior. Recently, several efforts have been made to improve on SGD, such as AdaGrad, RMSprop, Adam, and diffGrad. diffGrad is an effective, enhanced technique that scales each update with a friction coefficient computed from previous gradient information. This friction coefficient decreases the momentum, resulting in slow convergence towards an optimal solution. This paper addresses the slow convergence of the diffGrad algorithm and proposes a new adaDiffGrad algorithm. In adaDiffGrad, an adaptive batch size is implemented for diffGrad to overcome the problem of slow convergence. The proposed model is evaluated on image categorization and classification over the CIFAR10, CIFAR100, and FakeImage datasets. The results are compared with state-of-the-art optimizers such as Adam, AdaGrad, diffGrad, RMSprop, and SGD. The results show that adaDiffGrad outperforms the other optimizers and improves on the accuracy of diffGrad.
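For orientation, the sketch below illustrates the two ideas the abstract refers to: a diffGrad-style parameter update, in which Adam-like moment estimates are damped by a friction coefficient derived from the change between consecutive gradients, and an adaptive batch-size rule. The update step follows the published diffGrad formulation; the batch-size rule (`next_batch_size`, with its `factor`, `patience`, and `tol` parameters) is a hypothetical illustration only, since the record above does not state the schedule adaDiffGrad actually uses.

```python
import numpy as np

def diffgrad_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One diffGrad-style update: Adam moments scaled by a friction
    coefficient computed from the difference of consecutive gradients."""
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2     # second moment
    # Friction coefficient in [0.5, 1): near 0.5 when the gradient barely
    # changes between steps, near 1 when it changes a lot.
    xi = 1.0 / (1.0 + np.exp(-np.abs(state["prev_grad"] - grad)))
    state["prev_grad"] = grad.copy()
    m_hat = state["m"] / (1 - beta1 ** t)                         # bias correction
    v_hat = state["v"] / (1 - beta2 ** t)
    return theta - lr * xi * m_hat / (np.sqrt(v_hat) + eps)

def next_batch_size(batch_size, loss_history, factor=2, max_batch=1024,
                    patience=3, tol=1e-3):
    """Hypothetical adaptive batch-size rule: grow the batch once the loss
    stops improving, so later epochs use less noisy gradient estimates."""
    if len(loss_history) > patience and \
       loss_history[-patience - 1] - loss_history[-1] < tol:
        return min(batch_size * factor, max_batch)
    return batch_size

# Minimal usage: initialise per-parameter optimizer state once.
theta = np.zeros(10)
state = {"t": 0, "m": np.zeros_like(theta), "v": np.zeros_like(theta),
         "prev_grad": np.zeros_like(theta)}
theta = diffgrad_step(theta, grad=np.ones_like(theta), state=state)
```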