AlphaGrad: Normalized Gradient Descent for Adaptive Multi-loss Functions in EEG-based Motor Imagery Classification

Rattanaphon Chaisaen, Phairot Autthasan, Apiwat Ditthapron, Theerawit Wilaiprasitporn
1Vidyasirimedhi Institute of Science & Technology (VISTEC), 2Rajamangala University of Technology Krungthep
*Indicates Equal Contribution

Abstract

In this study, we propose AlphaGrad, a novel adaptive loss blending strategy for optimizing multi-task learning (MTL) models in motor imagery (MI)-based electroencephalography (EEG) classification. AlphaGrad is the first method to automatically adjust multi-loss functions with differing metric scales, including mean squared error, cross-entropy, and deep metric learning, within the context of MI-EEG. We evaluate AlphaGrad using two state-of-the-art MTL-based neural networks, MIN2Net and FBMSNet, across four benchmark datasets. Experimental results show that AlphaGrad consistently outperforms existing strategies such as AdaMT, GradApprox, and fixed-weight baselines in classification accuracy and training stability. Compared to the static-weighting baseline, AlphaGrad achieves an accuracy improvement of more than 10% on subject-independent MI tasks when evaluated on the largest benchmark dataset. Furthermore, AlphaGrad demonstrates robust adaptability across various EEG paradigms, including steady-state visually evoked potential (SSVEP) and event-related potential (ERP), making it broadly applicable to brain-computer interface (BCI) systems. We also provide gradient trajectory visualizations highlighting AlphaGrad’s ability to maintain training stability and avoid local minima. These findings underscore AlphaGrad’s promise as a general-purpose solution for adaptive multi-loss optimization in biomedical time-series learning.
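The exact AlphaGrad update rule is given in the paper rather than on this page, but the core idea the abstract describes, equalizing the pull of losses that live on different metric scales, can be sketched in a few lines of PyTorch. The snippet below is an illustrative simplification; the function name, parameters, and the inverse-gradient-norm weighting are our own assumptions, not the published algorithm.

import torch

def blend_losses(losses, shared_params, eps=1e-12):
    """Blend multiple task losses by weighting each one with the inverse
    of the gradient norm it induces on the shared parameters, so that
    losses on very different scales (e.g., a reconstruction MSE vs. a
    cross-entropy vs. a deep metric-learning term) contribute comparably.
    Illustrative sketch only; not the exact AlphaGrad rule."""
    weights = []
    for loss in losses:
        grads = torch.autograd.grad(loss, shared_params, retain_graph=True)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        # Detach so each weight acts as a constant during backprop.
        weights.append(1.0 / (norm.detach() + eps))
    return sum(w * l for w, l in zip(weights, losses))

In an MTL model such as MIN2Net, the three losses would be the autoencoder reconstruction error, the classification cross-entropy, and the deep metric-learning term, all flowing through a shared encoder whose parameters are passed as shared_params.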

Classification accuracy of different models across subjects

The figure presents the classification accuracy of different models across subjects in the OpenBMI dataset for a 3-class motor imagery (MI) task, evaluated under a 5-fold cross-validation setting...


Subject-wise accuracy difference

Subject-wise accuracy difference (%) between AlphaGrad (used as the reference) and other loss blending strategies...


Trajectory of gradient descent (GD) for a multi-task objective

Apart from AlphaGrad, the gradient descent trajectories of the other strategies become trapped in local minima for up to two of the three initial parameter sets. This behavior indicates that the gradient of one task overshadows the other, causing the optimization process to oscillate between the steep valley walls without making significant progress along the valley floor.

Figure panels: Multi-task Objective, Baseline, GradApprox, AdaMT, and AlphaGrad.
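To make the valley picture concrete, the toy run below reproduces the qualitative behavior with a hand-built two-task objective (our own illustrative choice, not the objective from the paper): a shallow valley floor along x and steep walls along y. With fixed equal weights, the steep task dominates the blended gradient and the iterate bounces between the walls; unit-normalizing each task gradient before summing restores progress along the floor.

import numpy as np

# Two toy task losses on parameters p = (x, y):
#   L1 = 0.01 * x**2   (shallow valley floor)
#   L2 = 100  * y**2   (steep valley walls)
def task_grads(p):
    g1 = np.array([0.02 * p[0], 0.0])
    g2 = np.array([0.0, 200.0 * p[1]])
    return g1, g2

def run(p0, steps=500, lr=0.01, normalize=False, eps=1e-12):
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        g1, g2 = task_grads(p)
        if normalize:
            # Normalize each task gradient so neither overshadows the other
            # (the AlphaGrad-style idea, in its simplest possible form).
            g = g1 / (np.linalg.norm(g1) + eps) + g2 / (np.linalg.norm(g2) + eps)
        else:
            g = g1 + g2  # fixed equal weights
        p -= lr * g
    return p

print("fixed weights:", run([3.0, 1.0]))                   # y flips sign every step; x barely moves
print("normalized:   ", run([3.0, 1.0], normalize=True))   # both coordinates end near zero

Here lr = 0.01 puts the fixed-weight update exactly at the oscillation boundary of the steep task (the y-coordinate is multiplied by -1 each step), which is the wall-to-wall bouncing the caption describes; any larger step would diverge.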

BibTeX

@ARTICLE{11008918,
  author    = {Chaisaen, Rattanaphon and Autthasan, Phairot and Ditthapron, Apiwat and Wilaiprasitporn, Theerawit},
  journal   = {IEEE Journal of Biomedical and Health Informatics},
  title     = {AlphaGrad: Normalized Gradient Descent for Adaptive Multi-loss Functions in EEG-based Motor Imagery Classification},
  year      = {2025},
  volume    = {},
  number    = {},
  pages     = {1-13},
  keywords  = {Brain-computer interfaces;multi-task learning;adaptive loss blending;motor imagery EEG;normalized gradient descent},
  doi       = {10.1109/JBHI.2025.3572197}
}