KTU Researchers Develop AI Model Combining Speech and Brain Activity for Depression Diagnosis
Depression: One of the Most Common Mental Health Disorders
Depression is one of the most prevalent mental health disorders, affecting approximately 280 million people globally. Researchers at Kaunas University of Technology (KTU) have developed an innovative artificial intelligence (AI) model to identify depression by analysing both speech and brain activity data.
This multimodal approach, which combines two distinct data sources, allows for more precise and objective analysis of emotional states, opening the door to a new era in depression diagnosis.
“Depression is one of the most widespread mental health disorders, with devastating effects on individuals and society. We are developing a new, more objective diagnostic method that could one day become widely accessible,” says KTU professor and co-author of the invention, Rytis Maskeliūnas.
Traditionally, most depression diagnostic studies have relied on a single type of data. However, researchers argue that this new method, which incorporates multiple data streams, provides a more comprehensive understanding of a person’s emotional state.
Exceptional Accuracy Using Speech and Brain Activity Data
The combination of speech and brain activity data in diagnosing depression achieved an impressive accuracy rate of 97.53%, far surpassing the results of alternative methods. “This is because speech contributes unique data that we cannot yet extract from brain activity measurements,” explains Prof. Maskeliūnas.
Musyyab Yousufi, a KTU doctoral student who contributed to the development, highlights the rationale behind the data selection: “While facial expressions might seem to reveal more about a person, we chose speech because its subtle changes can provide insights into emotional states. Depression affects speech in ways such as altering pace, intonation, and energy, whereas facial expressions are easier to control.”
The researchers also recognised the privacy concerns associated with facial recordings, particularly for patients with depression. Unlike facial data, which directly identifies individuals, brain activity (EEG) and speech data offer greater privacy protections.
“We must prioritise patient privacy. Moreover, the combination of EEG and speech data is more viable for future applications,” says Prof. Maskeliūnas, who works at KTU’s Faculty of Informatics.
The researchers emphasise that they are not medical professionals and cannot conduct direct patient studies. Instead, their data were sourced from the MODMA (Multimodal Open Dataset for Mental Disorder Analysis) database. EEG data were recorded over five minutes, with participants in a relaxed, awake state, eyes closed, and motionless.
In the speech-related experiments, participants engaged in activities such as question-and-answer sessions, reading, and describing images to capture their natural speech patterns and cognitive state.
Teaching AI to Explain Its Diagnoses
The collected EEG and speech signals were converted into spectrograms, enabling a visual representation of the data. Special noise-reduction filters were applied, and a modified DenseNet-121 deep learning model was used to detect signs of depression in the spectrograms. Each image represented changes in the signals over time: EEG data visualised brainwave activity, while audio data captured frequency and intensity variations.
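The spectrogram step described above can be illustrated with a short sketch. The exact parameters the researchers used are not given in the article, so the sampling rate, window length, and the synthetic signal below are illustrative assumptions:

```python
import numpy as np
from scipy import signal


def to_spectrogram(waveform, fs):
    """Convert a 1-D signal (an EEG channel or audio) to a log-power spectrogram."""
    freqs, times, Sxx = signal.spectrogram(waveform, fs=fs, nperseg=256, noverlap=128)
    return 10 * np.log10(Sxx + 1e-10)  # dB scale; small epsilon avoids log(0)


fs = 250                              # assumed EEG sampling rate, for illustration
t = np.arange(0, 5 * 60 * fs) / fs    # five minutes, as in the MODMA recordings
eeg = np.sin(2 * np.pi * 10 * t)      # synthetic 10 Hz stand-in for a real channel
spec = to_spectrogram(eeg, fs)
print(spec.shape)                     # (frequency bins, time frames)
```

Each resulting 2-D array can then be rendered as an image and fed to the convolutional network.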
The model included a customised classification layer trained to categorise individuals as either healthy or affected by depression, and its accuracy was measured by how often it classified the data correctly.
In the future, this AI model could significantly accelerate depression diagnosis, facilitate remote assessments, and reduce the risk of subjective interpretations. However, further clinical trials and system refinements are required, and Prof. Maskeliūnas acknowledges several challenges.
“The main obstacle in this type of research is the lack of available data, as people are often hesitant to share,” he explains.
Another key challenge, according to the KTU professor, is ensuring that the AI model not only provides accurate results but also explains the reasoning behind its conclusions. “The algorithm still needs to learn how to clearly explain its diagnoses,” he adds with a smile.
Prof. Maskeliūnas also points out that this issue is not limited to healthcare. Fields such as finance and law also face growing demand for AI applications where decisions directly impact people’s lives. This has led to the rise of explainable artificial intelligence (XAI), a technology designed to clarify why AI models make specific decisions, thereby increasing trust in their use.
The article “Multimodal Fusion of EEG and Audio Spectrogram for Major Depressive Disorder Recognition Using Modified DenseNet-121” was published in the journal Brain Sciences.