Alzheimer’s dementia (AD) entails negative psychological, social, and economic consequences not only for the patients but also for their families, relatives, and society in general. Despite the significance of this phenomenon and the importance for an early diagnosis, there are still limitations. Specifically, the main limitation is pertinent to the way the modalities of speech and transcripts are combined in a single neural network. Existing research works add/concatenate the image and text representations, employ majority voting approaches or average the predictions after training many textual and speech models separately. To address these limitations, in this article we present some new methods to detect AD patients and predict the Mini-Mental State Examination (MMSE) scores in an end-to-end trainable manner consisting of a combination of BERT, Vision Transformer, Co-Attention, Multimodal Shifting Gate, and a variant of the self-attention mechanism. Specifically, we convert audio to Log-Mel spectrograms, their delta, and delta-delta (acceleration values). First, we pass each transcript and image through a BERT model and Vision Transformer, respectively, adding a co-attention layer at the top, which generates image and word attention simultaneously. Secondly, we propose an architecture, which integrates multimodal information to a BERT model via a Multimodal Shifting Gate. Finally, we introduce an approach to capture both the inter- and intra-modal interactions by concatenating the textual and visual representations and utilizing a self-attention mechanism, which includes a gate model. Experiments conducted on the ADReSS Challenge dataset indicate that our introduced models demonstrate valuable advantages over existing research initiatives achieving competitive results in both the AD classification and MMSE regression task. Specifically, our best performing model attains an accuracy of 90.00% and a Root Mean Squared Error (RMSE) of 3.61 in the AD classification task and MMSE regression task, respectively, achieving a new state-of-the-art performance in the MMSE regression task.
Read Full Article (External Site)
Dr. David Lowemann, M.Sc, Ph.D., is a co-founder of the Institute for the Future of Human Potential, where he leads the charge in pioneering Self-Enhancement Science for the Success of Society. With a keen interest in exploring the untapped potential of the human mind, Dr. Lowemann has dedicated his career to pushing the boundaries of human capabilities and understanding.
Armed with a Master of Science degree and a Ph.D. in his field, Dr. Lowemann has consistently been at the forefront of research and innovation, delving into ways to optimize human performance, cognition, and overall well-being. His work at the Institute revolves around a profound commitment to harnessing cutting-edge science and technology to help individuals lead more fulfilling and intelligent lives.
Dr. Lowemann’s influence extends to the educational platform BetterSmarter.me, where he shares his insights, findings, and personal development strategies with a broader audience. His ongoing mission is shaping the way we perceive and leverage the vast capacities of the human mind, offering invaluable contributions to society’s overall success and collective well-being.