Improving Aphasic Communication Using Multimodal AI Systems

dc.contributor.author: Isabel Ferri-Molla
dc.contributor.author: Jordi Linares-Pellicer
dc.contributor.author: Juan Izquierdo-Domenech
dc.date.accessioned: 2026-06-16T08:00:50Z
dc.date.issued: 2026-06-01
dc.description.abstract: Aphasia, often resulting from brain injuries, significantly impairs individuals’ language abilities, creating substantial challenges for verbal communication. Existing assistive technologies frequently fall short in addressing these specialised communication needs, underscoring the urgent demand for adaptive, intelligent support systems. This research proposes a dual approach: an Automatic Speech Recognition (ASR) module fine-tuned on aphasic speech, and a multimodal component that integrates visual context to infer the speaker’s intended meaning. The ASR system leverages fine-tuned versions of Whisper and Wav2Vec 2.0 on data from the AphasiaBank corpus. Results show a notable reduction in Word Error Rate (WER) when comparing base pre-trained ASR models with their fine-tuned versions, decreasing from 70.36% to 31.53% in a context-independent setting, and from 61.25% to 35.60% in a speaker-independent evaluation, demonstrating robustness across different scenarios. In contrast to the ASR module, the goal of the multimodal component is not to produce a literal word-by-word transcription, but rather to reconstruct the speaker’s communicative intent using contextual information. To evaluate this capability, we conducted a human study assessing the system’s ability to interpret what the speaker truly meant. The results confirmed that outputs combining visual cues with language model reasoning more reliably captured communicative intent than audio-only transcriptions.
dc.identifier.citation: Improving Aphasic Communication Using Multimodal AI Systems. (2026). International Journal of Interactive Multimedia and Artificial Intelligence, 9(7), 67-77. https://doi.org/10.9781/ijimai.2026.2215
dc.identifier.uri: https://reunir.unir.net/handle/123456789/19986
dc.language.iso: en
dc.publisher: International Journal of Interactive Multimedia and Artificial Intelligence
dc.relation.ispartofseries: Vol. 9 No. 7
dc.subject: aphasia
dc.subject: ASR
dc.subject: HCI
dc.subject: image captioning
dc.subject: multimodality
dc.title: Improving Aphasic Communication Using Multimodal AI Systems
dc.type: Article

Files

Original bundle

Name: Improving Aphasic Communication Using Multimodal.pdf
Size: 432.75 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission