Improving Aphasic Communication Using Multimodal AI Systems

Isabel Ferri-Molla; Jordi Linares-Pellicer; Juan Izquierdo-Domenech

dc.contributor.author	Isabel Ferri-Molla
dc.contributor.author	Jordi Linares-Pellicer
dc.contributor.author	Juan Izquierdo-Domenech
dc.date.accessioned	2026-06-16T08:00:50Z
dc.date.issued	2026-06-01
dc.description.abstract	Aphasia, often resulting from brain injuries, significantly impairs individuals’ language abilities, creating substantial challenges for verbal communication. Existing assistive technologies frequently fall short in addressing these specialised communication needs, underscoring the urgent demand for adaptive, intelligent support systems. This research proposes a dual approach: an Automatic Speech Recognition (ASR) module fine-tuned on aphasic speech, and a multimodal component that integrates visual context to infer the speaker’s intended meaning. The ASR system leverages fine-tuned versions of Whisper and Wav2Vec 2.0 on data from the AphasiaBank corpus. Results show a notable reduction in Word Error Rate (WER) when comparing base pre-trained ASR models with their fine-tuned versions, decreasing from 70.36% to 31.53% in a context-independent setting, and from 61.25% to 35.60% in a speaker-independent evaluation, demonstrating robustness across different scenarios. In contrast to the ASR module, the goal of the multimodal component is not to produce a literal word-by-word transcription, but rather to reconstruct the speaker’s communicative intent using contextual information. To evaluate this capability, we conducted a human study assessing the system’s ability to interpret what the speaker truly meant. The results confirmed that outputs combining visual cues with language model reasoning more reliably captured communicative intent than audio-only transcriptions.
dc.identifier.citation	Improving Aphasic Communication Using Multimodal AI Systems. (2026). International Journal of Interactive Multimedia and Artificial Intelligence, 9(7), 67-77. https://doi.org/10.9781/ijimai.2026.2215
dc.identifier.uri	https://reunir.unir.net/handle/123456789/19986
dc.language.iso	en
dc.publisher	International Journal of Interactive Multimedia and Artificial Intelligence
dc.relation.ispartofseries	Vol. 9 No. 7
dc.subject	aphasia
dc.subject	ASR
dc.subject	HCI
dc.subject	image captioning
dc.subject	multimodality
dc.title	Improving Aphasic Communication Using Multimodal AI Systems
dc.type	Article
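
The abstract reports its ASR results as Word Error Rate (WER). As a minimal sketch of how that metric is conventionally defined (word-level edit distance divided by the number of reference words), the following may help in reading the figures. It is not the authors' evaluation code; the function names `word_error_rate` and `relative_reduction` and the sample sentences are hypothetical, and text normalisation (casing, punctuation) is assumed to have been done beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped relative to its starting value."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from AphasiaBank.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # WER values reported in the abstract, in percent.
    print(relative_reduction(70.36, 31.53))
    print(relative_reduction(61.25, 35.60))
```

Applied to the WER values in the abstract, `relative_reduction` gives a relative improvement of roughly 55% in the first setting and roughly 42% in the speaker-independent one. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.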

Files

Original bundle

Name: Improving Aphasic Communication Using Multimodal.pdf
Size: 432.75 KB
Format: Adobe Portable Document Format

Collections

Vol. 9, No. 7, June 2026