Show simple item record

dc.contributor.author  Dhahbi, Sami
dc.contributor.author  Saleem, Nasir
dc.contributor.author  Bourouis, Sami
dc.contributor.author  Berrima, Mouhebeddine
dc.contributor.author  Verdú, Elena
dc.date  2025
dc.date.accessioned  2025-09-17T09:34:41Z
dc.date.available  2025-09-17T09:34:41Z
dc.identifier.citation  Sami Dhahbi, Nasir Saleem, Sami Bourouis, Mouhebeddine Berrima, Elena Verdú, End-to-end neural automatic speech recognition system for low resource languages, Egyptian Informatics Journal, Volume 29, 2025, 100615, ISSN 1110-8665, https://doi.org/10.1016/j.eij.2025.100615.  es_ES
dc.identifier.issn  2090-4754
dc.identifier.issn  1110-8665
dc.identifier.uri  https://reunir.unir.net/handle/123456789/18204
dc.description.abstract  The rising popularity of end-to-end (E2E) automatic speech recognition (ASR) systems can be attributed to their ability to learn complex speech patterns directly from raw data, eliminating the need for intricate feature extraction pipelines and handcrafted language models. E2E-ASR systems have consistently outperformed traditional ASRs. However, training E2E-ASR systems for low-resource languages remains challenging due to the dependence on data from well-resourced languages. ASR is vital for promoting under-resourced languages, especially in developing human-to-human and human-to-machine communication systems. Using synthetic speech and data augmentation techniques can enhance E2E-ASR performance for low-resource languages, reducing word error rates (WERs) and character error rates (CERs). This study leverages a non-autoregressive neural text-to-speech (TTS) engine to generate high-quality speech, converting a series of phonemes into speech waveforms (mel-spectrograms). An on-the-fly data augmentation method is applied to these mel-spectrograms, treating them as images from which features are extracted to train a convolutional neural network (CNN) and a bidirectional long short-term memory (BLSTM)-based ASR. The E2E architecture of this system achieves optimal WER and CER performance. The proposed deep learning-based E2E-ASR, trained with synthetic speech and data augmentation, shows significant performance improvements, with a 20.75% reduction in WERs and a 10.34% reduction in CERs.  es_ES
dc.language.iso  eng  es_ES
dc.publisher  Egyptian Informatics Journal  es_ES
dc.relation.ispartofseries  ;vol. 29
dc.relation.uri  https://www.sciencedirect.com/science/article/pii/S1110866525000088  es_ES
dc.rights  openAccess  es_ES
dc.subject  E2E learning  es_ES
dc.subject  speech recognition  es_ES
dc.subject  deep learning  es_ES
dc.subject  low-resource language  es_ES
dc.subject  data augmentation  es_ES
dc.subject  synthetic speech  es_ES
dc.title  End-to-end neural automatic speech recognition system for low resource languages  es_ES
dc.type  article  es_ES
reunir.tag  ~OPU  es_ES
dc.identifier.doi  https://doi.org/10.1016/j.eij.2025.100615
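
The abstract above describes an on-the-fly augmentation step that treats mel-spectrograms as images before they are fed to the CNN-BLSTM recognizer. As a reading aid, here is a minimal Python/NumPy sketch of one common way to realize such a step (SpecAugment-style frequency and time masking); the function name `augment_mel` and the mask counts and widths are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal sketch of on-the-fly mel-spectrogram augmentation using
# SpecAugment-style masking. Assumption: the abstract does not state
# the exact policy, so mask counts and widths here are illustrative.
import numpy as np

def augment_mel(mel, num_freq_masks=2, max_freq_width=8,
                num_time_masks=2, max_time_width=20, rng=None):
    """Zero out random frequency bands and time spans of a
    (n_mels, n_frames) mel-spectrogram, returning a new array."""
    rng = rng or np.random.default_rng()
    mel = mel.copy()
    n_mels, n_frames = mel.shape
    for _ in range(num_freq_masks):
        # Horizontal band: a contiguous range of mel bins is silenced.
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, n_mels - width)))
        mel[start:start + width, :] = 0.0
    for _ in range(num_time_masks):
        # Vertical band: a contiguous range of frames is silenced.
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, n_frames - width)))
        mel[:, start:start + width] = 0.0
    return mel

# Usage: a dummy 80-band, 300-frame spectrogram receives fresh masks
# each time it is drawn, so no two training passes see identical input.
mel = np.random.rand(80, 300).astype(np.float32)
augmented = augment_mel(mel)
print(augmented.shape)  # (80, 300)
```

Because the masks are resampled every time a spectrogram is drawn, the recognizer never sees exactly the same training input twice, which is what "on-the-fly" augmentation refers to.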


Files in this item


This item appears in the following collection(s)
