
    End-to-end neural automatic speech recognition system for low resource languages

    Authors:
    Dhahbi, Sami; Saleem, Nasir; Bourouis, Sami; Mouhebeddine, Berrima; Verdú, Elena
    Date:
    2025
    Keywords:
    E2E learning; speech recognition; deep learning; low-resource language; data augmentation; synthetic speech
    Journal / publisher:
    Egyptian Informatics Journal
    Citation:
    Sami Dhahbi, Nasir Saleem, Sami Bourouis, Mouhebeddine Berrima, Elena Verdú, End-to-end neural automatic speech recognition system for low resource languages, Egyptian Informatics Journal, Volume 29, 2025, 100615, ISSN 1110-8665, https://doi.org/10.1016/j.eij.2025.100615.
    Item type:
    article
    URI: 
    https://reunir.unir.net/handle/123456789/18204
    DOI:
    https://doi.org/10.1016/j.eij.2025.100615
    Web address:
    https://www.sciencedirect.com/science/article/pii/S1110866525000088
    Open Access
    Abstract:
    The rising popularity of end-to-end (E2E) automatic speech recognition (ASR) systems can be attributed to their ability to learn complex speech patterns directly from raw data, eliminating the need for intricate feature extraction pipelines and handcrafted language models. E2E-ASR systems have consistently outperformed traditional ASRs. However, training E2E-ASR systems for low-resource languages remains challenging due to the dependence on data from well-resourced languages. ASR is vital for promoting under-resourced languages, especially in developing human-to-human and human-to-machine communication systems. Using synthetic speech and data augmentation techniques can enhance E2E-ASR performance for low-resource languages, reducing word error rates (WERs) and character error rates (CERs). This study leverages a non-autoregressive neural text-to-speech (TTS) engine to generate high-quality speech, converting a series of phonemes into speech waveforms (mel-spectrograms). An on-the-fly data augmentation method is applied to these mel-spectrograms, treating them as images from which features are extracted to train a convolutional neural network (CNN) and a bidirectional long short-term memory (BLSTM)-based ASR. The E2E architecture of this system achieves optimal WER and CER performance. The proposed deep learning-based E2E-ASR, trained with synthetic speech and data augmentation, shows significant performance improvements, with a 20.75% reduction in WERs and a 10.34% reduction in CERs.
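The abstract reports its results as word error rate (WER) and character error rate (CER) without defining them. The following is a minimal sketch assuming the usual edit-distance definitions: the number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the reference length. The function names are illustrative, and the paper's actual scoring script and text normalization are not described in this record.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences, unit cost per edit."""
    # prev[j] is the distance between the ref prefix processed so far
    # and the first j tokens of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from i ref tokens to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref token missing in hyp)
                curr[j - 1] + 1,     # insertion (extra token in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("abcd", "abed"))
```

Both rates can exceed 1.0 when the hypothesis contains many insertions, which is why they are normalized by the reference length rather than the hypothesis length.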
    Files in this item
    Name: End-to-end neural automatic.pdf
    Size: 3.282Mb
    Format: application/pdf
    This item appears in the following collection(s)
    • Otras Publicaciones: artículos, libros...

    Usage statistics

    Year    Views    Downloads
    2012    0        0
    2013    0        0
    2014    0        0
    2015    0        0
    2016    0        0
    2017    0        0
    2018    0        0
    2019    0        0
    2020    0        0
    2021    0        0
    2022    0        0
    2023    0        0
    2024    0        0
    2025    91       129

    Related items

    Showing items related by title, author or subject.

    • E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven speech synthesis 

      Saleem, Nasir; Gao, Jiechao; Irfan, Muhammad; Verdú, Elena ; Parra Puente, Javier (Image and vision computing, 2022)
      Speechreading which infers spoken message from a visually detected articulated facial trend is a challenging task. In this paper, we propose an end-to-end ResNet (E2E-ResNet) model for synthesizing speech signals from the ...
    • On improvement of speech intelligibility and quality: a survey of unsupervised single channel speech enhancement algorithms 

      Saleem, Nasir; Khattak, Muhammad Irfan; Verdú, Elena (International Journal of Interactive Multimedia and Artificial Intelligence, 06/2020)
      Many forms of human communication exist; for instance, text and nonverbal based. Speech is, however, the most powerful and dexterous form for the humans. Speech signals enable humans to communicate and this usefulness of ...
    • On Improvement of Speech Intelligibility and Quality: A Survey of Unsupervised Single Channel Speech Enhancement Algorithms 

      Saleem, Nasir; Khattak, Muhammad Irfan; Verdú, Elena (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 06/2020)
      Many forms of human communication exist; for instance, text and nonverbal based. Speech is, however, the most powerful and dexterous form for the humans. Speech signals enable humans to communicate and this usefulness of ...

    © UNIR - Universidad Internacional de La Rioja
     