
    Lightweight Real-Time Recurrent Models for Speech Enhancement and Automatic Speech Recognition

    Authors:
    Dhahbi, Sami; Saleem, Nasir; Gunawan, Teddy Surya; Bourouis, Sami; Ali, Imad; Trigui, Aymen; Algarni, Abeer D.
    Date:
    06/2024
    Keywords:
    real-time speech; simple recurrent unit (SRU); speech enhancement; speech processing; speech quality
    Journal / publisher:
    International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI)
    Citation:
    S. Dhahbi, N. Saleem, T. S. Gunawan, S. Bourouis, I. Ali, A. Trigui, A. D. Algarni. Lightweight Real-Time Recurrent Models for Speech Enhancement and Automatic Speech Recognition, International Journal of Interactive Multimedia and Artificial Intelligence, (2024), http://dx.doi.org/10.9781/ijimai.2024.04.003
    Item type:
    article
    URI: 
    https://reunir.unir.net/handle/123456789/16570
    DOI: 
    http://dx.doi.org/10.9781/ijimai.2024.04.003
    Open Access
    Abstract:
    Traditional recurrent neural networks (RNNs) struggle to capture long-term temporal dependencies, yet lightweight recurrent models for speech enhancement must improve noisy speech while remaining computationally efficient and still capturing those dependencies. This study proposes a lightweight hourglass-shaped model for speech enhancement (SE) and automatic speech recognition (ASR). Simple recurrent units (SRUs) with skip connections are implemented, and attention gates are added to the skip connections to highlight important features and spectral regions. The model operates without relying on future information, making it well-suited for real-time processing. Combined acoustic features are used, and two training objectives are estimated. Experimental evaluations using short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and word error rates (WERs) indicate better intelligibility, perceptual quality, and word recognition rates. The composite measures further confirm the performance in terms of residual noise and speech distortion. On the TIMIT database, the proposed model improves STOI by 16.21% and PESQ by 0.69 (31.1%) over noisy speech, while on the LibriSpeech database it improves STOI by 16.41% and PESQ by 0.71 (32.9%). Further, our model outperforms other deep neural networks (DNNs) in both seen and unseen conditions. ASR performance, measured with the Kaldi toolkit, reaches a 15.13% WER in noisy backgrounds.
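The abstract's core building blocks, a causal simple recurrent unit (SRU) and an attention gate on a skip connection, can be sketched briefly. The code below is a minimal illustration assuming PyTorch: the SRU follows the standard formulation of Lei et al. (2018), and the gate follows the common additive-attention pattern. The paper's exact hourglass layout, feature dimensions, and gate design are not reproduced here, so all class names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SRUCell(nn.Module):
    """Standard SRU layer (Lei et al., 2018): the recurrence uses only
    elementwise operations, so the matrix products are precomputed across
    time; processing is causal (no future frames are consulted)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One projection yields candidate, forget, and reset pre-activations.
        self.w = nn.Linear(input_size, 3 * hidden_size, bias=False)
        self.v_f = nn.Parameter(torch.zeros(hidden_size))  # forget-gate peephole
        self.v_r = nn.Parameter(torch.zeros(hidden_size))  # reset-gate peephole
        self.b_f = nn.Parameter(torch.zeros(hidden_size))
        self.b_r = nn.Parameter(torch.zeros(hidden_size))
        self.highway = nn.Linear(input_size, hidden_size, bias=False)

    def forward(self, x):  # x: (time, batch, input_size)
        z, f_pre, r_pre = self.w(x).chunk(3, dim=-1)
        outputs, c = [], torch.zeros_like(z[0])
        for t in range(x.size(0)):  # causal loop over frames
            f = torch.sigmoid(f_pre[t] + self.v_f * c + self.b_f)
            c = f * c + (1.0 - f) * z[t]
            r = torch.sigmoid(r_pre[t] + self.v_r * c + self.b_r)
            outputs.append(r * torch.tanh(c) + (1.0 - r) * self.highway(x[t]))
        return torch.stack(outputs)

class AttentionGate(nn.Module):
    """Additive attention gate on an encoder-decoder skip connection:
    the decoder signal re-weights the skipped encoder features so that
    important spectral regions pass through."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj_skip = nn.Linear(channels, channels, bias=False)
        self.proj_gate = nn.Linear(channels, channels, bias=False)
        self.score = nn.Linear(channels, channels)

    def forward(self, skip, gate):
        a = torch.sigmoid(self.score(torch.tanh(self.proj_skip(skip) + self.proj_gate(gate))))
        return skip * a  # elementwise re-weighting of the skip features
```

In an hourglass (encoder-decoder) arrangement, stacked SRUCell layers would narrow and then widen the feature dimension, with each AttentionGate applied to the matching encoder output before it is merged into the decoder path.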
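The objective metrics reported above can be computed with off-the-shelf tools; a hedged example follows, using the third-party pystoi and pesq Python packages. The paper itself does not name these implementations, and the signals below are placeholders; WER would come from a Kaldi recipe rather than this snippet.

```python
import numpy as np
from pystoi import stoi   # pip install pystoi
from pesq import pesq     # pip install pesq

fs = 16000                                  # 16 kHz, as used for TIMIT/LibriSpeech
ref = np.random.randn(3 * fs)               # placeholder clean reference (3 s)
deg = ref + 0.1 * np.random.randn(3 * fs)   # placeholder enhanced/noisy signal

print("STOI:", stoi(ref, deg, fs, extended=False))  # 0..1, higher is better
print("PESQ:", pesq(fs, ref, deg, "wb"))            # wideband MOS-LQO, about -0.5..4.5
```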
    Files in this item
    Name: Lightweight Real-Time Recurrent Models for Speech Enhancement and Automatic Speech Recognition.pdf
    Size: 3.334 MB
    Format: application/pdf
    This item appears in the following collection(s)
    • vol. 8, nº 6, june 2024

    Usage statistics

    Year        Views   Downloads
    2012–2023       0           0
    2024          229         190
    2025          173         490

    Related items

    Showing items related by title, author or subject.

    • Efficient Gated Convolutional Recurrent Neural Networks for Real-Time Speech Enhancement 

      Fazal-E-Wahab; Ye, Zhongfu; Saleem, Nasir; Ali, Hamza (International Journal of Interactive Multimedia and Artificial Intelligence, 2024)
      Deep learning (DL) networks have grown into powerful alternatives for speech enhancement and have achieved excellent results by improving speech quality, intelligibility, and background noise suppression. Due to high ...
    • E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven speech synthesis 

      Saleem, Nasir; Gao, Jiechao; Irfan, Muhammad; Verdú, Elena; Parra Puente, Javier (Image and vision computing, 2022)
      Speechreading, which infers spoken messages from visually detected facial articulation, is a challenging task. In this paper, we propose an end-to-end ResNet (E2E-ResNet) model for synthesizing speech signals from the ...
    • On improvement of speech intelligibility and quality: a survey of unsupervised single channel speech enhancement algorithms 

      Saleem, Nasir; Khattak, Muhammad Irfan; Verdú, Elena (International Journal of Interactive Multimedia and Artificial Intelligence, 06/2020)
      Many forms of human communication exist, for instance text-based and nonverbal. Speech is, however, the most powerful and dexterous form for humans. Speech signals enable humans to communicate, and this usefulness of ...

    © UNIR - Universidad Internacional de La Rioja