    Efficient Gated Convolutional Recurrent Neural Networks for Real-Time Speech Enhancement

Author: Fazal-E-Wahab; Ye, Zhongfu; Saleem, Nasir; Ali, Hamza
Date: 2024
Keywords: Convolutional Gated Recurrent Unit (Convolutional GRU); deep learning; intelligibility; Long Short Term Memory (LSTM); speech enhancement; IJIMAI
Journal / publisher: International Journal of Interactive Multimedia and Artificial Intelligence
Item type: article
URI: https://reunir.unir.net/handle/123456789/14813
DOI: https://doi.org/10.9781/ijimai.2023.05.007
Web address: https://www.ijimai.org/journal/bibcite/reference/3324
Open Access
Abstract:
Deep learning (DL) networks have become powerful alternatives for speech enhancement, achieving excellent results in speech quality, intelligibility, and background-noise suppression. Due to their high computational load, however, most DL models for speech enhancement are difficult to deploy for real-time processing, and formulating resource-efficient, compact networks is challenging. To address this problem, we propose a resource-efficient convolutional recurrent network that learns the complex ratio mask for real-time speech enhancement. A convolutional encoder-decoder and gated recurrent units (GRUs) are integrated into the convolutional recurrent network architecture, yielding a causal system appropriate for real-time speech processing. Parallel GRU grouping and efficient skip-connection techniques are employed to achieve a compact network. In the proposed network, the causal encoder-decoder is composed of five convolutional (Conv2D) and deconvolutional (Deconv2D) layers. The leaky rectified linear unit (ReLU) is applied to all layers except the output layer, where a softplus activation confines the network output to positive values. Furthermore, batch normalization is adopted after every convolution (or deconvolution) and before the activation. Different noise types and speakers can be used in training and testing. Experiments on the LibriSpeech dataset show that the proposed real-time approach improves objective perceptual quality and intelligibility with far fewer trainable parameters than existing LSTM and GRU models. The proposed model obtained an average STOI score of 83.53% and an average PESQ score of 2.52; quality and intelligibility are improved by 31.61% and 17.18%, respectively, over noisy speech.
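As background to the abstract's training target, the sketch below (not the paper's code; a minimal numpy illustration) shows what "learning the complex ratio mask" means: a per-bin complex mask M such that multiplying the noisy STFT Y by M recovers the clean STFT S, i.e. S = M · Y in the complex plane. The network in the paper estimates such a mask; here we only compute the ideal mask analytically to show the relationship.

```python
import numpy as np

def ideal_complex_ratio_mask(clean_stft, noisy_stft, eps=1e-8):
    """Per-bin complex ratio mask (cRM): M = S * conj(Y) / |Y|^2,
    so that M * Y approximately equals S."""
    denom = np.abs(noisy_stft) ** 2 + eps
    real = (noisy_stft.real * clean_stft.real + noisy_stft.imag * clean_stft.imag) / denom
    imag = (noisy_stft.real * clean_stft.imag - noisy_stft.imag * clean_stft.real) / denom
    return real + 1j * imag

def apply_mask(mask, noisy_stft):
    """Enhancement step: complex multiplication of the mask with the noisy spectrum."""
    return mask * noisy_stft

# Toy example with a single time-frequency bin (values are arbitrary):
S = np.array([0.8 + 0.3j])        # "clean" spectrum
Y = S + np.array([0.2 - 0.1j])    # noisy = clean + additive noise
M = ideal_complex_ratio_mask(S, Y)
S_hat = apply_mask(M, Y)
print(np.allclose(S_hat, S, atol=1e-6))  # True: the ideal mask recovers the clean bin
```

In practice the mask is predicted by the network from the noisy input alone, and both its real and imaginary parts must be estimated, which is why complex-mask targets preserve phase information that magnitude-only masks discard.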
Files in this item
Name: ijimai_9_1_6.pdf
Size: 585.6Kb
Format: application/pdf
This item appears in the following collection(s)
    • vol. 9, nº 1, diciembre 2024

Usage statistics

Year       Views  Downloads
2012–2022  0      0
2023       138    161
2024       198    185
2025       167    159

Related items

Showing items related by title, author or subject.

    • Lightweight Real-Time Recurrent Models for Speech Enhancement and Automatic Speech Recognition 

      Dhahbi, Sami; Saleem, Nasir; Gunawan, Teddy Surya; Bourouis, Sami; Ali, Imad; Trigui, Aymen; Algarni, Abeer D. (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 06/2024)
      Traditional recurrent neural networks (RNNs) encounter difficulty in capturing long-term temporal dependencies. However, lightweight recurrent models for speech enhancement are important to improve noisy speech, while being ...
    • E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven speech synthesis 

      Saleem, Nasir; Gao, Jiechao; Irfan, Muhammad; Verdú, Elena ; Parra Puente, Javier (Image and vision computing, 2022)
      Speechreading which infers spoken message from a visually detected articulated facial trend is a challenging task. In this paper, we propose an end-to-end ResNet (E2E-ResNet) model for synthesizing speech signals from the ...
    • On improvement of speech intelligibility and quality: a survey of unsupervised single channel speech enhancement algorithms 

      Saleem, Nasir; Khattak, Muhammad Irfan; Verdú, Elena (International Journal of Interactive Multimedia and Artificial Intelligence, 06/2020)
      Many forms of human communication exist; for instance, text and nonverbal based. Speech is, however, the most powerful and dexterous form for the humans. Speech signals enable humans to communicate and this usefulness of ...

    © UNIR - Universidad Internacional de La Rioja
     