    Deep Neural Networks for Speech Enhancement in Complex-Noisy Environments

    Authors:
    Saleem, Nasir; Khattak, Muhammad Irfan
    Date:
    03/2020
    Keywords:
    speech enhancement; deep learning; intelligibility; time-frequency masking; ideal binary mask; IJIMAI
    Journal / publisher:
    International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI)
    Item type:
    article
    URI:
    https://reunir.unir.net/handle/123456789/12693
    DOI:
    https://doi.org/10.9781/ijimai.2019.06.001
    Web address:
    https://www.ijimai.org/journal/bibcite/reference/2725
    Open Access
    Abstract:
    In this paper, we consider speech enhancement under conditions resembling real-world environments, where several complex noise sources simultaneously degrade the quality and intelligibility of the target speech. The existing literature on speech enhancement focuses principally on mixtures that contain a single noise source. In real-world situations, however, the target speech is generally mixed with several complex stationary and nonstationary noise sources at once, and its quality and intelligibility must be improved under those conditions. We use deep learning for speech enhancement in such complex-noisy environments, treating estimation of the ideal binary mask (IBM) as a binary classification problem solved with deep neural networks (DNNs). The IBM serves as the target function during training, and the trained DNNs estimate the IBM during the enhancement stage. The estimated mask is then applied to the complex-noisy mixtures to obtain the target speech. The mean square error (MSE) is used as the objective cost function, evaluated at various epochs. Experimental results at different input signal-to-noise ratios (SNRs) show that DNN-based complex-noisy speech enhancement outperforms the competing methods in speech quality, as measured by perceptual evaluation of speech quality (PESQ), segmental signal-to-noise ratio (SNRSeg), log-likelihood ratio (LLR), and weighted spectral slope (WSS). Short-time objective intelligibility (STOI) scores additionally confirm the improvement in speech intelligibility.
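
To make the masking idea in the abstract concrete, the following is a minimal sketch of how an ideal binary mask could be computed and applied. It is not the authors' implementation: the function names, the 0 dB local criterion (`lc_db`), the toy magnitude values, and the approximation of the mixture magnitude as speech plus noise are all assumptions made for illustration. In the paper's setting a trained DNN would estimate the mask, because clean speech and noise are not available separately at enhancement time.

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """Return 1 where the local SNR of a time-frequency unit exceeds lc_db, else 0."""
    eps = 1e-12  # guards against division by zero and log of zero
    local_snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (local_snr_db > lc_db).astype(np.float32)

def apply_mask(mixture_mag, mask):
    """Keep speech-dominated units of the mixture and zero out the rest."""
    return mixture_mag * mask

def mse(estimated_mask, target_mask):
    """Mean square error between an estimated mask and the IBM target."""
    return float(np.mean((estimated_mask - target_mask) ** 2))

# Toy magnitudes: 2 frequency bins x 3 frames (made-up numbers, assumption).
speech = np.array([[1.0, 0.1, 2.0],
                   [0.2, 3.0, 0.05]])
noise = np.array([[0.5, 0.4, 0.5],
                  [0.6, 0.5, 0.5]])

ibm = ideal_binary_mask(speech, noise)
# Assumption: mixture magnitude approximated as the sum of the two magnitudes.
enhanced = apply_mask(speech + noise, ibm)
```

Here `ibm` marks the units where speech dominates the noise, and `mse` is the kind of cost that could compare a network's mask estimate against that target during training.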
    Files in this item
    Name: ijimai20206_1_10_pdf_20073.pdf
    Size: 1.295Mb
    Format: application/pdf
    This item appears in the following collection(s)
    • vol. 6, nº 1, march 2020

    Usage statistics

    Year    Views   Downloads
    2012    0       0
    2013    0       0
    2014    0       0
    2015    0       0
    2016    0       0
    2017    0       0
    2018    0       0
    2019    0       0
    2020    0       0
    2021    0       0
    2022    124     182
    2023    236     228
    2024    243     170
    2025    156     95

    Related items

    Showing items related by title, author, or subject.

    • On improvement of speech intelligibility and quality: a survey of unsupervised single channel speech enhancement algorithms 

      Saleem, Nasir; Khattak, Muhammad Irfan; Verdú, Elena (International Journal of Interactive Multimedia and Artificial Intelligence, 06/2020)
      Many forms of human communication exist; for instance, text and nonverbal based. Speech is, however, the most powerful and dexterous form for the humans. Speech signals enable humans to communicate and this usefulness of ...
    • On Improvement of Speech Intelligibility and Quality: A Survey of Unsupervised Single Channel Speech Enhancement Algorithms 

      Saleem, Nasir; Khattak, Muhammad Irfan; Verdú, Elena (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 06/2020)
      Many forms of human communication exist; for instance, text and nonverbal based. Speech is, however, the most powerful and dexterous form for the humans. Speech signals enable humans to communicate and this usefulness of ...
    • Automated Detection of COVID-19 using Chest X-Ray Images and CT Scans through Multilayer-Spatial Convolutional Neural Networks 

      Khattak, Muhammad Irfan; Al-Hasan, Mu'ath; Jan, Atif; Saleem, Nasir; Verdú, Elena; Khurshid, Numan (International Journal of Interactive Multimedia and Artificial Intelligence, 2021)
      The novel coronavirus-2019 (Covid-19), a contagious disease became a pandemic and has caused overwhelming effects on the human lives and world economy. The detection of the contagious disease is vital to avert further ...

    © UNIR - Universidad Internacional de La Rioja