    IJIMAI journal (UNIR Revistas), vol. 7, no. 2, December 2021

    Audio-Visual Automatic Speech Recognition Using PZM, MFCC and Statistical Analysis

    Authors: 
    Debnath, Saswati; Roy, Pinki
    Date: 
    12/2021
    Keywords: 
    audio-visual speech recognition; lip tracking; pseudo Zernike moment; mel frequency cepstral coefficients (MFCC); incremental feature selection (IFS); statistical analysis; IJIMAI
    Item type: 
    article
    URI: 
    https://reunir.unir.net/handle/123456789/13055
    DOI: 
    https://doi.org/10.9781/ijimai.2021.09.001
    Web address: 
    https://www.ijimai.org/journal/bibcite/reference/3012
    Open Access
    Abstract:
    Audio-Visual Automatic Speech Recognition (AV-ASR) has become the most promising research area for situations in which the audio signal is corrupted by noise. The main objective of this paper is to select the important and discriminative audio and visual speech features for recognizing audio-visual speech. The paper proposes Pseudo Zernike Moments (PZM) combined with a feature selection method for audio-visual speech recognition. Visual information is captured from the lip contour, and the moments are computed from it for lip reading. From the audio, 19 Mel Frequency Cepstral Coefficients (MFCCs) are extracted as speech features. Because these 19 features are not equally important, feature selection is used to retain the most effective ones. Statistical tests, namely Analysis of Variance (ANOVA), the Kruskal-Wallis test and the Friedman test, are used to assess the significance of the features together with the Incremental Feature Selection (IFS) technique: the statistical analysis first measures the significance of each speech feature, and IFS then selects the speech feature subset. Multiclass Support Vector Machine (SVM), Artificial Neural Network (ANN) and Naive Bayes (NB) classifiers are then used to recognize speech in both the audio and the visual modality, and a combined decision is made from the two individual recognition systems based on their recognition rates. The paper compares the results of the proposed model with those of an existing model for both audio and visual speech recognition. A comparison of Zernike Moments (ZM) with PZM shows that the proposed PZM-based model extracts more discriminative features for visual speech recognition. The study also shows that audio feature selection based on statistical analysis outperforms recognition without any feature selection.
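    The abstract describes a two-stage selection of audio features. First, each MFCC is ranked by a statistical significance test. Then the feature subset is grown incrementally (IFS). The following is a minimal sketch of that idea. It rests on assumptions that do not come from the paper: one-way ANOVA is the only ranking test (the paper also uses Kruskal-Wallis and Friedman), a leave-one-out nearest-centroid classifier stands in for the SVM, ANN and NB classifiers, and IFS keeps the smallest prefix of the ranking that reaches the highest accuracy. All function names are hypothetical.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")  # perfectly separated classes
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(X, y):
    """Feature indices sorted from most to least significant (highest F first)."""
    n_feat = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: scores[j], reverse=True)


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for SVM/ANN/NB)."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for r, (row, label) in enumerate(zip(X, y)):
            if r == i:
                continue  # hold out sample i
            s = sums.setdefault(label, [0.0] * len(features))
            for a, j in enumerate(features):
                s[a] += row[j]
            counts[label] = counts.get(label, 0) + 1
        centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

        def dist(c):
            return sum((X[i][j] - centroids[c][a]) ** 2 for a, j in enumerate(features))

        correct += min(centroids, key=dist) == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y):
    """Add ranked features one at a time; keep the smallest best-scoring prefix."""
    order = rank_features(X, y)
    best_subset, best_acc = [], -1.0
    for m in range(1, len(order) + 1):
        subset = order[:m]
        acc = loo_accuracy(X, y, subset)
        if acc > best_acc:  # strict '>' prefers fewer features on ties
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


if __name__ == "__main__":
    # Toy data (not from the paper): feature 0 separates the classes, 1 and 2 do not.
    X = [[1.0, 5.0, 0.0], [1.2, 3.0, 0.1], [0.9, 4.0, 0.2],
         [3.0, 4.5, 0.1], [3.1, 3.5, 0.0], [2.9, 4.2, 0.2]]
    y = ["a", "a", "a", "b", "b", "b"]
    subset, acc = incremental_feature_selection(X, y)
    print(subset, acc)  # prints [0] 1.0
```

    In a real MFCC pipeline each row would hold the 19 coefficients of one utterance. The accuracy would come from the chosen classifier evaluated on held-out data, not from this toy stand-in.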
    Files in this item
    Name: ijimai7_2_11_0.pdf
    Size: 766.4 KB
    Format: application/pdf
    This item appears in the following collection(s)
    • vol. 7, no. 2, December 2021

    Usage statistics

    Year       2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
    Views         0    0    0    0    0    0    0    0    0    0   15
    Downloads     0    0    0    0    0    0    0    0    0    0   22

    Related items

    Showing related items by title, author or subject.

    • PCHET: An efficient programmable cellular automata based hybrid encryption technique for multi-chat client-server applications 

      Roy, Satyabrata; Gupta, Rohit Kumar; Rawat, Umashankar; Dey, Nilanjan; González-Crespo, Rubén (1) (Journal of Information Security and Applications, 12/2020)
      This paper demonstrates an efficient programmable Cellular Automata (CA) based hybrid encryption technique (PCHET) for chatting applications involving multiple clients who can chat simultaneously with each other. The ...
    • The Impact of COVID-19 Management Policies Tailored to Airborne SARS-CoV-2 Transmission: Policy Analysis 

      Telles, Charles Roberto; Roy, Archisman; Ajmal, Mohammad Rehan; Mustafa, Syed Khalid; Ahmad, Mohammad Ayaz; De la Serna Tuya, Juan Moisés (1) (JMIR public health and surveillance, 2021)
      Background: Daily new COVID-19 cases from January to April 2020 demonstrate varying patterns of SARS-CoV-2 transmission across different geographical regions. Constant infection rates were observed in some countries, whereas ...
    • The Impact of COVID-19 Management Policies Tailored to Airborne SARS-CoV-2 Transmission: Policy Analysis (vol 7, e20699, 2021) 

      Telles, Charles Roberto; Roy, Archisman Archisman; Ajmal, Mohammad Rehan; Mustafa, Syed Khalid; Ahmad, Mohammad Ayaz; De la Serna Tuya, Juan Moisés (1); Frigo, Elisandro Pires; Hernández Rosales, Manuel (JMIR public health and surveillance, 2021)
      In “The Impact of COVID-19 Management Policies Tailored to Airborne SARS-CoV-2 Transmission: Policy Analysis” (JMIR Public Health Surveill 2021;7(4):e20699), the authors noted one error. The academic degree of author ...

    Legal Notice · Privacy Policy · Cookie Policy · GDPR Legal Clauses
    © UNIR - Universidad Internacional de La Rioja