Modeling Sub-Band Information Through Discrete Wavelet Transform to Improve Intelligibility Assessment of Dysarthric Speech
Author:
Sahu, Laxmi Priya; Pradhan, Gayadhar; Singh, Jyoti Prakash
Date:
12/2022
Journal / publisher:
International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI)
Item type:
article
Web address:
https://ijimai.org/journal/bibcite/reference/3196
Abstract:
The speech signal within a sub-band varies at a fine level depending on the type and level of dysarthria. The Mel-frequency filterbank used in computing cepstral coefficients smooths out this fine-level information in the higher-frequency regions because the filters there have larger bandwidths. To capture the sub-band information, this paper first applies a four-level discrete wavelet transform (DWT) to decompose the input speech signal into approximation and detail coefficients at each level. For a given input speech signal, five speech signals representing different sub-bands are then reconstructed using the inverse DWT (IDWT). The log filterbank energies are computed by analyzing the short-term discrete Fourier transform magnitude spectra of each reconstructed signal using a 30-channel Mel filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed speech signals are pooled together, and a discrete cosine transform is applied to obtain the cepstral feature, termed here the discrete wavelet transform reconstructed (DWTR) Mel-frequency cepstral coefficient (MFCC), or DWTR-MFCC. An i-vector-based dysarthric level assessment system developed on the Universal Access speech corpus shows that the proposed DWTR-MFCC feature outperforms the conventional MFCC and several other cepstral features reported for a similar task. Using DWTR-MFCC improves the detection accuracy rate (DAR) of the dysarthric level assessment system in the text- and speaker-independent test case from the MFCC baseline of 56.646% to 60.094%. Further analysis of the confusion matrices shows that the confusions among dysarthric classes differ considerably between the MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach that exploits the discriminating power of both kinds of features is proposed to improve the overall performance of the system. The two-stage classification scheme further improves the DAR to 65.813% in the text- and speaker-independent test case.
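The feature extraction steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not state several details, so the following are assumptions: the Haar wavelet, 25 ms frames with a 10 ms hop, a 512-point DFT, a 16 kHz sampling rate, 13 retained cepstral coefficients, and reading "pooled together" as concatenation of the per-signal log energies. All function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar DWT (wavelet choice is an assumption)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def subband_signals(x, levels=4):
    """Return levels+1 time-domain signals, one per DWT sub-band."""
    n = len(x) // 2**levels * 2**levels   # trim so every level halves evenly
    a, coeffs = np.asarray(x[:n], dtype=float), []
    for _ in range(levels):
        a, d = haar_dwt(a)
        coeffs.append(d)                  # details D1..DL
    coeffs.append(a)                      # final approximation AL
    signals = []
    for k in range(levels + 1):
        if k < levels:                    # detail band: invert with zero approximation
            rec, up = haar_idwt(np.zeros_like(coeffs[k]), coeffs[k]), k
        else:                             # approximation band
            rec, up = coeffs[k], levels
        for _ in range(up):               # climb back to full length with zero details
            rec = haar_idwt(rec, np.zeros_like(rec))
        signals.append(rec)
    return signals

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filters over the rfft bins."""
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    edges = mel2hz(np.linspace(0, hz2mel(sr / 2), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, 1 / sr)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i:i + 3]
        fb[i] = np.maximum(0, np.minimum((freqs - lo) / (mid - lo),
                                         (hi - freqs) / (hi - mid)))
    return fb

def log_mel_energies(x, fb, frame_len, hop, n_fft):
    """Log filterbank energies from short-term DFT magnitude spectra."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    mag = np.abs(np.fft.rfft(frames, n_fft))
    return np.log(mag @ fb.T + 1e-10)     # small floor avoids log(0)

def dct2(x, n_out):
    """Unnormalized DCT-II along the last axis, first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T

def dwtr_mfcc(x, sr=16000, levels=4, n_filters=30, n_ceps=13):
    """Sketch of a DWTR-MFCC-style feature: one row per analysis frame."""
    frame_len, hop, n_fft = int(0.025 * sr), int(0.010 * sr), 512
    fb = mel_filterbank(n_filters, n_fft, sr)
    # Pool (here: concatenate) the log energies of all sub-band signals per frame.
    pooled = np.hstack([log_mel_energies(s, fb, frame_len, hop, n_fft)
                        for s in subband_signals(x, levels)])
    return dct2(pooled, n_ceps)
```

With four decomposition levels and 30 Mel channels, each frame pools 5 × 30 = 150 log energies before the discrete cosine transform. A library such as PyWavelets (`pywt.wavedec`, `pywt.waverec`) could replace the hand-written Haar transform if a different wavelet is wanted.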
Usage statistics
Year | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
Views | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 73 | 66 |
Downloads | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 71 | 50 |
Related items
Showing items related by title, author, or subject.
- Infected Fruit Part Detection using K-Means Clustering Segmentation Technique
  Dubey, Shiv Ram; Dixit, Pushkar; Singh, Nishant; Gupta, Jay Prakash (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 06/2013)
  Nowadays, overseas commerce has increased drastically in many countries. Plenty fruits are imported from the other nations such as oranges, apples etc. Manual identification of defected fruit is very time consuming. This ...
- Robust Lossless Semi Fragile Information Protection in Images
  Dixit, Pushkar; Singh, Nishant; Prakash Gupta, Jay (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 06/2014)
  Internet security finds it difficult to keep the information secure and to maintain the integrity of the data. Sending messages over the internet secretly is one of the major tasks as it is widely used for passing the ...
- Analysis of Gait Pattern to Recognize the Human Activities
  Prakash Gupta, Jay; Dixit, Pushkar; Singh, Nishant; Bhaskar Aemwal, Vijay (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 09/2014)
  Human activity recognition based on the computer vision is the process of labelling image sequences with action labels. Accurate systems for this problem are applied in areas such as visual surveillance, human computer ...