E2E-V2SResNet: Deep residual convolutional neural networks for end-to-end video driven speech synthesis
Authors:
Saleem, Nasir; Gao, Jiechao; Irfan, Muhammad; Verdú, Elena; Parra Puente, Javier
Date:
2022
Journal / publisher:
Image and Vision Computing
Item type:
Indexed journal article
Abstract:
Speechreading, which infers a spoken message from visually detected facial articulation, is a challenging task. In this paper, we propose an end-to-end ResNet (E2E-ResNet) model for synthesizing speech signals from silent video of a speaking individual. The model is a convolutional encoder-decoder framework that captures the frames of a video and encodes them into a latent space of visual features. The decoder outputs spectrograms, which are converted into waveforms corresponding to the speech articulated in the input video. The speech waveforms are then fed to a waveform critic that decides whether the speech is real or synthesized. Experiments show that the proposed E2E-V2SResNet model synthesizes realistic, intelligible speech on the GRID database. To further demonstrate the potential of the proposed model, we also conduct experiments on the TCD-TIMIT database. We evaluate the synthesized speech for unseen speakers using three objective metrics that measure the intelligibility, quality, and word error rate (WER) of the synthesized speech. We show that the E2E-V2SResNet model outscores the competing approaches in most metrics on the GRID and TCD-TIMIT databases. Compared with the baseline, the proposed model achieves a 3.077% improvement in speech quality and a 2.593% improvement in speech intelligibility. (c) 2022 Elsevier B.V. All rights reserved.
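The abstract describes a pipeline of video frames → convolutional encoder → latent visual features → decoder → spectrogram → waveform (with a critic judging the result). As a rough illustration of that data flow only, here is a minimal numpy sketch; the layer structure, all dimensions (`LATENT`, `N_FREQ`, `HOP`), and the stand-in linear "encoder"/"decoder" are assumptions for shape bookkeeping, not the paper's actual ResNet architecture, and the critic is omitted.

```python
import numpy as np

# Hypothetical dimensions -- the real layer counts and sizes are not
# given in this abstract.
T, H, W = 75, 64, 64          # video frames (GRID clips are roughly 75 frames)
LATENT, N_FREQ, HOP = 256, 128, 160

rng = np.random.default_rng(0)

def encode(frames):
    """Stand-in for the ResNet encoder: frames -> latent visual features."""
    flat = frames.reshape(frames.shape[0], -1)           # (T, H*W)
    w_enc = rng.standard_normal((flat.shape[1], LATENT)) * 0.01
    return np.tanh(flat @ w_enc)                         # (T, LATENT)

def decode(latent):
    """Stand-in for the decoder: latent features -> magnitude spectrogram."""
    w_dec = rng.standard_normal((LATENT, N_FREQ)) * 0.01
    return np.abs(latent @ w_dec)                        # (T, N_FREQ)

def spectrogram_to_waveform(spec):
    """Crude overlap-add inversion; real systems use Griffin-Lim or a vocoder."""
    frames = np.fft.irfft(spec, n=2 * (N_FREQ - 1), axis=1)  # (T, 254)
    out = np.zeros(HOP * (spec.shape[0] - 1) + frames.shape[1])
    for i, f in enumerate(frames):
        out[i * HOP : i * HOP + f.shape[0]] += f
    return out

video = rng.random((T, H, W))          # silent video of a speaker
wave = spectrogram_to_waveform(decode(encode(video)))
```

In the actual system the critic would score `wave` against real recordings, providing the adversarial signal that pushes the decoder toward realistic speech.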
This item appears in the following collection(s)
Usage statistics
Year      | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024
Views     |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |   12 |   51 |  114
Downloads |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0
Related items
Showing items related by title, author or subject.
-
Regularized sparse features for noisy speech enhancement using deep neural networks
Khattak, Muhammad Irfan; Saleem, Nasir; Gao, Jiechao; Verdú, Elena ; Parra Fuente, Javier (Computers and Electrical Engineering, 2022)A speech enhancement algorithm improves the perceptual aspects of a speech degraded by noise signals. We propose a phase-aware deep neural network (DNN) using the regularized sparse features for speech enhancement. A ... -
On improvement of speech intelligibility and quality: a survey of unsupervised single channel speech enhancement algorithms
Saleem, Nasir; Khattak, Muhammad Irfan; Verdú, Elena (International Journal of Interactive Multimedia and Artificial Intelligence, 06/2020)Many forms of human communication exist, for instance text-based and nonverbal. Speech is, however, the most powerful and dexterous form for humans. Speech signals enable humans to communicate and this usefulness of ...