Show simple item record

dc.contributor.author: Nosek, Tijana V.
dc.contributor.author: Suzić, Siniša B.
dc.contributor.author: Pekar, Darko J.
dc.contributor.author: Obradović, Radovan J.
dc.contributor.author: Sečujski, Milan S.
dc.contributor.author: Delić, Vlado D.
dc.date: 2021-12
dc.date.accessioned: 2022-05-11T09:49:49Z
dc.date.available: 2022-05-11T09:49:49Z
dc.identifier.issn: 1989-1660
dc.identifier.uri: https://reunir.unir.net/handle/123456789/13070
dc.description.abstract: The paper presents a novel architecture and method for speech synthesis in multiple languages, in the voices of multiple speakers and in multiple speaking styles, even when no speech from a particular speaker in the target language was present in the training data. The method is based on applying neural network embeddings not only to combinations of speaker and style IDs but also to phones in particular phonetic contexts, without any prior linguistic knowledge of their phonetic properties. This enables the network not only to efficiently capture similarities and differences between speakers and speaking styles, but also to establish appropriate relationships between phones belonging to different languages, and ultimately to produce synthetic speech in the voice of a given speaker in a language that he/she has never spoken. The validity of the proposed approach has been confirmed through experiments with models trained on speech corpora of American English and Mexican Spanish. It has also been shown that the proposed approach supports the use of neural vocoders, i.e. that they can produce synthesized speech of good quality even in languages they were not trained on. [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI) [es_ES]
dc.relation.ispartofseries: vol. 7, nº 2
dc.relation.uri: https://www.ijimai.org/journal/bibcite/reference/3049 [es_ES]
dc.rights: openAccess [es_ES]
dc.subject: cross-lingual [es_ES]
dc.subject: artificial neural networks [es_ES]
dc.subject: speech synthesis [es_ES]
dc.subject: vocoder [es_ES]
dc.subject: IJIMAI [es_ES]
dc.title: Cross-Lingual Neural Network Speech Synthesis Based on Multiple Embeddings [es_ES]
dc.type: article [es_ES]
reunir.tag: ~IJIMAI [es_ES]
dc.identifier.doi: https://doi.org/10.9781/ijimai.2021.11.005
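
The abstract above describes an architecture that learns embeddings for speaker IDs, speaking-style IDs and language-independent phone symbols, and combines them to condition the synthesizer. As a minimal illustrative sketch of that idea only (not the authors' implementation; all class names, vocabulary sizes and embedding dimensions below are assumptions), in PyTorch:

```python
# Illustrative sketch (not the authors' code): learned embeddings for phones,
# speaker IDs and style IDs are combined into a per-frame conditioning vector.
# All names and dimensions here are assumptions made for the example.
import torch
import torch.nn as nn


class MultiEmbeddingEncoder(nn.Module):
    def __init__(self, n_phones, n_speakers, n_styles,
                 phone_dim=256, speaker_dim=64, style_dim=16):
        super().__init__()
        # Phones from all languages share a single embedding table, so the
        # network can learn cross-lingual similarities between phones without
        # any prior phonetic feature descriptions.
        self.phone_emb = nn.Embedding(n_phones, phone_dim)
        self.speaker_emb = nn.Embedding(n_speakers, speaker_dim)
        self.style_emb = nn.Embedding(n_styles, style_dim)

    def forward(self, phone_ids, speaker_id, style_id):
        # phone_ids: (batch, time); speaker_id, style_id: (batch,)
        phones = self.phone_emb(phone_ids)               # (B, T, phone_dim)
        spk = self.speaker_emb(speaker_id).unsqueeze(1)  # (B, 1, speaker_dim)
        sty = self.style_emb(style_id).unsqueeze(1)      # (B, 1, style_dim)
        cond = torch.cat([spk, sty], dim=-1).expand(-1, phones.size(1), -1)
        # A downstream acoustic model and neural vocoder would map this
        # concatenated sequence to speech parameters or a waveform.
        return torch.cat([phones, cond], dim=-1)


# Example: phones of one language rendered with the ID of a speaker seen only
# in another language, which is the cross-lingual case the paper targets.
enc = MultiEmbeddingEncoder(n_phones=120, n_speakers=8, n_styles=4)
phone_ids = torch.randint(0, 120, (1, 15))
out = enc(phone_ids, torch.tensor([3]), torch.tensor([1]))
print(out.shape)  # torch.Size([1, 15, 336])
```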


Files in this item


This item appears in the following collection(s)
