Motivic Pattern Classification of Music Audio Signals Combining Residual and LSTM Networks

Arronte Alvarez, Aitor; Gómez, Francisco

dc.contributor.author	Arronte Alvarez, Aitor
dc.contributor.author	Gómez, Francisco
dc.date	2021-06
dc.date.accessioned	2022-04-29T07:23:45Z
dc.date.available	2022-04-29T07:23:45Z
dc.identifier.issn	1989-1660
dc.identifier.uri	https://reunir.unir.net/handle/123456789/12975
dc.description.abstract	Motivic pattern classification from music audio recordings is a challenging task, especially for a cappella flamenco cantes, which are characterized by complex melodic variations, pitch instability, timbre changes, extreme vibrato oscillations, microtonal ornamentations, and noisy recording conditions. Convolutional Neural Networks (CNNs) have proven to be very effective in image classification. Recent work in large-scale audio classification has shown that CNN architectures originally developed for image problems can be applied successfully to audio event recognition and classification with little or no modification to the networks. In this paper, CNN architectures are tested on a more nuanced problem: intra-style classification of flamenco cantes using small motivic patterns. A new architecture is proposed that exploits the advantages of residual CNNs as feature extractors and adds a bidirectional LSTM layer to capture the sequential nature of musical audio data. We present a full end-to-end pipeline for music audio classification that includes a sequential pattern mining technique and a contour simplification method to extract relevant motifs from audio recordings. Mel-spectrograms of the extracted motifs are then used as the input to the different architectures tested. We investigate the usefulness of motivic patterns for the automatic classification of music recordings, and the effect of audio length and corpus size on overall classification accuracy. Results show a relative accuracy improvement of up to 20.4% when CNN architectures are trained using acoustic representations from motivic patterns.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI)	es_ES
dc.relation.ispartofseries	vol. 6, no. 6
dc.relation.uri	https://www.ijimai.org/journal/bibcite/reference/2878	es_ES
dc.rights	openAccess	es_ES
dc.subject	motivic patterns	es_ES
dc.subject	convolutional neural network (CNN)	es_ES
dc.subject	data augmentation	es_ES
dc.subject	audio	es_ES
dc.subject	music	es_ES
dc.subject	information retrieval	es_ES
dc.subject	IJIMAI	es_ES
dc.title	Motivic Pattern Classification of Music Audio Signals Combining Residual and LSTM Networks	es_ES
dc.type	article	es_ES
reunir.tag	~IJIMAI	es_ES
dc.identifier.doi	https://doi.org/10.9781/ijimai.2021.01.003

Files in this item

Name: ijimai_6_6_21.pdf
Size: 792.7 KB
Format: PDF

This item appears in the following collection(s)

vol. 6, no. 6, June 2021

Related items

Rhetorical Pattern Finding

I Congreso Español de Videojuegos 2022

Early-Onset Dementia Associated with a Heterozygous, Nonsense, and de novo Variant in the MBD5 Gene