Music Boundary Detection using Convolutional Neural Networks: A Comparative Analysis of Combined Input Features

Hernandez-Olivan, Carlos; Beltran, Jose R.; Diaz-Guerra, David

dc.contributor.author	Hernandez-Olivan, Carlos
dc.contributor.author	Beltran, Jose R.
dc.contributor.author	Diaz-Guerra, David
dc.date	2021-12
dc.date.accessioned	2022-05-10T11:49:12Z
dc.date.available	2022-05-10T11:49:12Z
dc.identifier.issn	1989-1660
dc.identifier.uri	https://reunir.unir.net/handle/123456789/13058
dc.description.abstract	The analysis of the structure of musical pieces remains a challenge for Artificial Intelligence, especially in the field of Deep Learning. It requires prior identification of the structural boundaries of the pieces, a problem that has recently been studied with unsupervised methods and with supervised neural networks trained on human annotations. The supervised networks used in previous studies are Convolutional Neural Networks (CNNs) that take Mel-scaled Log-magnitude Spectrograms (MLS), Self-Similarity Matrices (SSM) or Self-Similarity Lag Matrices (SSLM) as inputs. In previously published studies, pre-processing is done in different ways, with different distance metrics and different audio features used to compute the inputs, so a generalised pre-processing method for calculating the model inputs is missing. The objective of this work is to establish a general method to pre-process these inputs by comparing the results obtained with inputs calculated from different pooling strategies, distance metrics and audio features, also taking into account the computing time needed to obtain them. We also establish the most effective combination of inputs to deliver to the CNN in order to extract the structural boundaries of the music pieces most efficiently. With an adequate combination of input matrices and pooling strategies, we obtain an F1 score of 0.411, which outperforms a current work done under the same conditions (same publicly available dataset for training and testing).	es_ES
dc.language.iso	eng	es_ES
dc.publisher	International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI)	es_ES
dc.relation.ispartofseries	vol. 7, no. 2
dc.relation.uri	https://www.ijimai.org/journal/bibcite/reference/3030	es_ES
dc.rights	openAccess	es_ES
dc.subject	deep learning	es_ES
dc.subject	convolutional neural network (CNN)	es_ES
dc.subject	music	es_ES
dc.subject	information retrieval	es_ES
dc.subject	music information retrieval (MIR)	es_ES
dc.subject	self-similarity matrix (SSM)	es_ES
dc.subject	IJIMAI	es_ES
dc.title	Music Boundary Detection using Convolutional Neural Networks: A Comparative Analysis of Combined Input Features	es_ES
dc.type	article	es_ES
reunir.tag	~IJIMAI	es_ES
dc.identifier.doi	https://doi.org/10.9781/ijimai.2021.10.005
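The abstract describes CNN inputs built from frame-level audio features: self-similarity matrices (SSM) computed with a chosen distance metric, their lag variants (SSLM), and pooling along time. The following is a minimal NumPy sketch of that kind of pipeline, assuming a (features x frames) input matrix. The function names, the distance-to-similarity mapping (one minus the normalised distance) and the lag-matrix layout are illustrative assumptions, not the authors' pre-processing, which the paper compares across several variants.

```python
import numpy as np


def max_pool_time(features: np.ndarray, factor: int) -> np.ndarray:
    """Max-pool a (n_features, n_frames) matrix along time in non-overlapping windows."""
    n_feat, n_frames = features.shape
    n_out = n_frames // factor  # trailing frames that do not fill a window are dropped
    trimmed = features[:, : n_out * factor]
    return trimmed.reshape(n_feat, n_out, factor).max(axis=2)


def self_similarity_matrix(features: np.ndarray, metric: str = "euclidean") -> np.ndarray:
    """Return an (n_frames, n_frames) matrix in [0, 1]; 1 means identical frames."""
    x = features.T  # one row per frame
    if metric == "euclidean":
        diff = x[:, None, :] - x[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
    elif metric == "cosine":
        unit = x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
        dist = 1.0 - unit @ unit.T
    else:
        raise ValueError(f"unsupported metric: {metric}")
    dist = np.clip(dist, 0.0, None)  # remove tiny negative values caused by rounding
    peak = dist.max()
    # Assumed mapping (hypothetical): scale distances to [0, 1] and invert them.
    return 1.0 - dist / peak if peak > 0 else np.ones_like(dist)


def lag_matrix(ssm: np.ndarray, max_lag: int) -> np.ndarray:
    """Hypothetical SSLM layout: entry [l, i] is the similarity of frame i to frame i - l.

    Positions where i - l < 0 are left at 0.
    """
    n = ssm.shape[0]
    out = np.zeros((max_lag, n))
    for lag in range(min(max_lag, n)):
        idx = np.arange(lag, n)
        out[lag, idx] = ssm[idx, idx - lag]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((12, 40))  # random stand-in for 12 feature bins x 40 frames
    pooled = max_pool_time(feats, factor=4)  # shape (12, 10)
    ssm = self_similarity_matrix(pooled, metric="cosine")  # shape (10, 10)
    sslm = lag_matrix(ssm, max_lag=5)  # shape (5, 10)
    print(pooled.shape, ssm.shape, sslm.shape)
```

Pooling before building the matrix shrinks its quadratic n x n cost, which is one reason the choice of pooling strategy affects the computing time that the abstract weighs alongside boundary-detection accuracy.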

Files in this item

Name: ijimai7_2_8_0.pdf
Size: 1.204 MB
Format: PDF

This item appears in the following collection(s)

vol. 7, no. 2, December 2021

Related items

Validity and Intra Rater Reliability of a New Device for Tongue Force Measurement

(Un)Broken: Lateral violence among hospital nurses, user violence, burnout, and general health: A structural equation modeling analysis

Radon Mitigation Applications at the Laboratorio Subterraneo de Canfranc (LSC)
