A new multi-label dataset for Web attacks CAPEC classification using machine learning techniques

Sureda Riera, Tomás; Bermejo Higuera, Juan Ramón; Bermejo-Higuera, Javier; Martínez Herraiz, José-Javier; Sicilia, Juan Antonio

dc.contributor.author	Sureda Riera, Tomás
dc.contributor.author	Bermejo Higuera, Juan Ramón
dc.contributor.author	Bermejo-Higuera, Javier
dc.contributor.author	Martínez Herraiz, José-Javier
dc.contributor.author	Sicilia, Juan Antonio
dc.date	2022
dc.date.accessioned	2023-01-24T12:32:40Z
dc.date.available	2023-01-24T12:32:40Z
dc.identifier.citation	Riera, T. S., Higuera, J. R. B., Higuera, J. B., Herraiz, J. J. M., & Montalvo, J. A. S. (2022). A new multi-label dataset for Web attacks CAPEC classification using machine learning techniques. Computers & Security, 120, 102788
dc.identifier.issn	0167-4048
dc.identifier.uri	https://reunir.unir.net/handle/123456789/14058
dc.description.abstract	Context: There are many datasets for training and evaluating web-attack detection models, and they label each request as either normal or attack. Web attack protection tools must also report, clearly and simply, which type of attack was detected. Objectives: This paper presents a new multi-label dataset for classifying web attacks according to the CAPEC classification, a new feature-extraction method based on ASCII values, and an evaluation of several combinations of models and algorithms. Methods: Features are extracted by computing the average of the sum of the ASCII values of the characters in each field that makes up a web request. Using these features, several combinations of algorithms (LightGBM and CatBoost) and multi-label classification models are evaluated to provide a complete CAPEC classification of the web attacks a system is undergoing. The data used to train and test the models come from the new SR-BH 2020 multi-label dataset. Results: Calculating the average of the sum of the ASCII values of the characters that make up a web request proves useful for numeric encoding and feature extraction. The new SR-BH 2020 multi-label dataset supports training and evaluating multi-label classification models and enables CAPEC classification of the various attacks a web system is undergoing. The two-phase model, combined with the MultiOutputClassifier module of the scikit-learn library and the CatBoost algorithm, outperforms the other evaluated combinations at classifying attacks across the different criticality scenarios. Conclusion: Experimental results indicate that combining machine learning algorithms with multi-phase models improves the prediction of web attacks. In addition, a multi-label dataset is suitable for training models that report the type of attack. (c) 2022 The Author(s). Published by Elsevier Ltd.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)	es_ES
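The ASCII-based feature extraction summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes that "the average of the sum of the ASCII values" means the sum of a field's character codes divided by the number of characters, it encodes an empty or missing field as 0.0, and the field names in the hypothetical FIELDS list are placeholders rather than the actual SR-BH 2020 columns.

```python
from typing import Dict, List

# Hypothetical field names chosen for illustration; the SR-BH 2020 dataset
# defines its own request fields.
FIELDS: List[str] = ["method", "path", "query", "user_agent", "body"]


def ascii_mean(value: str) -> float:
    """Average character code of one field (sum of codes / number of chars).

    ord() returns the Unicode code point, which equals the ASCII value for
    ASCII characters. An empty field is encoded as 0.0 (an assumption).
    """
    if not value:
        return 0.0
    return sum(ord(ch) for ch in value) / len(value)


def encode_request(request: Dict[str, str]) -> List[float]:
    """Turn a request into one numeric feature per field, in FIELDS order.

    Fields missing from the request are treated as empty strings.
    """
    return [ascii_mean(request.get(name, "")) for name in FIELDS]


if __name__ == "__main__":
    # A benign query string and an SQL-injection-style variant of it.
    benign = {"method": "GET", "path": "/index.php", "query": "id=42"}
    sqli = {"method": "GET", "path": "/index.php", "query": "id=42' OR '1'='1"}
    print(encode_request(benign))
    print(encode_request(sqli))
```

Each request becomes a fixed-length numeric vector with one value per field, so the injected quote, space, and `OR` characters shift the value of the `query` feature. Vectors of this kind could then be fed to a multi-label classifier, for example scikit-learn's `MultiOutputClassifier` wrapping a LightGBM or CatBoost estimator, which is the family of combinations the abstract evaluates.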
dc.language.iso	eng	es_ES
dc.publisher	Computers & Security	es_ES
dc.relation.ispartofseries	;vol. 120
dc.relation.uri	https://www.sciencedirect.com/science/article/pii/S0167404822001833?via%3Dihub	es_ES
dc.rights	openAccess	es_ES
dc.subject	multi-label classification	es_ES
dc.subject	dataset	es_ES
dc.subject	LightGBM	es_ES
dc.subject	CatBoost	es_ES
dc.subject	machine learning	es_ES
dc.subject	JCR	es_ES
dc.subject	Scopus	es_ES
dc.title	A new multi-label dataset for Web attacks CAPEC classification using machine learning techniques	es_ES
dc.type	Indexed Journal Article	es_ES
reunir.tag	~ARI	es_ES
dc.identifier.doi	https://doi.org/10.1016/j.cose.2022.102788

Files in this item

Name: new_multi-label_dataset.pdf
Size: 3.011 MB
Format: PDF

This item appears in the following collection(s)

Scientific Articles (WOS and SCOPUS)

Related items

Prevention and fighting against web attacks through anomaly detection technology. A systematic review

Systematic Approach for Web Protection Runtime Tools’ Effectiveness Analysis

Combinatorial method with static analysis for source code security in web applications
