Enhancing big data feature selection using a hybrid correlation-based feature selection

Mohamad, Masurah; Selamat, Ali; Krejcar, Ondrej; González-Crespo, Rubén; Herrera-Viedma, Enrique; Fujita, Hamido

dc.contributor.author	Mohamad, Masurah
dc.contributor.author	Selamat, Ali
dc.contributor.author	Krejcar, Ondrej
dc.contributor.author	González-Crespo, Rubén
dc.contributor.author	Herrera-Viedma, Enrique
dc.contributor.author	Fujita, Hamido
dc.date	2021
dc.date.accessioned	2022-06-03T08:32:01Z
dc.date.available	2022-06-03T08:32:01Z
dc.identifier.issn	2079-9292
dc.identifier.uri	https://reunir.unir.net/handle/123456789/13225
dc.description.abstract	This study proposes an alternative data extraction method that combines three well-known feature selection methods for handling large and problematic datasets: correlation-based feature selection (CFS), best first search (BFS), and the dominance-based rough set approach (DRSA). It aims to enhance the classifier’s performance in decision analysis by eliminating uncorrelated and inconsistent data values. The proposed method, named CFS-DRSA, comprises several phases executed in sequence, with the main phases incorporating two crucial feature extraction tasks. First, data reduction applies CFS with a BFS algorithm. Second, a data selection process applies DRSA to generate the optimized dataset. Together, these steps are intended to reduce computational time complexity and increase classification accuracy. Several datasets with various characteristics and volumes were used in the experiments to evaluate the proposed method’s credibility. The method’s performance was validated using standard evaluation measures and benchmarked against other established methods such as deep learning (DL). Overall, the proposed method helped the classifier return a significant result, with an accuracy rate of 82.1% for the neural network (NN) classifier, compared with 66.5% for the support vector machine (SVM) and 49.96% for DL. The one-way analysis of variance (ANOVA) result indicates that the proposed method is an alternative extraction tool for those who have difficulty acquiring expensive big data analysis tools and those who are new to the data analysis field.	es_ES
dc.language.iso	eng	es_ES
dc.relation.ispartofseries	;vol. 10, no. 23
dc.relation.uri	https://www.mdpi.com/2079-9292/10/23/2984	es_ES
dc.rights	openAccess	es_ES
dc.subject	big data	es_ES
dc.subject	correlation-based feature selection	es_ES
dc.subject	deep learning	es_ES
dc.subject	DRSA	es_ES
dc.subject	feature selection	es_ES
dc.subject	neural network	es_ES
dc.subject	support vector machines (SVM)	es_ES
dc.subject	Scopus	es_ES
dc.subject	JCR	es_ES
dc.title	Enhancing big data feature selection using a hybrid correlation-based feature selection	es_ES
dc.type	article	es_ES
reunir.tag	~ARI	es_ES
dc.identifier.doi	https://doi.org/10.3390/electronics10232984
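
The data reduction phase described in the abstract (CFS driven by a search over feature subsets) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `cfs_merit` and `greedy_cfs` are hypothetical, absolute Pearson correlation stands in for whatever correlation measure the authors used, and a plain greedy forward search replaces best first search (it is best first search without backtracking). The DRSA phase is not shown. The merit of a subset of k features is k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit of the feature columns listed in `subset`.

    Assumption: absolute Pearson correlation is the correlation measure,
    and no column is constant (a constant column would yield NaN).
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each selected feature and the target.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    # Mean absolute correlation over all pairs of selected features.
    r_ff = np.mean([
        abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
        for i, a in enumerate(subset)
        for b in subset[i + 1:]
    ])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def greedy_cfs(X, y):
    """Greedy forward search: add the feature that most improves the merit,
    and stop as soon as no candidate improves it.

    This is a simplified stand-in for best first search.
    """
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best
```

Calling `greedy_cfs(X, y)` on a numeric feature matrix `X` (rows are samples) and a numeric target `y` returns the selected column indices and the merit of that subset. Features that correlate weakly with the target, or strongly with already selected features, lower the merit and are therefore left out.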

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

Artículos Científicos WOS y SCOPUS

Related items

Multilayer Framework for Botnet Detection Using Machine Learning Algorithms

Imputation of Rainfall Data Using the Sine Cosine Function Fitting Neural Network

Dealing with group decision-making environments that have a high amount of alternatives using card-sorting techniques