Web Traffic Time Series Forecasting Using LSTM Neural Networks with Distributed Asynchronous Training

Casado-Vara, Roberto; Martin del Rey, Angel; Pérez-Palau, Daniel; de-la-Fuente-Valentín, Luis; Corchado, Juan M.

dc.contributor.author	Casado-Vara, Roberto
dc.contributor.author	Martin del Rey, Angel
dc.contributor.author	Pérez-Palau, Daniel
dc.contributor.author	de-la-Fuente-Valentín, Luis
dc.contributor.author	Corchado, Juan M.
dc.date	2021
dc.date.accessioned	2021-07-06T11:35:54Z
dc.date.available	2021-07-06T11:35:54Z
dc.identifier.issn	2227-7390
dc.identifier.uri	https://reunir.unir.net/handle/123456789/11560
dc.description.abstract	Evaluating web traffic on a web server is critical for web service providers: without a proper demand forecast, customers may face lengthy waiting times and abandon the website. However, this is a challenging task, since it requires making reliable predictions in spite of the arbitrary nature of human behavior. We introduce an architecture that collects source data and forecasts the time series of page views in a supervised way. Based on the Wikipedia page views dataset proposed in a Kaggle competition in 2017, we created an updated version of it for the years 2018–2020. This dataset is processed, and the features and hidden patterns in the data are extracted in order to design an advanced version of a recurrent neural network called Long Short-Term Memory (LSTM). This AI model is trained in a distributed manner, following the data parallelism paradigm and using the Downpour training strategy. Predictions made for the seven dominant languages in the dataset are accurate, with the loss function and measurement error in reasonable ranges. Although the analyzed time series have fairly poor patterns of seasonality and trend, the predictions have been quite good, which indicates that analyzing the hidden patterns and extracting features before designing the AI model enhances model accuracy. In addition, distributed training remarkably improves the accuracy of the model. Since predicting web traffic as precisely as possible requires large datasets, we designed the forecasting system to be accurate even with limited data in the dataset. We tested the proposed model on the new Wikipedia page views dataset we created and obtained a highly accurate prediction; the mean absolute error of the predictions relative to the original series is, on average, below 30. This represents a significant step forward in the field of time series prediction for web traffic forecasting.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Mathematics	es_ES
dc.relation.ispartofseries	;vol. 9, no. 4
dc.relation.uri	https://www.mdpi.com/2227-7390/9/4/421	es_ES
dc.rights	openAccess	es_ES
dc.subject	downpour strategy	es_ES
dc.subject	LSTM	es_ES
dc.subject	parameter averaging	es_ES
dc.subject	pattern extraction	es_ES
dc.subject	time series forecast	es_ES
dc.subject	web traffic forecast	es_ES
dc.subject	Scopus	es_ES
dc.subject	WOS(2)	es_ES
dc.title	Web Traffic Time Series Forecasting Using LSTM Neural Networks with Distributed Asynchronous Training	es_ES
dc.type	Indexed Journal Article	es_ES
reunir.tag	~ARI	es_ES
dc.identifier.doi	https://doi.org/10.3390/math9040421

This item appears in the following collection(s)

Artículos Científicos WOS y SCOPUS
