dc.contributor.author: Dhanith, P. R. Joe
dc.contributor.author: Surendiran, B.
dc.contributor.author: Raja, S. P.
dc.date: 2021-06
dc.date.accessioned: 2022-04-28T07:29:30Z
dc.date.available: 2022-04-28T07:29:30Z
dc.identifier.issn: 1989-1660
dc.identifier.uri: https://reunir.unir.net/handle/123456789/12958
dc.description.abstract: Learning-based focused crawlers download relevant uniform resource locators (URLs) from the web for a specific topic. Several studies have used the term frequency-inverse document frequency (TF-IDF) weighted cosine vector as an input feature vector for learning algorithms. TF-IDF-based crawlers calculate the relevance of a web page only if a topic word co-occurs on that page; otherwise, the page is considered irrelevant. Similarity is not considered even if a synonym of a term co-occurs on a web page. To resolve this challenge, this paper proposes a new methodology that integrates the Adagrad-optimized Skip Gram Negative Sampling (A-SGNS)-based word embedding and the Recurrent Neural Network (RNN). The cosine similarity is calculated from the word embedding matrix to form a feature vector that is given as an input to the RNN to predict the relevance of the website. The performance of the proposed method is evaluated using the harvest rate (hr) and irrelevance ratio (ir). The proposed methodology outperforms existing methodologies with an average harvest rate of 0.42 and irrelevance ratio of 0.58. [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI) [es_ES]
dc.relation.ispartofseries: ;vol. 6, nº 6
dc.relation.uri: https://www.ijimai.org/journal/bibcite/reference/2820 [es_ES]
dc.rights: openAccess [es_ES]
dc.subject: web crawlers [es_ES]
dc.subject: semantics [es_ES]
dc.subject: word embeddings [es_ES]
dc.subject: adagrad [es_ES]
dc.subject: recurrent network [es_ES]
dc.subject: IJIMAI [es_ES]
dc.title: A Word Embedding Based Approach for Focused Web Crawling Using the Recurrent Neural Network [es_ES]
dc.type: article [es_ES]
reunir.tag: ~IJIMAI [es_ES]
dc.identifier.doi: https://doi.org/10.9781/ijimai.2020.09.003
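
The abstract describes two ideas that a small example can make concrete: a feature vector built from cosine similarities between word embeddings, and the harvest rate and irrelevance ratio used for evaluation. The sketch below is illustrative only and is not taken from the paper. The toy two-dimensional embeddings, the function names, and the choice of "best match per topic word" as the feature are all assumptions. The metric definitions (relevant pages over total pages, and irrelevant pages over total pages) are the commonly used ones and are assumed here; they are consistent with the reported values 0.42 and 0.58 summing to 1.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_features(topic_words, page_words, embeddings):
    """Hypothetical feature vector: for each topic word, the highest cosine
    similarity to any word on the page. Unknown words contribute 0.0."""
    features = []
    for topic in topic_words:
        if topic not in embeddings:
            features.append(0.0)
            continue
        sims = [
            cosine_similarity(embeddings[topic], embeddings[word])
            for word in page_words
            if word in embeddings
        ]
        features.append(max(sims, default=0.0))
    return features


def harvest_rate(relevant_pages, total_pages):
    """Assumed definition: fraction of downloaded pages that are relevant."""
    return relevant_pages / total_pages if total_pages else 0.0


def irrelevance_ratio(relevant_pages, total_pages):
    """Assumed definition: fraction of downloaded pages that are irrelevant."""
    return (total_pages - relevant_pages) / total_pages if total_pages else 0.0


# Toy embeddings (made-up values, not trained with A-SGNS).
toy_embeddings = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "banana": [0.0, 1.0],
}

# The page never mentions "car", yet the synonym "automobile" scores highly,
# which is the case an exact-match TF-IDF feature would miss.
features = similarity_features(["car"], ["automobile", "banana"], toy_embeddings)
```

In the approach the abstract outlines, a vector like `features` would be the input to the RNN relevance classifier; the classifier itself is omitted from this sketch.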

