A Word Embedding Based Approach for Focused Web Crawling Using the Recurrent Neural Network
Author:
Dhanith, P. R. Joe; Surendiran, B.; Raja, S. P.
Date:
06/2021
Keyword:
Journal / publisher:
International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI)
Item type:
article
Web address:
https://www.ijimai.org/journal/bibcite/reference/2820
Abstract:
Learning-based focused crawlers download relevant uniform resource locators (URLs) from the web for a specific topic. Several studies have used the term frequency-inverse document frequency (TF-IDF) weighted cosine vector as an input feature vector for learning algorithms. TF-IDF-based crawlers calculate the relevance of a web page only if a topic word co-occurs on that page; otherwise, the page is considered irrelevant. Similarity is not considered even if a synonym of a term co-occurs on a web page. To resolve this challenge, this paper proposes a new methodology that integrates Adagrad-optimized Skip Gram Negative Sampling (A-SGNS)-based word embedding and the Recurrent Neural Network (RNN). The cosine similarity is calculated from the word embedding matrix to form a feature vector that is given as an input to the RNN to predict the relevance of the website. The performance of the proposed method is evaluated using the harvest rate (hr) and irrelevance ratio (ir). The proposed methodology outperforms existing methodologies with an average harvest rate of 0.42 and irrelevance ratio of 0.58.
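The feature construction and the two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how similarities are aggregated or how the metrics are defined, so the sketch makes these assumptions: each feature is the maximum cosine similarity between one topic word and any word on the page, the harvest rate is the fraction of downloaded pages that are relevant, and the irrelevance ratio is its complement. The function names and the toy two-dimensional embeddings are hypothetical; in the paper the embeddings come from A-SGNS and the feature vector is passed to an RNN, neither of which is shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def relevance_features(topic_words, page_words, embeddings):
    """One feature per topic word: its best similarity to any page word.

    Assumption: max-aggregation, with negative similarities clipped to 0.
    Words missing from the embedding table are skipped.
    """
    features = []
    for topic_word in topic_words:
        best = 0.0
        for page_word in page_words:
            if topic_word in embeddings and page_word in embeddings:
                sim = cosine_similarity(embeddings[topic_word],
                                        embeddings[page_word])
                best = max(best, sim)
        features.append(best)
    return features


def harvest_rate(relevant_pages, downloaded_pages):
    """Assumed definition: relevant pages divided by all downloaded pages."""
    return relevant_pages / downloaded_pages if downloaded_pages else 0.0


def irrelevance_ratio(relevant_pages, downloaded_pages):
    """Assumed definition: irrelevant pages divided by all downloaded pages."""
    if not downloaded_pages:
        return 0.0
    return (downloaded_pages - relevant_pages) / downloaded_pages


# Hypothetical toy embeddings: "automobile" lies close to "car".
toy_embeddings = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "banana": [0.0, 1.0],
}

# The page never mentions "car", yet the synonym yields a high feature value.
features = relevance_features(["car"], ["automobile", "banana"], toy_embeddings)
```

Under these assumed definitions the two metrics always sum to 1, which is consistent with the reported averages of 0.42 and 0.58. The synonym case is the one the abstract says TF-IDF-based features miss, because an exact-match feature for "car" would be zero on this page.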
Related items
Showing items related by title, author, or subject.
- Deep Feature Representation and Similarity Matrix based Noise Label Refinement Method for Efficient Face Annotation
  Suruliandi, A.; Kasthuri, A.; Raja, S. P. (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 12/2021) Face annotation is a naming procedure that assigns the correct name to a person emerging from an image. Faces that are manually annotated by people in online applications include incorrect labels, giving rise to the issue ...
- Libertarias: el discurso hedonista de Vicente Aranda
  Berenguer Ubeda, Jorge; Rajas, Mario; Miranda, Francisco Javier (Área Abierta, 05/2019) This article examines the poetics of Vicente Aranda in his film Libertarias (1996). From the methodological perspective of filmic textual analysis, it studies the narrative discourse and the formal construction of the work. The ...
- HDDSS: An Enhanced Heart Disease Decision Support System using RFE-ABGNB Algorithm
  Dhilsath Fathima, M.; Justin Samuel, S.; Raja, S. P. (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 06/2023) Heart disease is the leading cause of mortality globally. Heart disease refers to a range of disorders that affect the heart and blood vessels. The risks of developing heart disease become minimized if heart disease is ...