    UNIR REVISTAS › Revista IJIMAI › 2021 › vol. 6, no. 6, June 2021

    A Word Embedding Based Approach for Focused Web Crawling Using the Recurrent Neural Network

    Authors:
    Dhanith, P. R. Joe; Surendiran, B.; Raja, S. P.
    Date:
    06/2021
    Keywords:
    web crawlers; semantics; word embeddings; adagrad; recurrent network; IJIMAI
    Item type:
    article
    URI: 
    https://reunir.unir.net/handle/123456789/12958
    DOI: 
    https://doi.org/10.9781/ijimai.2020.09.003
    Web address:
    https://www.ijimai.org/journal/bibcite/reference/2820
    Open Access
    Abstract:
    Learning-based focused crawlers download relevant uniform resource locators (URLs) from the web for a specific topic. Several studies have used the term frequency-inverse document frequency (TF-IDF) weighted cosine vector as the input feature vector for learning algorithms. A TF-IDF-based crawler can assign relevance to a web page only if a topic word itself occurs on that page; otherwise the page is considered irrelevant, even when a synonym of the term does occur on it. To address this limitation, this paper proposes a new methodology that integrates Adagrad-optimized Skip-Gram Negative Sampling (A-SGNS) word embeddings with a Recurrent Neural Network (RNN). Cosine similarities computed from the word embedding matrix form a feature vector that is given as input to the RNN to predict the relevance of a web page. The performance of the proposed method is evaluated using the harvest rate (hr) and the irrelevance ratio (ir). The proposed methodology outperforms existing methodologies, with an average harvest rate of 0.42 and an average irrelevance ratio of 0.58.
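The ideas in the abstract (exact-term matching versus embedding similarity, an SGNS update optimized with Adagrad, and the two evaluation metrics) can be illustrated with a minimal, dependency-free Python sketch. Everything here is an assumption for illustration, not the authors' implementation: the toy vectors in `EMBEDDINGS`, all helper names (`exact_match_feature`, `embedding_feature`, `sgns_adagrad_step`, and so on), the learning rate, and the simplified exact-match stand-in for TF-IDF are hypothetical, and the RNN classifier is not shown. Harvest rate is taken as the fraction of downloaded pages judged relevant and irrelevance ratio as the fraction judged irrelevant, which is consistent with the reported 0.42 and 0.58 summing to 1.

```python
import math
import random

# Hypothetical toy embeddings: illustrative values only, NOT vectors trained by A-SGNS.
EMBEDDINGS = {
    "car": [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.00, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def exact_match_feature(topic_terms, page_terms):
    """Simplified stand-in for the TF-IDF limitation: a topic term contributes
    only if that exact term occurs on the page."""
    page = set(page_terms)
    return [1.0 if t in page else 0.0 for t in topic_terms]


def embedding_feature(topic_terms, page_terms, emb=EMBEDDINGS):
    """For each topic term, the highest cosine similarity to any page term that
    has an embedding; synonyms can therefore score high without an exact match."""
    known = [p for p in page_terms if p in emb]
    return [
        max((cosine(emb[t], emb[p]) for p in known), default=0.0) if t in emb else 0.0
        for t in topic_terms
    ]


def harvest_rate(labels):
    """Fraction of downloaded pages judged relevant (1 = relevant, 0 = irrelevant)."""
    return sum(labels) / len(labels)


def irrelevance_ratio(labels):
    """Fraction of downloaded pages judged irrelevant."""
    return (len(labels) - sum(labels)) / len(labels)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgns_adagrad_step(W, C, G_W, G_C, center, context, negatives, lr=0.5, eps=1e-8):
    """One skip-gram negative-sampling update with per-coordinate Adagrad.

    W / C map words to center / context vectors; G_W / G_C hold the running sums
    of squared gradients. Returns the loss computed before the update.
    """
    v = W[center]
    dim = len(v)
    grad_v = [0.0] * dim
    grads_c = {}
    loss = 0.0
    # Label 1 for the observed context word, 0 for each sampled negative.
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = C[word]
        s = sigmoid(sum(a * b for a, b in zip(u, v)))
        loss -= math.log(s) if label else math.log(1.0 - s)
        coef = s - label  # derivative of this term's loss w.r.t. the score u.v
        g = grads_c.setdefault(word, [0.0] * dim)
        for i in range(dim):
            grad_v[i] += coef * u[i]
            g[i] += coef * v[i]

    def adagrad(params, accum, grad):
        # Adagrad: scale each coordinate's step by 1 / sqrt(sum of past squared grads).
        for i, gi in enumerate(grad):
            accum[i] += gi * gi
            params[i] -= lr * gi / (math.sqrt(accum[i]) + eps)

    adagrad(W[center], G_W[center], grad_v)
    for word, g in grads_c.items():
        adagrad(C[word], G_C[word], g)
    return loss


# Usage: repeatedly train one (center, context) pair against two negatives.
rng = random.Random(0)
vocab = ["car", "automobile", "banana", "fruit"]
W = {w: [rng.uniform(-0.1, 0.1) for _ in range(4)] for w in vocab}
C = {w: [rng.uniform(-0.1, 0.1) for _ in range(4)] for w in vocab}
G_W = {w: [0.0] * 4 for w in vocab}
G_C = {w: [0.0] * 4 for w in vocab}
losses = [
    sgns_adagrad_step(W, C, G_W, G_C, "car", "automobile", ["banana", "fruit"])
    for _ in range(200)
]
```

With these toy vectors, `exact_match_feature(["car"], ["automobile"])` gives `[0.0]`, while `embedding_feature(["car"], ["automobile"])` scores close to 1: this is the synonym case the abstract describes. In the described method, similarity features of this kind would then be passed to the RNN, which this sketch omits; Adagrad is used here only because the abstract names it as the optimizer for SGNS.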
    Files in this item
    Name: ijimai_6_6_13.pdf
    Size: 1.208 MB
    Format: application/pdf
    This item appears in the following collection(s)
    • vol. 6, no. 6, June 2021

    Usage statistics

    Year        2012  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022
    Views          0     0     0     0     0     0     0     0     0     0    19
    Downloads      0     0     0     0     0     0     0     0     0     0    46

    Legal Notice · Privacy Policy · Cookie Policy · GDPR Legal Clauses
    © UNIR - Universidad Internacional de La Rioja
     