
    A Word Embedding Based Approach for Focused Web Crawling Using the Recurrent Neural Network

    Authors:
    Dhanith, P. R. Joe; Surendiran, B.; Raja, S. P.
    Date:
    06/2021
    Keywords:
    web crawlers; semantics; word embeddings; adagrad; recurrent network; IJIMAI
    Journal / publisher:
    International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI)
    Item type:
    article
    URI: 
    https://reunir.unir.net/handle/123456789/12958
    DOI: 
    https://doi.org/10.9781/ijimai.2020.09.003
    Web address:
    https://www.ijimai.org/journal/bibcite/reference/2820
    Open Access
    Abstract:
    Learning-based focused crawlers download uniform resource locators (URLs) from the web that are relevant to a specific topic. Several studies have used a term frequency-inverse document frequency (TF-IDF) weighted cosine vector as the input feature vector for learning algorithms. TF-IDF-based crawlers can compute the relevance of a web page only if a topic word actually appears on that page; otherwise, the page is considered irrelevant. As a result, similarity is not recognized even when a synonym of a topic term appears on the page. To address this limitation, this paper proposes a methodology that integrates Adagrad-optimized Skip-Gram Negative Sampling (A-SGNS) word embeddings with a Recurrent Neural Network (RNN). Cosine similarities computed from the word embedding matrix form a feature vector, which is given as input to the RNN to predict the relevance of a web page. The performance of the proposed method is evaluated using the harvest rate (hr) and the irrelevance ratio (ir). The proposed methodology outperforms existing methodologies, with an average harvest rate of 0.42 and an average irrelevance ratio of 0.58.
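    The abstract's central point, that exact-term matching misses synonyms while embedding similarity does not, can be illustrated with a small sketch. Everything here is an assumption for illustration: the embedding vectors are hand-picked toy values rather than A-SGNS output, a raw term-count cosine stands in for TF-IDF (TF-IDF only reweights the counts, so it shares the same exact-match behavior), the "maximum similarity per topic term" feature layout is hypothetical because the abstract does not specify how the feature vector is assembled, and the RNN classifier is not shown. The harvest rate and irrelevance ratio follow their usual focused-crawling definitions (relevant or irrelevant pages divided by all pages crawled), which is consistent with the reported 0.42 and 0.58 summing to 1.

```python
import math
from collections import Counter

# Hypothetical, hand-picked 3-d vectors for illustration only; real A-SGNS
# embeddings would be learned from a corpus and have many more dimensions.
EMBEDDINGS = {
    "car": [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.05],
    "engine": [0.70, 0.30, 0.10],
    "recipe": [0.00, 0.20, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_overlap_cosine(topic_terms, page_terms):
    """Cosine of raw term-count vectors, a simplified stand-in for TF-IDF.
    It is 0.0 whenever topic and page share no exact term; TF-IDF weighting
    rescales the counts but keeps this exact-match limitation."""
    a, b = Counter(topic_terms), Counter(page_terms)
    vocab = sorted(set(a) | set(b))
    return cosine([a[t] for t in vocab], [b[t] for t in vocab])


def embedding_features(topic_terms, page_terms):
    """One feature per topic term: its highest embedding cosine similarity to
    any page term (assumed layout; out-of-vocabulary terms are skipped)."""
    features = []
    for t in topic_terms:
        sims = [cosine(EMBEDDINGS[t], EMBEDDINGS[p])
                for p in page_terms
                if t in EMBEDDINGS and p in EMBEDDINGS]
        features.append(max(sims, default=0.0))
    return features


def harvest_rate(is_relevant):
    """Relevant pages divided by all pages crawled."""
    return sum(is_relevant) / len(is_relevant)


def irrelevance_ratio(is_relevant):
    """Irrelevant pages divided by all pages crawled (equals 1 - harvest rate)."""
    return is_relevant.count(False) / len(is_relevant)


if __name__ == "__main__":
    topic, page = ["car"], ["automobile", "engine"]
    print(term_overlap_cosine(topic, page))  # 0.0: no exact shared term
    print(embedding_features(topic, page))   # close to 1, via "automobile"
    crawl = [True, False, True, False, False]  # hypothetical relevance labels
    print(harvest_rate(crawl), irrelevance_ratio(crawl))  # 0.4 0.6
```

    In a pipeline like the one the abstract describes, a feature vector such as the output of the hypothetical `embedding_features` would be passed to the RNN for a relevance prediction rather than thresholded directly.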
    Files in this item
    Name: ijimai_6_6_13.pdf
    Size: 1.208 MB
    Format: application/pdf
    This item appears in the following collection(s)
    • vol. 6, no. 6, June 2021

    Usage statistics

    Year   Views   Downloads
    2012       0           0
    2013       0           0
    2014       0           0
    2015       0           0
    2016       0           0
    2017       0           0
    2018       0           0
    2019       0           0
    2020       0           0
    2021       0           0
    2022     100         288
    2023     164         303
    2024     251         120
    2025     179         121

    Related items

    Showing related items by title, author, or subject.

    • Deep Feature Representation and Similarity Matrix based Noise Label Refinement Method for Efficient Face Annotation 

      Suruliandi, A.; Kasthuri, A.; Raja, S. P. (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 12/2021)
      Face annotation is a naming procedure that assigns the correct name to a person emerging from an image. Faces that are manually annotated by people in online applications include incorrect labels, giving rise to the issue ...
    • Libertarias: el discurso hedonista de Vicente Aranda 

      Berenguer Ubeda, Jorge; Rajas, Mario; Miranda, Francisco Javier (Área Abierta, 05/2019)
      This article examines the poetics of Vicente Aranda in his film Libertarias (1996). From the methodological perspective of textual film analysis, it studies the narrative discourse and the formal construction of the work. The ...
    • HDDSS: An Enhanced Heart Disease Decision Support System using RFE-ABGNB Algorithm 

      Dhilsath Fathima, M.; Justin Samuel, S.; Raja, S. P. (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 06/2023)
      Heart disease is the leading cause of mortality globally. Heart disease refers to a range of disorders that affect the heart and blood vessels. The risks of developing heart disease become minimized if heart disease is ...

    Legal Notice | Privacy Policy | Cookie Policy | GDPR Legal Clauses
    © UNIR - Universidad Internacional de La Rioja