A Novel Approach on Visual Question Answering by Parameter Prediction using Faster Region Based Convolutional Neural Network

Jha, Sudan; Dey, Anirban; Kumar, Raghvendra; Kumar-Solanki, Vijender

dc.contributor.author	Jha, Sudan
dc.contributor.author	Dey, Anirban
dc.contributor.author	Kumar, Raghvendra
dc.contributor.author	Kumar-Solanki, Vijender
dc.date	2019-06
dc.date.accessioned	2022-02-24T10:50:56Z
dc.date.available	2022-02-24T10:50:56Z
dc.identifier.issn	1989-1660
dc.identifier.uri	https://reunir.unir.net/handle/123456789/12505
dc.description.abstract	Visual Question Answering (VQA) is a stimulating problem in the fields of Natural Language Processing (NLP) and Computer Vision (CV). In this task, a machine finds an answer to a natural language question about an image. Questions can be open-ended or multiple choice. VQA datasets contain three main components: questions, images, and answers. Researchers have addressed the VQA problem with deep learning architectures that combine two networks, a Convolutional Neural Network (CNN) for the visual (image) representation and a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) for the textual (question) representation, and train the combined network end to end to generate the answer. Such models can answer common, simple questions that relate directly to the image's content, but different types of questions require different levels of understanding to produce correct answers. To address this, we use Faster Region-based CNN (Faster R-CNN) to extract image features, with an extra fully connected layer whose weights are obtained dynamically by LSTM cells according to the question. We claim in this paper that a single R-CNN architecture can solve VQA problems by modifying the weights in the parameter prediction layer. We trained the network end to end with Stochastic Gradient Descent (SGD), using pretrained Faster R-CNN and LSTM, and tested it on benchmark VQA datasets.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI)	es_ES
dc.relation.ispartofseries	vol. 5, no. 5
dc.relation.uri	https://www.ijimai.org/journal/bibcite/reference/2688	es_ES
dc.rights	openAccess	es_ES
dc.subject	computer vision	es_ES
dc.subject	neural network	es_ES
dc.subject	natural language processing	es_ES
dc.subject	stochastic gradient descent	es_ES
dc.subject	long short term memory	es_ES
dc.subject	IJIMAI	es_ES
dc.title	A Novel Approach on Visual Question Answering by Parameter Prediction using Faster Region Based Convolutional Neural Network	es_ES
dc.type	article	es_ES
reunir.tag	~IJIMAI	es_ES
dc.identifier.doi	http://doi.org/10.9781/ijimai.2018.08.004
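
The abstract describes a fully connected layer whose weights are produced from the question rather than learned as fixed parameters. The following is a minimal sketch of that idea in plain Python, not the authors' implementation, which this record does not include. All names (`predict_layer_weights`, `dynamic_layer`, `predictor`) and all sizes are hypothetical, the image feature vector stands in for Faster R-CNN output, and the question vector stands in for an LSTM encoding.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def predict_layer_weights(question_vec, predictor, out_dim, in_dim):
    """Map a question encoding to the weight matrix of the dynamic layer.

    `predictor` has out_dim * in_dim rows and len(question_vec) columns.
    Its output is reshaped into an out_dim x in_dim weight matrix.
    """
    flat = matvec(predictor, question_vec)
    return [flat[r * in_dim:(r + 1) * in_dim] for r in range(out_dim)]


def dynamic_layer(image_features, question_vec, predictor, out_dim):
    """Apply a fully connected layer whose weights depend on the question."""
    in_dim = len(image_features)
    weights = predict_layer_weights(question_vec, predictor, out_dim, in_dim)
    return matvec(weights, image_features)


# Toy example with hypothetical sizes and random predictor parameters.
rng = random.Random(0)
IN_DIM, OUT_DIM, Q_DIM = 4, 3, 2
predictor = [[rng.uniform(-1, 1) for _ in range(Q_DIM)]
             for _ in range(OUT_DIM * IN_DIM)]

image_features = [0.5, -1.0, 2.0, 0.25]  # stand-in for Faster R-CNN features
question_a = [1.0, 0.0]                  # stand-in for an LSTM encoding
question_b = [0.0, 1.0]

# The same image features pass through different weights for each question.
out_a = dynamic_layer(image_features, question_a, predictor, OUT_DIM)
out_b = dynamic_layer(image_features, question_b, predictor, OUT_DIM)
```

In this sketch only `predictor` would be trained. The layer applied to the image features is recomputed for every question, which is how one visual network could serve questions that need different kinds of understanding.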

Files in this item

Name: ijimai_5_5_4_pdf_36854.pdf
Size: 1.569Mb
Format: PDF

This item appears in the following collection(s)

vol. 5, no. 5, June 2019

Related items

Comparative Study on Ant Colony Optimization (ACO) and K-Means Clustering Approaches for Jobs Scheduling and Energy Optimization Model in Internet of Things (IoT)

Spiking Activity of a LIF Neuron in Distributed Delay Framework