Improved Fine-Tuned Reinforcement Learning From Human Feedback Using Prompting Methods for News Summarization
Author:
Pulari, Sini Raj; Umadevi, Maramreddy; Vasudevan, Shriram K.
Date:
01/03/2025
Keywords:
Journal / publisher:
UNIR
Citation:
S. R. Pulari, M. Umadevi, S. K. Vasudevan. Improved Fine-Tuned Reinforcement Learning From Human Feedback Using Prompting Methods for News Summarization, International Journal of Interactive Multimedia and Artificial Intelligence, vol. 9, no. 2, pp. 59-67, 2025, http://dx.doi.org/10.9781/ijimai.2025.02.001
Item type:
article
Web address:
https://www.ijimai.org/index.php/ijimai/article/view/259
Abstract:
ChatGPT is built on a generative pretrained transformer neural network, which falls under the larger umbrella of generative models. One major development following ChatGPT is the advent of prompt engineering, a critical part of working with systems built on Large Language Models (LLMs): it helps ChatGPT provide the desired outputs based on the style and tone of the interactions carried out with it. Reinforcement learning from human feedback (RLHF) has been a major technique for fine-tuning LLM-based models. This work proposes a human selection strategy, incorporated into the RLHF process, to prevent undesirable consequences arising from the choice of human reviewers for feedback. H-Rouge is a new metric proposed for humanized AI systems. A detailed evaluation of state-of-the-art summarization algorithms and prompt-based methods is provided as part of the article. The proposed method introduces a strategy for human selection in RLHF models that employs multi-objective optimization to balance the various goals encountered during the process with H-Rouge. This article will help novice readers conducting research in text summarization get started with prompt engineering in the summarization field, and the future work it outlines will help them proceed in the right direction of research.
Files in this item
Name: Improved Fine-Tuned Reinforcement Learning From Human Feedback Using Prompting Methods for News Summarization.pdf
Size: 1.372Mb
Format: application/pdf
Usage statistics
| Year | Views | Downloads |
| 2012 | 0 | 0 |
| 2013 | 0 | 0 |
| 2014 | 0 | 0 |
| 2015 | 0 | 0 |
| 2016 | 0 | 0 |
| 2017 | 0 | 0 |
| 2018 | 0 | 0 |
| 2019 | 0 | 0 |
| 2020 | 0 | 0 |
| 2021 | 0 | 0 |
| 2022 | 0 | 0 |
| 2023 | 0 | 0 |
| 2024 | 0 | 0 |
| 2025 | 0 | 0 |
| 2026 | 6 | 0 |
Related items
Showing items related by title, author, or subject.
- A Fault-Tolerant Mobile Computing Model Based On Scalable Replica
  Sati, Meenakshi; Vikash, Vivek; Bijalwan, Vishwanath; Kumari, Pinki; Raj, Manish; Balodhi, Meenu; Gairola, Priya; Bhaskar Semwal, Vijay (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 06/2014)
  The most frequent challenge faced by mobile user is stay connected with online data, while disconnected or poorly connected store the replica of critical data. Nomadic users require replication to store copies of critical ...
- Feature based video stabilization based on boosted HAAR Cascade and representative point matching algorithm
  Raj, Rohit; Rajiv, Pooshkar; Kumar, Prabhat; Khari, Manju; Verdú, Elena; González-Crespo, Rubén; Manogarane, Gunasekaran (Image and Vision Computing, 09/2020)
  The success of handheld video capturing devices has further fueled the need of improved video stabilization. The videos often contain many foreground facial features like eyes, nose etc. These foreground features can be ...
- Hybrid Model for Passive Locomotion Control of a Biped Humanoid: The Artificial Neural Network Approach
  Bhaskar-Semwal, Vijay; Raj, Manish; Nandi, G C (International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), 06/2018)
  Developing a correct model for a biped robot locomotion is extremely challenging due to its inherently unstable structure because of the passive joint located at the unilateral foot-ground contact and varying configurations ...





