Improved Fine-Tuned Reinforcement Learning From Human Feedback Using Prompting Methods for News Summarization

Pulari, Sini Raj; Umadevi, Maramreddy; Vasudevan, Shriram K.

DOI: https://doi.org/10.9781/ijimai.2025.02.001

Abstract

ChatGPT is built on a generative pretrained transformer neural network, which falls under the broader umbrella of generative models. One major development following ChatGPT is the rise of prompt engineering, a critical part of working with systems built on Large Language Models (LLMs): it helps ChatGPT provide the desired outputs based on the style and tone of the interactions carried out with it. Reinforcement learning from human feedback (RLHF) has been the principal technique for fine-tuning LLM-based models. This work proposes a human selection strategy, incorporated into the RLHF process, to ensure the right choice of human reviewers for feedback and so prevent undesirable consequences. H-Rouge is also proposed as a new metric for humanized AI systems. A detailed evaluation of state-of-the-art summarization algorithms and prompt-based methods is provided as part of the article. The proposed human selection strategy for RLHF models employs multi-objective optimization to balance the various goals encountered during the process with H-Rouge. This article will help novice readers begin research on prompt engineering for text summarization, and the future work it outlines will help them proceed in the right direction.

Collections

vol. 9, no. 2, March 2025
