Comparative analysis of paraphrasing performance of ChatGPT, GPT-3, and T5 language models using a new ChatGPT generated dataset: ParaGPT

Pehlivanoglu, Meltem Kurt; Abdan Syakura, Muhammad; de-la-Fuente-Valentín, Luis; Tadesse Gobosho, Robera; Shanmuganathan, Vimal

dc.contributor.author	Pehlivanoglu, Meltem Kurt
dc.contributor.author	Abdan Syakura, Muhammad
dc.contributor.author	de-la-Fuente-Valentín, Luis
dc.contributor.author	Tadesse Gobosho, Robera
dc.contributor.author	Shanmuganathan, Vimal
dc.date	2024
dc.date.accessioned	2024-12-09T11:19:35Z
dc.date.available	2024-12-09T11:19:35Z
dc.identifier.citation	Kurt Pehlivanoğlu, M., Gobosho, R. T., Syakura, M. A., Shanmuganathan, V., & de-la-Fuente-Valentín, L. (2024). Comparative analysis of paraphrasing performance of ChatGPT, GPT-3, and T5 language models using a new ChatGPT generated dataset: ParaGPT. Expert Systems, 41(11), e13699. https://doi.org/10.1111/exsy.13699	es_ES
dc.identifier.issn	1468-0394
dc.identifier.issn	0266-4720
dc.identifier.uri	https://reunir.unir.net/handle/123456789/17523
dc.description.abstract	Paraphrase generation is a fundamental natural language processing (NLP) task that refers to the process of generating a well-formed and coherent output sentence that exhibits both syntactic and/or lexical diversity from the input sentence, while simultaneously ensuring that the semantic similarity between the two sentences is preserved. However, the availability of high-quality paraphrase datasets has been limited, particularly for machine-generated sentences. In this paper, we present ParaGPT, a new paraphrase dataset of 81,000 machine-generated sentence pairs, including 27,000 reference sentences (ChatGPT-generated sentences), and 81,000 paraphrases obtained by using three different large language models (LLMs): ChatGPT, GPT-3, and T5. We used ChatGPT to generate 27,000 sentences that cover a diverse array of topics and sentence structures, thus providing diverse inputs for the models. In addition, we evaluated the quality of the generated paraphrases using various automatic evaluation metrics. Furthermore, we provide insights into the strengths and drawbacks of each LLM in generating paraphrases by conducting a comparative analysis of the paraphrasing performance of the three LLMs. According to our findings, ChatGPT's performance, as per the evaluation metrics provided, was deemed impressive and commendable, owing to its higher-than-average scores for semantic similarity, which implies a higher degree of similarity between the generated paraphrase and the reference sentence, and its relatively lower scores for syntactic diversity, indicating a greater diversity of syntactic structures in the generated paraphrase. ParaGPT is a valuable resource for researchers working on NLP tasks like paraphrasing, text simplification, and text generation. We make the ParaGPT dataset publicly accessible to researchers, and as far as we are aware, this is the first paraphrase dataset produced based on ChatGPT.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Expert Systems	es_ES
dc.relation.ispartofseries	;vol. 41, n. 11
dc.relation.uri	https://onlinelibrary.wiley.com/doi/10.1111/exsy.13699	es_ES
dc.rights	openAccess	es_ES
dc.subject	ChatGPT	es_ES
dc.subject	generative artificial intelligence	es_ES
dc.subject	large language models	es_ES
dc.subject	machine learning	es_ES
dc.title	Comparative analysis of paraphrasing performance of ChatGPT, GPT-3, and T5 language models using a new ChatGPT generated dataset: ParaGPT	es_ES
dc.type	article	es_ES
reunir.tag	~OPU	es_ES
dc.identifier.doi	https://doi.org/10.1111/exsy.13699

Files in this item

Name: Comparative analysis of.pdf
Size: 6.025Mb
Format: PDF

This item appears in the following collection(s)

Otras Publicaciones: artículos, libros...

Related items

Emerging Technologies Landscape on Education. A review

Case of Study in Online Course of Computer Engineering during COVID-19 Pandemic

Learning Management Systems Activity Records for Students' Assessment of Generic Skills
