Is ChatGPT corrupting peer review? Telltale words hint at AI use


A study suggests that researchers are using chatbots to assist with peer review. Credit: Rmedia7/Shutterstock

A study that identified buzzword adjectives that could be hallmarks of AI-written text in peer-review reports suggests that researchers are turning to ChatGPT and other artificial intelligence (AI) tools to evaluate others’ work.

The authors of the study¹, posted on the arXiv preprint server on 11 March, examined the extent to which AI chatbots could have modified the peer reviews of conference proceedings submitted to four major computer-science meetings since the release of ChatGPT.

Their analysis suggests that up to 17% of the peer-review reports have been substantially modified by chatbots — although it’s unclear whether researchers used the tools to construct reviews from scratch or just to edit and improve written drafts.

The idea of chatbots writing referee reports for unpublished work is “very shocking” given that the tools often generate misleading or fabricated information, says Debora Weber-Wulff, a computer scientist at the HTW Berlin–University of Applied Sciences in Germany. “It’s the expectation that a human researcher looks at it,” she adds. “AI systems ‘hallucinate’, and we can’t know when they’re hallucinating and when they’re not.”

The meetings included in the study are the Twelfth International Conference on Learning Representations, due to be held in Vienna next month; 2023’s Annual Conference on Neural Information Processing Systems, held in New Orleans, Louisiana; the 2023 Conference on Robot Learning in Atlanta, Georgia; and the 2023 Conference on Empirical Methods in Natural Language Processing in Singapore.

Nature reached out to the organizers of all four conferences for comment, but none responded.

Buzzword search

Since its release in November 2022, ChatGPT has been used to write a number of scientific papers, in some cases even being listed as an author. Out of more than 1,600 scientists who responded to a 2023 Nature survey, nearly 30% said they had used generative AI to write papers and around 15% said they had used it for their own literature reviews and to write grant applications.

In the arXiv study, a team led by Weixin Liang, a computer scientist at Stanford University in California, developed a technique to search for AI-written text by identifying adjectives that are used more often by AI than by humans.

By comparing the use of adjectives in a total of more than 146,000 peer reviews submitted to the same conferences before and after the release of ChatGPT, the analysis found that the frequency of certain positive adjectives, such as ‘commendable’, ‘innovative’, ‘meticulous’, ‘intricate’, ‘notable’ and ‘versatile’, had increased significantly since the chatbot’s use became mainstream. The study flagged the 100 most disproportionately used adjectives.
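The before-and-after comparison described above can be illustrated with a toy example. This is a minimal sketch only, not the study's actual statistical method: the function names, the add-one smoothing and the sample reviews are all assumptions made for illustration. It ranks candidate words by how much their relative frequency rose between two sets of reviews.

```python
import re
from collections import Counter


def word_counts(reviews):
    """Count lowercase word tokens across a list of review strings.

    Returns (Counter of token counts, total number of tokens).
    """
    counts = Counter()
    total = 0
    for text in reviews:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(tokens)
        total += len(tokens)
    return counts, total


def rank_by_increase(before, after, candidates, smoothing=1.0):
    """Rank candidate words by the ratio of their relative frequency
    in `after` to that in `before`, largest increase first.

    `smoothing` (an illustrative assumption) avoids division by zero
    for words that never appear in the earlier reviews.
    """
    b_counts, b_total = word_counts(before)
    a_counts, a_total = word_counts(after)
    ratios = {}
    for word in candidates:
        b_rate = (b_counts[word] + smoothing) / (b_total + smoothing)
        a_rate = (a_counts[word] + smoothing) / (a_total + smoothing)
        ratios[word] = a_rate / b_rate
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


# Invented sample reviews, for illustration only.
before = [
    "The method is sound but the notable gap is evaluation.",
    "Results are fine.",
]
after = [
    "A commendable and meticulous study with notable results.",
    "The commendable approach is innovative.",
]

ranked = rank_by_increase(before, after, ["commendable", "notable", "meticulous"])
for word, ratio in ranked:
    print(f"{word}: {ratio:.2f}x")
```

In this toy data, ‘commendable’ rises most and ‘notable’ does not change. A real analysis would need far larger samples and a way to restrict the candidates to adjectives, which this sketch does not attempt.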

Reviews that gave a lower rating to conference proceedings or were submitted close to the deadline, and those whose writers were least likely to respond to authors’ rebuttals, were most likely to contain these adjectives, and therefore most likely to have been written by chatbots at least to some extent, the study found.

“It seems like when people have a lack of time, they tend to use ChatGPT,” says Liang.

The study also examined more than 25,000 peer reviews associated with around 10,000 manuscripts that had been accepted for publication across 15 Nature portfolio journals between 2019 and 2023, but didn’t find a spike in usage of the same adjectives since the release of ChatGPT.

A spokesperson for Springer Nature said the publisher asks peer reviewers not to upload manuscripts into generative AI tools, noting that these still have “considerable limitations” and that reviews might include sensitive or proprietary information. (Nature’s news team is independent of its publisher.)

Springer Nature is exploring the idea of providing peer reviewers with safe AI tools to guide their evaluation, the spokesperson said.

Transparency issue

The increased prevalence of the buzzwords Liang’s study identified in post-ChatGPT reviews is “really striking”, says Andrew Gray, a bibliometrics support officer at University College London. The work inspired him to analyse the extent to which some of the same adjectives, as well as a selection of adverbs, crop up in peer-reviewed studies published between 2015 and 2023.

His findings, described in an arXiv preprint published on 25 March, show a significant increase in the use of certain terms, including ‘commendable’, ‘meticulous’ and ‘intricate’, since ChatGPT surfaced². The study estimates that the authors of at least 60,000 papers published in 2023, just over 1% of all scholarly studies published that year, used chatbots to some extent.

Gray says it’s possible peer reviewers are using chatbots only for copyediting or translation, but that a lack of transparency from authors makes it difficult to tell. “We have the signs that these things are being used,” he says, “but we don’t really understand how they’re being used.”

“We do not wish to pass a value judgement or claim that the use of AI tools for reviewing papers is necessarily bad or good,” Liang says. “But we do think that for transparency and accountability, it’s important to estimate how much of that final text might be generated or modified by AI.”

Weber-Wulff doesn’t think tools such as ChatGPT should be used to any extent during peer review, and worries that the use of chatbots might be even higher in cases in which referee reports are not published. (The reviews of papers published by Nature portfolio journals used in Liang’s study were available online as part of a transparent peer-review scheme.) “Peer review has been corrupted by AI systems,” she says.

Using chatbots for peer review could also have copyright implications, Weber-Wulff adds, because it could involve giving the tools access to confidential, unpublished material. She notes that the approach of using telltale adjectives to detect potential AI activity might work well in English, but could be less effective for other languages.


