Otto Kalliokoski was only on the second of the more than 1,000 articles he was screening when he spotted what he calls a “Photoshop masterpiece”.
The paper showed images from western blots — a technique used to analyse protein composition — for two samples. But Kalliokoski, an animal behaviourist at the University of Copenhagen, found that the images were identical down to the pixel, which he says is clearly not supposed to happen.
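The kind of duplication Kalliokoski describes, two blot images that match pixel for pixel, is the simplest case to check computationally. The sketch below shows one minimal way to make that comparison; the file names and the use of the Pillow and NumPy libraries are illustrative assumptions, not the tools the team used.

```python
# Illustrative sketch only: checks whether two image files are identical
# down to the pixel. The file names are hypothetical examples.
import numpy as np
from PIL import Image

def pixel_identical(path_a: str, path_b: str) -> bool:
    """Return True if the two images have the same dimensions and every pixel matches."""
    a = np.asarray(Image.open(path_a).convert("RGB"))
    b = np.asarray(Image.open(path_b).convert("RGB"))
    return a.shape == b.shape and np.array_equal(a, b)

if __name__ == "__main__":
    # Blots from genuinely different samples should never pass this test.
    print(pixel_identical("blot_sample_1.png", "blot_sample_2.png"))
```

A check this strict catches only exact copies; spotting images that have been cropped, stretched or rotated, as tools such as Imagetwin aim to do, requires more sophisticated matching.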
Image manipulation in scientific studies is a known and widespread problem. All the same, Kalliokoski and his colleagues were startled to come across more than 100 studies with questionable images while compiling a systematic review about a widely used test of laboratory rats’ moods. After publishing the review1 in January, the researchers released a preprint2 documenting the troubling studies that they uncovered and how these affected the results of their review. The preprint, posted on bioRxiv in February, has not yet been peer reviewed.
Their work “clearly highlights [that falsified images] are impacting our consolidated knowledge base”, says Alexandra Bannach-Brown, a systematic-review methodologist at the Berlin Institute of Health who was not involved with either the review or the preprint. Systematic reviews, which summarize and interpret the literature on a particular topic, are a key component of that base. With an explosion of scientific literature, “it’s impossible for a single person to keep up with reading every new paper that comes out in their field”, Bannach-Brown says. And that means that upholding the quality of systematic reviews is ever more important.
Pile-up of problems
Kalliokoski’s systematic review examined the reliability of a test designed to assess reward-seeking in rats under stress. A reduced interest in a reward is assumed to be a proxy symptom of depression, and the test is widely used during the development of antidepressant drugs. The team identified an initial pool of 1,035 eligible papers; 588 contained images.
By the time he’d skimmed five papers, Kalliokoski had already found a second one with troubling images. Not sure what to do, he bookmarked the suspicious studies and went ahead with collating papers for the review. As the questionable papers kept piling up, he and his colleagues decided to deploy Imagetwin, an AI-based software tool that flags problems such as duplicated images and ones that have been stretched or rotated. Either Imagetwin or the authors’ visual scrutiny flagged 112 — almost 20% — of the 588 image-containing papers.
“That is actually a lot,” says Elizabeth Bik, a microbiologist in San Francisco, California, who has investigated image-related misconduct and is now an independent scientific-integrity consultant. Whether image manipulation is the result of honest error or an intention to mislead, “it could undermine the findings of a study”, she says.
Small but detectable effect
For their final analysis, the authors examined all the papers that met their criteria for inclusion in their review. This batch, consisting of 132 studies, included 10 of the 112 that the team had flagged as having potentially doctored images.
Analysis of these 10 studies alone assessed the test as 50% more effective at identifying depression-related symptoms than did a calculation based on the 122 studies without questionable images. These suspicious studies “do actually skew the results”, Kalliokoski says — although “not massively”, because overall variations in the data set mask the contribution from this small subset.
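A toy calculation shows why a small subset shifts the pooled estimate only modestly. The numbers below are hypothetical, chosen only to mirror the 50% gap between the flagged and clean studies; they are not the review's actual effect sizes.

```python
# Toy illustration with hypothetical numbers, not the review's actual data:
# a small subset of inflated studies nudges the overall average only slightly.
clean_mean = 1.0       # hypothetical average effect from the 122 clean studies
flagged_mean = 1.5     # hypothetical average 50% higher, from the 10 flagged studies
n_clean, n_flagged = 122, 10

pooled = (n_clean * clean_mean + n_flagged * flagged_mean) / (n_clean + n_flagged)
print(f"Pooled average: {pooled:.3f}")  # ~1.038, only about 4% above the clean-only estimate
```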
Examples from this study “cover pretty much all types of image problems”, Bik says, ranging from simple duplication to images that showed evidence of deliberate alteration. Using a scale that Bik developed to categorize the degree of image manipulation, the researchers found that most of the problematic images showed signs of tampering.
The researchers published their review in January in Translational Psychiatry without telling the journal that it was based in part on papers that included suspicious images. The journal’s publisher, Springer Nature, told Nature that it is investigating. (The Nature news team is editorially independent of its publisher, Springer Nature.)
When they published their preprint the following month, the researchers included details of all the papers with suspicious images. They also flagged each study on PubPeer, a website where scientists comment anonymously on papers. “My first allegiance is towards the [scientific] community,” Kalliokoski says, adding that putting the data out is the first step.
Bring reviews to life
The process of challenging a study’s integrity, giving its authors a chance to respond and seeking retraction for fraudulent studies can take years. One way to clear these muddied waters, says Bannach-Brown, is to publish ‘living’ systematic reviews, which are designed to be updated whenever papers get retracted or new research is added. She has helped to develop one such method of creating living reviews, called Systematic Online Living Evidence Summaries.
Systematic-review writers are also keen to see publishers integrate standardized ways to screen out dubious studies — rather than waiting until a study gets retracted.
Authors, publishers and editorial boards need to work together, Bannach-Brown says, to “catch some of these questionable research practices before they even make it to publication.”