764 Retracted Papers: What Happens Without AI Detection in Publishing

Scientific journals were supposed to be the last place where text actually gets vetted. Peer review and editorial oversight were meant to keep bad science out. They didn't.

In early 2026, three independent studies reached the same conclusion: AI-generated papers have gotten into peer-reviewed journals, where they get cited and used as the basis for further research. And when a fake finally gets pulled, sometimes more than eighteen months have already gone by. That's plenty of time to influence dozens of other publications.

A systematic review in Frontiers and an analysis of 764 retractions via PubMed both point to 2023 as the peak year. A third study, not yet peer-reviewed, covers 3,974 retracted DOIs (Preprints.org) and pushes the argument further: generative AI is no longer a fringe factor in research fraud. It has become the infrastructure. And most publishers still haven't adopted AI detection software. Here's what the numbers look like.

Academic Integrity Under Pressure: The AI Detection Numbers

The geographic pattern is hard to ignore. Across all three studies, the majority of retracted papers trace back to the same regions: 551 out of 764 in the PubMed dataset came from China, 40 from India, 23 from Bangladesh. The Frontiers review puts it at 72.2% of authors based in China. That doesn't mean the problem is Chinese. It means the paper mill industry found its biggest market there, where publish-or-perish pressure is extreme and enforcement lags behind.

The reasons for retraction overlap almost completely across studies. In the PubMed dataset: peer review issues flagged in 716 of 764 papers, data concerns in 714, irrelevant or fabricated citations in 571, and direct evidence of unethical AI use in 238. Most papers got pulled for multiple reasons at once. The Preprints.org analysis confirmed the same pattern at larger scale: AI-generated content clusters tightly with paper mill signatures. Where you find one, you find the other.

AI Publication Retractions: The Numbers That Matter

You'd think once a paper is retracted, the matter is settled. In practice, a retraction changes very little. Retracted work keeps getting cited, and a third of the authors don't even consider themselves at fault. The cleanup mechanism exists on paper. It just can't keep up with the volume.

And here's the real dead end: between a fake getting published and getting retracted, two to three years go by. During that time the work ends up in other people's meta-analyses and feeds into somebody else's conclusions. The retraction arrives when the damage is already baked into the system. An AI detection check at submission would flag most of these. But almost no journal uses an AI detector for research papers. The next section looks at where this flood is coming from.

Paper Mills and Generative AI: How Science Gets Fabricated

Before generative AI, paper mills worked differently. They hired people, bought reviews, faked data by hand. Expensive and slow. Generative models made all of that unnecessary.

A telling example: Wiley bought the publisher Hindawi in 2021 for $298 million (Retraction Watch). A good deal, except for one thing. By 2024 they had to retract over 11,300 articles. Losses hit $35–40 million, the CEO left, the brand got shut down. The problem turned out to be special issues: peer review there was handled by guest editors, and that's exactly how paper mills got into the system.

Wiley is not some unknown publisher. If paper mills got into Wiley, the only journals without them are the ones that haven't looked yet. Frontiers in July 2025 found a network of 122 articles with manipulated peer review and citations (Frontiers), and along the way flagged 4,000 suspicious papers at seven other publishers.

The marginal cost of a fake paper today is close to zero. An LLM generates unique text in minutes. Old AI detection methods (looking for linguistic errors, duplicate images) are useless here: every output is one of a kind.
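
Modern detectors attack this from the other side: instead of hunting for reuse, they score statistical regularities in the text itself. Here is a minimal sketch of that idea, assuming the Hugging Face transformers library and GPT-2 as a stand-in scoring model. Commercial detectors (It's AI included) don't publish their methods, so treat this as the textbook perplexity heuristic and nothing more.

# Minimal sketch of a statistical AI-text signal: mean log-likelihood per
# token under a small open reference model. Illustrative only; real
# detectors are trained and calibrated on large labeled corpora.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def mean_log_likelihood(text: str) -> float:
    """Average log-probability per token. LLM output tends to be more
    predictable (higher score) than human prose on the same topic."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return -out.loss.item()  # out.loss is the mean negative log-likelihood

# Hypothetical cutoff, for illustration only.
if mean_log_likelihood("The results demonstrate a significant effect.") > -3.0:
    print("suspiciously predictable text")

The point of the sketch: a signal like this doesn't care whether the text is unique, which is exactly what duplicate-matching does care about.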

ChatGPT Cites Retracted Papers: The Feedback Loop

Fabricated articles are only half the problem. The other half: the AI systems researchers rely on can't tell the difference between a retracted paper and a valid one.

Retraction Watch describes an experiment: 217 flagged publications were run through ChatGPT 30 times each. Zero retraction warnings. The bot called most of the retracted work "internationally excellent." When fed specific claims from those papers and asked whether they were true, two thirds of the answers came back: yes, true. Even though the data was fabricated.
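
A probe like this is trivial to automate, which is part of what makes the result so damning. A minimal sketch, assuming the official OpenAI Python client; the prompt wording, the model choice, and the keyword scan are illustrative stand-ins, not the study's actual protocol.

# Illustrative sketch of a retraction-awareness probe. Assumes the official
# OpenAI Python client (pip install openai) and OPENAI_API_KEY set in the
# environment. The prompt and keyword list are hypothetical.
from openai import OpenAI

client = OpenAI()
RETRACTION_TERMS = ("retract", "withdrawn", "expression of concern")

def retraction_mentions(title: str, runs: int = 30) -> int:
    """Ask the model to assess a paper `runs` times; count how many
    answers mention retraction at all."""
    hits = 0
    for _ in range(runs):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"Assess the quality and reliability of this paper: {title}",
            }],
        )
        answer = resp.choices[0].message.content.lower()
        hits += any(term in answer for term in RETRACTION_TERMS)
    return hits

In the experiment described above, the equivalent count came back zero across 217 papers and 30 runs each.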

It gets worse. MIT Technology Review tested not just ChatGPT but tools that market themselves as research assistants: Elicit, Ai2 ScholarQA, Perplexity, Consensus. The results are in the table below; none of them comes out clean.

AI Tool             Retracted papers cited (out of 21)   Retraction warnings
Consensus           18                                   0
Ai2 ScholarQA       17                                   0
Perplexity          11                                   0
Elicit              5                                    0
ChatGPT (GPT-4o)    5                                    3

Source: MIT Technology Review, September 2025.

A separate study tested 21 chatbots (ChatGPT, Copilot, and Gemini among them) and found they correctly spotted fewer than half of the retracted publications on a control list. The false positive rate (valid papers wrongly flagged as retracted) ran around 18%.

The AI Citation Feedback Loop

So it's a loop, and no citation check, human or automated, currently breaks it. AI generates fake papers. They pass review and get published. Then other AI systems pull those same papers as sources for answers. Someone asks ChatGPT about a medical treatment; the bot finds an article with a real DOI and serves it up as trustworthy. The article was retracted ages ago. The user never finds out, because not one of the tested tools works as a reference checker by default.
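
The galling part is how cheap the missing check is. Crossref's public REST API exposes retraction notices, so a basic reference checker fits in a dozen lines. A minimal sketch, assuming the requests library; the updates: filter is Crossref's documented way to find editorial updates (retractions, corrections, expressions of concern) that point at a DOI.

# Minimal reference checker: ask Crossref whether any editorial update
# points at a given DOI. Coverage depends on publishers registering the
# notice, so an empty result means "no notice found", not "paper is clean".
import requests

def retraction_notices(doi: str) -> list[str]:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"filter": f"updates:{doi}"},  # works that update this DOI
        timeout=10,
    )
    resp.raise_for_status()
    notices = []
    for item in resp.json()["message"]["items"]:
        for update in item.get("update-to", []):
            if update.get("DOI", "").lower() == doi.lower():
                notices.append(update.get("type", "unknown"))  # e.g. "retraction"
    return notices

# Hypothetical usage: print(retraction_notices("10.1234/example.doi"))

Crossref coverage isn't complete, which is one reason the Retraction Watch database exists, but even this level of checking is more than the tested tools do by default.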

Beyond Academia: Why AI Detection in Writing Concerns Everyone

By 2025, the Retraction Watch database had accumulated over 59,000 withdrawn publications (Sharon Kabel). Publishers are responding, but they're building filters for the last generation of fakes: each new model produces text that slips past the detectors built to catch the previous one. The Preprints.org preprint calls this the "sophistication gap," and for now it's widening.

But this stopped being just an academic problem a long time ago. Say a doctor prescribes a treatment based on clinical guidelines. Those guidelines rest on a meta-analysis of dozens of studies. If a paper mill study slipped in, the doctor is treating a patient based on false data. They don't know that. The patient certainly doesn't. Peer review is the strictest filter for text that exists anywhere. If the most rigorous academic integrity checker can't cope, what about marketing copy and press releases? Nobody there checks whether the author is a person or a model.

AI detection tools like the It's AI detector don't solve everything. But they answer a concrete question: did a human write this, or not? Without that answer, judging whether any text is authentic is just guesswork.


FAQ