ArXiv bans researchers for one year over unchecked AI-generated papers
At a glance:
- ArXiv will ban researchers for one year if they submit papers with obvious signs of unchecked AI generation
- The policy targets evidence like hallucinated references, leftover chatbot instructions, and placeholder data
- The move addresses a growing problem of AI-generated "slop" in academic research, with fabricated citations increasing twelvefold since 2023
What happened
ArXiv, the open-access repository that has served as the primary distribution channel for preprint research in computer science, mathematics, and physics for more than three decades, will ban authors for one year if they submit papers containing obvious signs of unchecked AI generation. Thomas Dietterich, chair of arXiv's computer science section, announced the policy on Thursday, writing that submissions with "incontrovertible evidence" of unvetted large language model output mean "we can't trust anything in the paper."
The rule is not a blanket prohibition on using AI tools. Researchers can still use language models for drafting, editing, or analysis. What triggers the penalty is evidence that an author pasted LLM output into a paper without checking it, the kind of carelessness that produces hallucinated references, placeholder instructions from the chatbot, or fabricated data tables with notes reading "fill in with the real numbers from your experiments." If moderators find such evidence and a section chair confirms it, the author faces a one-year ban from arXiv, after which all subsequent submissions must first be accepted by a peer-reviewed journal before they can appear on the platform.
Why it matters
ArXiv is not a journal. It does not peer-review papers. But it has become the de facto way that research circulates in several of the fastest-moving fields in science, particularly machine learning and artificial intelligence. Papers posted to arXiv are read, cited, and built upon long before they appear in formal publications, if they ever do. That makes the platform's quality standards unusually consequential: a hallucinated citation on arXiv can propagate through the research literature just as effectively as one in a peer-reviewed journal, and often faster.
The platform receives thousands of submissions each month, and its volunteer moderation system was not designed to screen for machine-generated content at scale. Dietterich's announcement described the new penalty as a "one-strike" rule, though decisions are subject to appeal and require confirmation by a section chair before being imposed. With this penalty, arXiv becomes the first major preprint platform to attach concrete consequences to AI-generated slop.
The scale of the problem
A study published in The Lancet in May 2026 by researchers at Columbia University audited 2.5 million biomedical papers and 126 million references indexed on PubMed Central. It found that fabricated citations have risen twelvefold since 2023. In that year, roughly one in 2,828 papers contained at least one fake reference. By 2025, the rate had climbed to one in 458. In the first seven weeks of 2026, it was one in 277.
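For readers who want to compare the per-paper figures directly, the short sketch below converts each "one in N" rate into a percentage and a multiple of the 2023 baseline. The function name `fold_increase` and the dictionary layout are illustrative choices, not anything taken from the study.

```python
# Per-paper rates quoted in the article: one paper in N contained
# at least one fabricated reference.
ONE_IN = {"2023": 2828, "2025": 458, "early 2026": 277}


def fold_increase(base_one_in: int, later_one_in: int) -> float:
    """Ratio of the later rate (1/later_one_in) to the baseline rate (1/base_one_in)."""
    return base_one_in / later_one_in


for period, n in ONE_IN.items():
    share = 100 / n  # percentage of papers affected
    fold = fold_increase(ONE_IN["2023"], n)
    print(f"{period}: 1 in {n} = {share:.3f}% of papers, {fold:.1f}x the 2023 rate")
```

On these per-paper numbers, the rate in 2025 is roughly six times the 2023 rate, and the early 2026 rate roughly ten times. The twelvefold headline figure is not reproduced by these ratios, so it presumably rests on a different measure; the article does not say which.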
The researchers attributed the surge to the proliferation of AI writing tools, noting that previous studies estimate 30 to 69 per cent of LLM-generated references in biomedical contexts are fabricated. The rapid growth of unchecked AI-generated content threatens the integrity of the research literature, particularly in fields where preprints circulate widely before formal peer review. The problem may also extend beyond references to fabricated data, unsubstantiated claims, and entire papers generated with minimal human oversight.
What counts as evidence
The policy is deliberately narrow in what it targets. Dietterich listed specific examples of "incontrovertible evidence": hallucinated references that do not correspond to any real publication, meta-comments from the language model left in the text (such as "here is a 200-word summary; would you like me to make any changes?"), and placeholder data with instructions to the author that were never removed. These are not subtle quality failures. They are signs that the author did not read the paper before submitting it.
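To illustrate why this kind of residue is easy to spot without an AI detector, here is a minimal sketch that scans text for leftover chatbot phrases and placeholder instructions. The patterns are hypothetical, loosely modelled on the examples quoted above, and `find_residue` is an invented name; nothing here reflects arXiv's actual moderation tooling.

```python
import re

# Hypothetical patterns based on the examples quoted in the article.
# Illustrative assumptions only, not arXiv's actual moderation rules.
RESIDUE_PATTERNS = {
    "chatbot_meta_comment": re.compile(
        r"would you like me to\b|here is an? \d+-word summary|as an ai language model",
        re.IGNORECASE,
    ),
    "placeholder_instruction": re.compile(
        r"fill in with the real numbers|insert (?:your )?(?:data|results) here",
        re.IGNORECASE,
    ),
}


def find_residue(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, label, stripped_line) for each line matching a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RESIDUE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits


sample = """We evaluate on three benchmarks.
Here is a 200-word summary; would you like me to make any changes?
Table 2: fill in with the real numbers from your experiments."""

for hit in find_residue(sample):
    print(hit)
```

A scan like this only covers the textual residue. Confirming that a reference is hallucinated would require looking it up in a bibliographic database, which this sketch does not attempt, and any match would still need human review before a penalty is considered.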
The distinction matters because it avoids the far more difficult question of whether AI-assisted writing should be permitted at all. ArXiv's existing policy already states that authors bear "full responsibility" for their content "irrespective of how the contents are generated." The new penalty enforces that principle by targeting the most egregious violations, cases where the author's failure to exercise any oversight is provable from the text itself. By focusing on obvious slop, arXiv can enforce the rule without needing to build or buy an AI-detection system, a technology that remains prone to its own errors.
A broader problem
ArXiv is not the only institution struggling with the issue. Academic conferences in computer science, including NeurIPS and ICML, have reported surges in submissions that appear to be generated with minimal human oversight. Nature published a feature in late 2025 describing how AI slop is creating a crisis in computer science, where the volume of low-quality submissions is overwhelming reviewers and diluting the signal-to-noise ratio of the field's output.
Peer-reviewed journals face the same problem. The Lancet study found that fabricated citations appeared in papers that had already passed peer review, suggesting that reviewers are either not checking references or are unable to identify fabrications at the rate they are now appearing. Lead author Maxim Topaz, of Columbia University's School of Nursing, warned that clinicians and guideline developers have no way of knowing when the evidence they rely on does not exist, a gap that efforts to reduce AI hallucinations in scientific research have not yet closed. Because the problem spans preprint servers, conferences, and journals, no single platform or institution can address it alone.
The limits of enforcement
The new rule will catch the most careless offenders, researchers who submit papers they have not read. It will not catch researchers who use language models to generate plausible but incorrect claims, fabricate data, or produce papers that are fluent but scientifically vacuous. Those problems require peer review, institutional oversight, and a willingness within the research community to treat AI-assisted misconduct with the same seriousness as traditional forms of fabrication.
What arXiv's policy does establish is a principle: if you submit a paper, you are responsible for every word in it. That has always been true in theory. The difference now is that language models have made it trivially easy to produce text that reads like science but contains nothing of substance. ArXiv's one-year ban is a modest penalty for a serious offence, but it is also the first formal acknowledgement by a major research platform that the problem is no longer one of occasional carelessness. It is structural, it is growing, and it requires dedicated infrastructure to combat.
ArXiv itself is undergoing structural changes that may help it address the challenge. After more than 20 years as a project hosted by Cornell University, the platform is becoming an independent nonprofit, a move that should give it greater autonomy over its moderation policies and the ability to raise funds specifically to combat quality problems. It has also introduced a requirement for first-time submitters to obtain an endorsement from an established author, a gatekeeping measure aimed at reducing the volume of submissions from accounts created solely to publish AI-generated material.