
NeurIPS, one of the world’s most prestigious artificial intelligence research conferences, held its 39th annual meeting in San Diego in December, attracting tens of thousands of contributors and participants. What was once primarily an academic gathering has become a prime hunting ground for top AI labs, where excellence can translate directly into job offers. Researchers whose papers are accepted for live presentation are considered the elite of the field.
However, Canadian startup GPTZero analyzed more than 4,000 research papers accepted at NeurIPS (Neural Information Processing Systems) 2025 and says it found hundreds of AI-hallucinated citations that slipped past the three or more reviewers assigned to each submission, spread across at least 53 papers. Such hallucinations had not been reported before.
From completely fictitious citations to subtle alterations
In some cases, the AI model blended elements of multiple real papers into credible-sounding titles and author lists, the company said. Others appear to be complete fabrications: non-existent authors, invented paper titles, fake journals or conferences, or URLs that go nowhere.
In other cases, the model starts from a real paper but makes subtle changes — expanding an author’s initials into a guessed name, removing or adding co-authors, or rewording the title. Some, however, are patently fake, for example citing “John Smith” and “Jane Doe” as authors.
When contacted for comment, the NeurIPS board of directors issued the following statement: “The use of LLMs in AI conference papers is evolving rapidly, and NeurIPS is actively monitoring developments. In previous years, we piloted a policy regarding the use of LLMs, and in 2025 reviewers were instructed to flag hallucinations. Regarding the findings of this specific work, we emphasize that more effort is needed to determine the impact of even the 1.1% of papers affected, since, for example, the authors may have provided the LLM with a partial description of a citation and asked it to generate BibTeX (a formatted bibliography entry). NeurIPS remains committed to improving the review and author processes to best ensure scientific rigor, and to identifying ways that LLMs can enhance the capabilities of authors and reviewers.”
Edward Tian is co-founder and CEO of GPTZero, which was founded in January 2023 and raised a US$10 million Series A in 2024. Just weeks before the NeurIPS analysis, the company found 50 phantom citations in papers under review at ICLR, another top AI research conference, which will be held in Rio de Janeiro in April. In that case, the papers had not yet been accepted, but the false citations had slipped past the peer reviewers. Tian said the ICLR conference has hired the company to check future submissions for fabricated citations during peer review.
Errors in NeurIPS accepted and submitted papers
Tian said the NeurIPS findings are even more troubling because the errors appeared in papers accepted by the conference. In AI academia, “publish or perish” is more than a cliché: hiring and tenure often depend on accumulating peer-reviewed publications. Yet under long-standing academic norms, even a single fabricated citation could, in principle, be grounds for rejection. References are meant to anchor a paper within the existing body of research and demonstrate that its authors have actually read and engaged with the work they cite.
“This is definitely a bigger deal, in the sense that this is the first officially documented case of hallucinated citations entering a top machine learning conference,” Tian said. He noted that the acceptance rate for NeurIPS 2025’s main track was 24.52%, meaning each of these papers beat out roughly 15,000 others despite containing one or more hallucinations. “These were peer-reviewed and published in the final conference proceedings,” he said. “So it’s definitely a big moment.”
He added that about half of the papers with hallucinated citations were likely either generated by AI or made extensive use of it. “But what we’re really looking at in this survey is the citations themselves,” he said. AI detection tools are often criticized for false positives when trying to identify machine-written text. But Tian believes hallucination detection is a different kind of problem: GPTZero’s tool checks verifiable facts, searching the open web and academic databases to confirm whether a cited paper actually exists. The tool is more than 99% accurate, the company says, and for the NeurIPS analysis, each flagged reference was also reviewed by human experts from GPTZero’s machine learning team.
Alex Cui, Tian’s co-founder and chief technology officer, said GPTZero’s hallucination-checking tool ingests a paper and then searches the open web and scholarly databases to verify each citation — its authors, title, publication venue, and link. If a reference cannot be found, or only partially matches a real paper, it is flagged. This is how the tool catches cases where an AI model starts from a real paper but adds non-existent authors, changes the title, or invents a venue.
“Sometimes, even if there is a match, you’ll find that they’ve added five authors to the real paper who don’t exist, so these are errors that humans wouldn’t reasonably make,” he explains. For the NeurIPS investigation, after the automated scan, a member of the GPTZero machine learning team manually verified each flagged reference to ensure that the results themselves were not false positives.
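The verification pipeline Cui describes — look up each citation, then flag references that are missing or only partially match a real paper — can be sketched in a few lines of Python. This is a minimal illustration, not GPTZero’s actual implementation: the `KNOWN_PAPERS` dictionary is a hypothetical stand-in for the web and scholarly-database lookups the real tool performs, and the 0.9 similarity threshold is an assumed value.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a scholarly-database search; the real tool
# queries the open web and academic databases instead of a local dict.
KNOWN_PAPERS = {
    "attention is all you need": {
        "authors": ["Vaswani", "Shazeer", "Parmar", "Uszkoreit",
                    "Jones", "Gomez", "Kaiser", "Polosukhin"],
        "venue": "NeurIPS 2017",
    },
}

def title_similarity(a: str, b: str) -> float:
    """Fuzzy title match, so minor formatting differences aren't flagged."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def check_citation(citation: dict) -> list[str]:
    """Return a list of problems with one citation; empty list means OK."""
    issues = []
    # Step 1: does any real paper closely match the cited title?
    best, best_score = None, 0.0
    for title, record in KNOWN_PAPERS.items():
        score = title_similarity(citation["title"], title)
        if score > best_score:
            best, best_score = record, score
    if best is None or best_score < 0.9:  # assumed threshold
        issues.append("no matching paper found (possible fabrication)")
        return issues
    # Step 2: partial match — flag authors absent from the real paper,
    # the "added co-authors" case described above.
    real_authors = {a.lower() for a in best["authors"]}
    for author in citation["authors"]:
        if author.lower() not in real_authors:
            issues.append(f"author not on the real paper: {author}")
    return issues

# A citation that starts from a real paper but adds an invented co-author:
suspect = {
    "title": "Attention Is All You Need",
    "authors": ["Vaswani", "Shazeer", "John Smith"],
}
print(check_citation(suspect))
```

Running the sketch on `suspect` flags only the invented co-author, while a citation whose title matches nothing in the database is flagged as a possible fabrication outright — mirroring the two failure modes the article describes.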
The large number of papers makes it difficult to conduct an in-depth review
A big part of the challenge is scale. In 2025, the NeurIPS main track received 21,575 valid submissions, up from 15,671 in 2024 and 12,343 in 2023. Even with thousands of volunteer reviewers, this volume makes it increasingly difficult to review each paper and its references in depth.
But Tian says that while artificial intelligence plays a major role here by making it easier to mass-produce conference submissions, a flawed paper still poses real reputational risks — for the author, for the conference that accepts it, and for the companies that hire researchers based on those credentials.
This is especially true for citations, he said, because in modern AI research, citations are central to the field’s attempts to address its reproducibility problem. “AI results are notoriously difficult to reproduce, so citations are important,” he said, “drawing the line between whether a result is reproducible” by letting other researchers trace a result back to something specific and testable. A hallucinated reference, by contrast, points the reader to something that does not exist.

