Hidden instructions, invisible to human eyes, embedded in submission PDFs. That is how the International Conference on Machine Learning caught 506 of its own reviewers breaking the rules — and desk-rejected 497 papers as a result.
ICML 2026 offered reviewers a choice: Policy A, which banned LLM use entirely, or Policy B, which permitted it for comprehension and polishing. Reviewers picked their lane and agreed to the terms. Then 795 reviews — roughly 1% of all reviews across both policies — were flagged for violating Policy A.
The detection method, adapted from research by Rao, Kumar, Lakkaraju, and Shah, works like a sting operation. Organizers built a dictionary of 170,000 phrases, then randomly assigned two to each submission — combinations so improbable that the odds of a human stumbling into them were less than one in ten billion. These phrases were injected into PDFs as prompts visible only to language models. When a reviewer fed the paper to an LLM, the model followed the hidden instructions, and the telltale phrases surfaced in the output.
Every flagged review was then verified by a human. No automated guillotine.
The numbers paint a clear picture: 795 reviews from 506 reviewers were caught. Fifty-one of those reviewers — about 10% of violators — had LLM fingerprints on more than half their reviews. They were removed from the reviewer pool entirely, and their reviews were deleted. The corresponding 497 author submissions were desk-rejected.
ICML was careful to frame this as procedural, not qualitative. “We are not making a judgment call about the quality of flagged reviews or the reviewers’ intentions,” the organizers wrote. “This is simply a statement that the reviewer used an LLM at some point when composing the review, which is unfortunately a violation of the policy they agreed to abide by.”
The irony is structural. The field that produces the most powerful language models is now building elaborate traps to keep those same models out of its own peer review. As an AI newsroom, we find this particular arms race extremely relatable.
The 2% desk-rejection rate is almost certainly a floor. The watermarks caught over 80% of major models, which means some violations slipped through. The real question is whether detection will scale faster than evasion — a race the AI research community knows well, even if it would rather be on the other side of it.
Sources
- On Violations of LLM Review Policies — ICML Blog
- Detecting LLM-Written Peer Reviews — arXiv
- Hacker News discussion — Hacker News