Some thoughts on AI-powered review tools
How LLM review tools may transform peer review from gatekeeping to verification and potentially make journals less essential
AI-powered review tools, such as Refine, are becoming increasingly popular. At first glance, their purpose seems straightforward: help authors improve clarity, polish writing, and strengthen the presentation of their work before submission. The recent decision to make Refine freely available for EC submissions appeared to reinforce that interpretation. These systems looked like productivity tools for researchers.
But I recently saw Refine promoting partnerships with journals. The stated goal: reduce reviewer workload in an era of rising submission volumes. If AI can help screen papers, summarize contributions, identify weaknesses, and streamline referee reports, many editors would understandably be interested.
Yet this raises a deeper question. What happens when both sides of the process rely on the same class of tools?
If authors use LLMs to preemptively polish papers, address likely criticisms, and improve exposition, then journals may receive submissions that are increasingly optimized for AI-based review criteria. In turn, review tools may be forced to search for smaller and smaller flaws in already polished manuscripts. The equilibrium could become strange: authors use AI to satisfy reviewers, while reviewers use AI to hunt for flaws in manuscripts written in anticipation of AI review. We may be entering a recursive loop in which machines increasingly evaluate work prepared for machines.
Another possibility is that these tools eventually stop behaving like harsh gatekeepers and instead become validators. If a paper meets certain methodological, statistical, and presentation standards, the review system may simply certify it as technically sound. In that world, peer review shifts away from subjective judgments of novelty or style and toward verification.
This leads to an even more provocative implication. If journals can deploy agentic review tools, why shouldn't authors deploy agentic response tools? Imagine submitting a paper alongside an expert AI agent trained on the manuscript, data, and code. Reviewers request robustness checks, additional tables, alternative specifications, or clarifications. The author's agent runs the analysis, produces the output, and responds instantly. After several rounds of machine-to-machine interaction between review agents and author agents, the paper emerges revised and publication-ready.
Push the idea one step further, and the role of journals themselves becomes less obvious. Researchers could upload papers to open repositories such as arXiv or SSRN, accompanied by a transparent AI-generated review report evaluating correctness, assumptions, robustness, and contribution. Readers would then observe both the paper and the audit trail. Rather than waiting months for editorial gatekeeping, the market for ideas could operate in real time.
Under this scenario, the central scarcity is no longer correctness. AI systems may make it easier to detect coding errors, flawed identification strategies, missing citations, or weak robustness checks. Technical quality becomes cheaper to verify. The scarce resource instead becomes attention. If many papers are methodologically sound, then the key question is not whether a paper is “correct,” but whether it is important, useful, original, or worth reading.
That may be the real future of research publishing. Journals historically bundled multiple functions: quality control, certification, filtering, and distribution. AI may unbundle them. Verification can be automated. Distribution is already open. What remains hardest to automate is judgment about relevance.
If so, LLM review tools are not just changing peer review. They may be quietly redefining why peer review exists at all.