Zero-knowledge proofs have a reputation as a cryptography problem, but at scale they are a hardware problem. Generating a SNARK over a large statement is dominated by a handful of arithmetic-heavy primitives, and which primitive you build your prover around determines what kind of accelerator you need. A paper posted to arXiv on June 15 — When Proofs Meet Hardware: Comparing NTT and SumCheck in Zero-Knowledge Systems, by a team led by Jianqiao Mo with co-authors including Benedikt Bunz, Siddharth Garg, and Brandon Reagen — takes a long-running theoretical argument and finally settles it where it actually matters: on silicon.
The disagreement is old and clean. On paper, the SumCheck protocol looks like the obvious winner.
"In the ZKP community, it has long been discussed that the SumCheck protocol is asymptotically more efficient than the Number Theoretic Transform (NTT), requiring only $O(N)$ arithmetic versus $O(N \log N)$."— arXiv:2606.16146, source
That is the asymptotic case for SumCheck — linear arithmetic versus the NTT's O(N log N). But asymptotics are not architecture, and the authors lay out the competing intuition from accelerator designers: NTT is more hardware-friendly, benefiting from locality and data reuse, while SumCheck suffers from sequential, dependent rounds. In other words, SumCheck does less total work but does it in a way that maps poorly to parallel hardware, while NTT does more work in a pattern that hardware loves. Which effect dominates is exactly the kind of question that cannot be answered with a complexity bound.
A fair fight under a single roof
The paper's contribution is the first hardware-system-level direct comparison of NTT- and SumCheck-based proving primitives under a unified architectural framework. The phrase 'unified framework' is doing real work: the only way to make this comparison honest is to hold the resources constant. The authors evaluate both primitives in the context of the ZeroCheck protocol — a common building block in zkSNARKs — and implement optimized systems for each, then put them under the same on-chip SRAM and off-chip bandwidth budgets. That constraint is what separates a benchmark from an advertisement; a primitive can always look good if you let it have all the memory it wants.
The result is refreshingly anticlimactic: there is no universal winner. SumCheck generally outperforms NTT for high-degree polynomials, which fits the asymptotic story — when N is large enough, doing O(N) work beats O(N log N) even with worse data movement. But for low-degree polynomials, performance depends on memory availability, and under given SRAM budgets NTT might deliver better performance for medium-sized workloads by exploiting data reuse. The crossover, in other words, is governed by polynomial degree and by how much fast on-chip memory the accelerator can afford.
What this means for prover design
For anyone building or buying ZK acceleration, that conditional answer is more useful than a verdict. It says the right primitive is a function of your workload's polynomial degree and your hardware's memory hierarchy, not a fixed choice. A prover dominated by high-degree polynomial work should lean SumCheck; one dominated by medium-sized, low-degree work on a memory-rich accelerator may do better with NTT and its data reuse. The implication is that future high-performance provers may need both paths and a way to route work to whichever primitive fits the current statement and the available SRAM.
There is a methodological lesson here that generalizes beyond NTT and SumCheck. Cryptographic protocol design and hardware architecture have largely been separate conversations, with protocol people citing asymptotics and hardware people citing locality, and the two rarely meeting under a controlled experiment. This paper's value is to bridge those communities by making the comparison concrete: same protocol (ZeroCheck), same SRAM, same bandwidth, two primitives. The output is practical guidance for understanding the proving cost of NTT- and SumCheck-based zero-knowledge proof systems rather than a slogan about which is 'better.'
The result also has a quiet implication for the ZK-hardware market, where a great deal of capital is currently being committed to fixed-function accelerators. If neither primitive universally wins, then an accelerator hard-wired entirely around NTT — the more common bet, because NTT's regular memory pattern is easier to build silicon for — may underperform on the high-degree workloads that SumCheck-heavy protocols generate, and vice versa. The practical hedge the paper implies is programmability: an accelerator that can run both proving paths and dispatch by polynomial degree and available SRAM is more future-proof than one that commits to a single primitive. For an industry still converging on which proof systems will dominate, building flexibility into the hardware is the conservative engineering choice, and this controlled comparison is the first piece of evidence that makes that hedge quantifiable rather than merely prudent.
The honest limitation is scope. ZeroCheck is one building block, and a full zkSNARK pipeline includes commitments, hashing, and other stages whose costs may shift the balance again. The crossover behavior the authors find — SumCheck for high degree, NTT plausibly for medium low-degree workloads under SRAM pressure — is specific to the primitive and the budgets they tested. But that specificity is the point. The community has spent years debating an asymptotic comparison as if it answered the engineering question. The contribution is to show, with a controlled hardware-level experiment, that it does not, and to replace a slogan with a conditional rule that a prover architect can actually use.
For the IP-and-protocols beat, the cleanest takeaway is that the proving-cost question has moved from a theory seminar to an architecture review, and the answer now depends on workload and silicon rather than on a single line of asymptotic analysis.
Comments
Loading comments…