VaultxGPU: GPU Proof-of-Space Plotting

An arXiv paper offloads Blake3 plotting to CUDA and SYCL kernels, reporting a 59.2x speedup over a single-threaded CPU baseline and beating a 384-thread CPU configuration.

The energy debate around Bitcoin is usually fought with slogans, but the engineering response to it is quieter and more interesting: replace wasted computation with something cheaper to verify. Proof-of-Space is one such answer, and a paper posted to arXiv on June 12 — VaultxGPU: GPU-Accelerated Blockchain Consensus, by Samuel Taiwo Fatunmbi, Om Amit Gandhi, and Luke Logan — pushes on the practical bottleneck that has kept Proof-of-Space from being the easy win it sounds like.

The paper opens by stating the cost it is trying to avoid in concrete terms.

"Blockchain consensus mechanisms based on Proof-of-Work consume significant energy, with Bitcoin alone estimated at approximately 150 TWh per year." (arXiv:2606.14007)

That 150 TWh figure is the motivating number, and the authors use it to set up the alternative. Proof-of-Space reduces this cost by replacing repeated computation with storage — instead of burning energy hashing continuously, a participant proves it has dedicated disk space. But the catch, and the reason Proof-of-Space has not simply displaced Proof-of-Work, is that creating that storage is itself expensive: plot generation remains bottlenecked by CPU hashing throughput. You trade ongoing energy for an upfront compute cost, and that upfront cost is the wall.
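The storage-for-computation trade can be made concrete with a minimal toy sketch. This is not VaultX's plot format. Here a "plot" is a table of 2**k hashed nonces, sorted by hash and built once. After that, each challenge is answered with a binary search and checked with a single hash. Several pieces are hypothetical:

- the names `build_plot`, `answer_challenge`, and `verify`;
- the seed and challenge values;
- the prefix-matching rule.

`hashlib.blake2b` stands in for Blake3, which is not in Python's standard library.

```python
import bisect
import hashlib
from typing import Optional


def h(data: bytes) -> bytes:
    # Stand-in for Blake3: hashlib.blake2b with an 8-byte digest.
    return hashlib.blake2b(data, digest_size=8).digest()


def build_plot(seed: bytes, k: int) -> list:
    """One-time, compute-heavy step: hash 2**k nonces and keep them sorted by hash."""
    plot = [(h(seed + n.to_bytes(8, "little")), n) for n in range(2 ** k)]
    plot.sort()
    return plot


def answer_challenge(plot: list, challenge: bytes, prefix_len: int = 1) -> Optional[int]:
    """Cheap per-challenge step: binary-search the stored plot for a hash that
    shares the challenge's first `prefix_len` bytes; return its nonce or None."""
    target = challenge[:prefix_len]
    i = bisect.bisect_left(plot, (target,))
    if i < len(plot) and plot[i][0].startswith(target):
        return plot[i][1]
    return None


def verify(seed: bytes, nonce: int, challenge: bytes, prefix_len: int = 1) -> bool:
    """The verifier recomputes one hash; it needs no plot at all."""
    return h(seed + nonce.to_bytes(8, "little")).startswith(challenge[:prefix_len])


if __name__ == "__main__":
    seed = b"farmer-key"              # hypothetical plot seed
    plot = build_plot(seed, k=12)     # 4,096 entries, built once
    challenge = h(b"block-1234")      # hypothetical challenge
    nonce = answer_challenge(plot, challenge)
    print(nonce, nonce is not None and verify(seed, nonce, challenge))
```

The costs are lopsided on purpose. Building the plot is where the hashing happens. Answering a challenge is a logarithmic-time lookup. Verifying costs one hash. The plotting step is the expensive part that VaultxGPU targets.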

From CPU plotting to GPU plotting

The work builds on a prior system, VaultX, which the authors describe as a high-performance CPU-based Proof-of-Space plotter using multi-threaded Blake3 hashing. VaultX already plotted 4 to 50x faster than Chia, depending on hardware. VaultxGPU is its GPU-accelerated extension: it offloads the Blake3 hashing pipeline to custom GPU kernels. The implementation choices are the technically substantive part. The authors:

- write the plotter in both CUDA, for NVIDIA hardware, and SYCL, for AMD and Intel GPUs;
- keep the plot's Table 1 entirely in GPU VRAM;
- fuse the sort and match stages into a single kernel to minimize data movement.
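A toy, CPU-side sketch can illustrate the hash, sort, and match pipeline and what "fusing" sort and match means. It is not the authors' kernel or VaultX's actual match rule. The following are hypothetical choices for illustration:

- 32-bit hashes;
- the rule that sorted neighbours sharing their top 10 hash bits "match";
- the `pair_hash` construction for Table 2 entries.

`hashlib.blake2b` again stands in for Blake3.

```python
import hashlib
from collections import defaultdict

HASH_BYTES = 4    # hypothetical: 32-bit entry hashes
MATCH_BITS = 10   # hypothetical: entries match if their top 10 hash bits agree


def blake(data: bytes) -> int:
    # hashlib.blake2b stands in for Blake3, which is not in the standard library.
    return int.from_bytes(hashlib.blake2b(data, digest_size=HASH_BYTES).digest(), "big")


def bucket(hv: int) -> int:
    # The top MATCH_BITS bits of a hash select its match bucket.
    return hv >> (8 * HASH_BYTES - MATCH_BITS)


def pair_hash(n1: int, n2: int) -> int:
    # Hypothetical Table 2 entry derived from a matched pair of nonces.
    return blake(n1.to_bytes(8, "little") + n2.to_bytes(8, "little"))


def gen_table1(seed: bytes, k: int) -> list:
    """Hash stage: one (hash, nonce) entry for every nonce in [0, 2**k)."""
    return [(blake(seed + n.to_bytes(8, "little")), n) for n in range(2 ** k)]


def sort_then_match(table1: list) -> list:
    """Unfused: write out a fully sorted Table 1, then scan it again to match."""
    sorted_t1 = sorted(table1)
    table2 = []
    for (h1, n1), (h2, n2) in zip(sorted_t1, sorted_t1[1:]):
        if bucket(h1) == bucket(h2):  # neighbours in the same bucket match
            table2.append(pair_hash(n1, n2))
    return table2


def fused_sort_match(table1: list) -> list:
    """Fused: group entries by bucket, then sort and match each bucket before
    moving on, so no complete sorted Table 1 is produced and re-read."""
    buckets = defaultdict(list)
    for hv, n in table1:
        buckets[bucket(hv)].append((hv, n))
    table2 = []
    for b in sorted(buckets):
        group = sorted(buckets[b])
        for (_, n1), (_, n2) in zip(group, group[1:]):
            table2.append(pair_hash(n1, n2))
    return table2


if __name__ == "__main__":
    t1 = gen_table1(b"plot-seed", k=12)
    assert sort_then_match(t1) == fused_sort_match(t1)
    print(len(t1), "Table 1 entries ->", len(fused_sort_match(t1)), "Table 2 entries")
```

Both functions return the same Table 2. They differ only in how much intermediate data they write and read back. On a GPU, the analogous saving is the round trip through device memory between two separate kernels.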

Those details matter because GPU acceleration is not a free speedup; it is a memory-movement game. Plotting involves repeated hashing, sorting, and matching of large tables. The naive approach shuttles data between host and device until the PCIe bus, not the compute units, becomes the bottleneck. Keeping the working table in VRAM and fusing sort and match into one kernel is the discipline that turns a GPU from an expensive accelerator that waits on memory into one that actually runs flat out.
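A back-of-the-envelope estimate shows how host-device transfers add up. Every input is an assumption, not a figure from the paper. The 8-byte entry size is a placeholder. The 25 GB/s effective one-way bandwidth is a guess below the nominal figure of roughly 32 GB/s per direction for PCIe 4.0 x16. `bus_seconds` is a hypothetical helper.

```python
def bus_seconds(k: int, bytes_per_entry: int, crossings: int, gb_per_s: float) -> float:
    """Seconds spent only moving a 2**k-entry table across the host-device bus.

    `crossings` counts one-way transfers (a host->device->host round trip is 2).
    """
    table_bytes = (2 ** k) * bytes_per_entry
    return crossings * table_bytes / (gb_per_s * 1e9)


if __name__ == "__main__":
    # Hypothetical inputs: 8-byte entries, 25 GB/s effective one-way bandwidth.
    for crossings in (0, 2, 8):
        t = bus_seconds(31, 8, crossings, 25.0)
        print(f"{crossings} crossings: {t:.2f} s of pure transfer "
              f"({t / 45.4:.0%} of the reported 45.4 s K=31 plot time)")
```

Each crossing of a K=31 table costs a fixed slice of wall-clock time, however fast the kernels are. A design that keeps the table resident in VRAM drives the crossing count toward zero.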

The speedup, and what it does and does not prove

The headline result is dramatic. The authors evaluate K-values 27 through 31 against CPU baselines. They report that the SYCL GPU implementation achieves a 59.2x speedup over a single-threaded CPU baseline, completing a K=31 plot in 45.4 seconds versus 2,688 seconds. The more telling result is that it also outperforms the best 384-thread CPU configuration. That is the comparison that counts, because a 59x win over one thread could simply mean the CPU baseline was weak. Beating a heavily parallel CPU setup is the evidence that the GPU is genuinely the better tool for this workload.
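The reported numbers are easy to sanity-check. The sketch below recomputes the speedup from the two K=31 times. It also converts the times into entries per second, assuming a K-value plot holds 2**K entries as in Chia-style plotting; that assumption is ours, not a detail quoted from the paper.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is."""
    return baseline_s / accelerated_s


def entries_per_second(k: int, seconds: float) -> float:
    # Assumption: a K-value plot holds 2**k entries, as in Chia-style plotting.
    return 2 ** k / seconds


if __name__ == "__main__":
    cpu_1t, gpu_sycl = 2688.0, 45.4   # reported K=31 times, in seconds
    print(f"speedup: {speedup(cpu_1t, gpu_sycl):.1f}x")                          # speedup: 59.2x
    print(f"GPU: {entries_per_second(31, gpu_sycl) / 1e6:.1f} M entries/s")      # GPU: 47.3 M entries/s
    print(f"CPU (1 thread): {entries_per_second(31, cpu_1t) / 1e6:.2f} M entries/s")  # 0.80
    # Each +1 in K doubles the entry count, so K=31 has 16x the entries of K=27.
    print(2 ** 31 // 2 ** 27)
```

The 59.2x figure follows directly from 2,688 / 45.4. The doubling per K step is why the K=31 end of the evaluated range is where plotting time dominates.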

It is worth reading the conclusion precisely. The authors say these results confirm that GPU acceleration is the correct direction for scaling Proof-of-Space plotting beyond what CPU parallelism can achieve. That is a claim about plotting throughput, not about consensus security or energy at the network level. Faster plotting lowers the upfront cost of participating in a Proof-of-Space chain, which is the barrier the paper set out to attack. It does not, by itself, argue that Proof-of-Space is more secure than Proof-of-Work, nor does it re-run the network-level energy accounting — it removes a bottleneck on the storage-based alternative.

The implementation also illustrates a broader pattern with intellectual-property implications: performance work on consensus increasingly lives in the kernels, not the protocol. The consensus rule here, Proof-of-Space, is unchanged; what changed is how the Blake3 pipeline maps onto GPU memory. Much of the recent advantage in ZK proving and in mining has migrated to the same place. The differentiator is no longer the cryptographic scheme on paper but the engineer who can keep an accelerator's compute units fed without stalling on memory. As more of the value in these systems concentrates in hand-tuned CUDA and SYCL kernels, the competitive and intellectual-property surface shifts with it. The defensible asset becomes the implementation rather than the algorithm, a quieter but real consequence of results like this one.

There is also a subtle tension worth flagging for the energy-conscious reader. Proof-of-Space's appeal is that ongoing operation is cheap; plotting is a one-time cost. Accelerating plotting with power-hungry GPUs shifts where that energy is spent rather than eliminating it. Still, concentrating it in a brief, one-time plotting phase is plausibly a large net improvement over continuous Proof-of-Work hashing.

The dual CUDA/SYCL implementation is the most strategically interesting choice. By not locking to NVIDIA, the authors make the plotter usable across the GPU market. That matters for decentralization: a plotting step that only runs well on one vendor's hardware quietly centralizes who can participate.

As a piece of systems engineering aimed squarely at the practical barrier to greener consensus, VaultxGPU is a clean, well-scoped result. Its honesty about being a plotting-throughput contribution rather than a consensus-security one is to its credit.
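The energy trade-off flagged above can be written as a break-even condition. This is a back-of-the-envelope sketch. The article quotes no power measurements, so $P_{\text{GPU}}$ and $P_{\text{CPU}}$ stay symbolic; read them as average whole-system power during a K=31 plot. Only the single-threaded timing ratio is filled in.

```latex
E = P\,t
\qquad\Longrightarrow\qquad
E_{\text{GPU}} < E_{\text{CPU}}
\iff
\frac{P_{\text{GPU}}}{P_{\text{CPU}}} < \frac{t_{\text{CPU}}}{t_{\text{GPU}}}
= \frac{2688\ \text{s}}{45.4\ \text{s}} \approx 59.2
```

Read this way, the GPU plot uses less energy than the single-threaded run whenever the GPU system draws less than roughly 59 times that run's average power. Against the 384-thread configuration the time ratio on the right is smaller, and that runtime is not quoted here, so its break-even point is left open.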

For the broader sustainability conversation, the result is a reminder that 'greener consensus' is partly an engineering problem about throughput, and that the win comes from removing onboarding friction rather than from any change to the security model itself.
