> [!CAUTION]
> This is research / hobby code. It has not been audited and is not suitable for production use. Do not rely on it for any setting where a broken proof would have real-world consequences.

> [!NOTE]
> Toyni was migrated from jonas089's GitHub; the past commit history lives in the original repository.
The Toyni STARK toolkit is in a solid state. The core proof system
(domain construction, NTT-based FFT/IFFT, FRI low-degree testing, Merkle
commitment, Fiat-Shamir transcript, BabyBear field arithmetic) is
implemented end-to-end and exercised by the bundled Fibonacci AIR
(src/fibonacci.rs + src/verifier.rs). The unit-test suite passes and
the same primitives are used by the zkvm
project as its proving backend.
What is not in scope for Toyni: the AIR for any non-trivial program. Toyni provides the building blocks; consumers (like zkvm) define their own constraint systems on top.
This implementation goes hand-in-hand with my article on STARKs. Toyni relies on prime fields and DEEP-ALI for soundness. I look forward to exploring binary-field STARKs later in my career.
Meet the amazing artist behind this creation, Kristiana Skrastina
Toyni ships with a small Fibonacci AIR that proves correct evaluation of the recurrence over a power-of-two trace length:
| trace |
|-----|
| 1 |
| 1 |
| 2 |
| ... |
| 13 |
| 21 |
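For illustration, a trace like the one above can be generated and checked with plain integer arithmetic. This is only a sketch: Toyni's actual implementation works over BabyBear field elements, and the helper names here are hypothetical.

```rust
// Sketch: build a Fibonacci trace padded to a power-of-two length, then
// check the transition constraint on every window of three consecutive rows.
// Plain u64 arithmetic stands in for BabyBear field elements.
fn fibonacci_trace(len: usize) -> Vec<u64> {
    assert!(len.is_power_of_two(), "trace length must be a power of two");
    let mut trace = vec![1u64, 1];
    while trace.len() < len {
        let n = trace.len();
        trace.push(trace[n - 1] + trace[n - 2]);
    }
    trace
}

// Mirrors the constraint below; wrapping_sub stands in for field subtraction.
fn constraint(t2: u64, t1: u64, t0: u64) -> u64 {
    t2.wrapping_sub(t1 + t0) // zero on every valid window
}

fn main() {
    let trace = fibonacci_trace(8);
    assert_eq!(trace, vec![1, 1, 2, 3, 5, 8, 13, 21]);
    for w in trace.windows(3) {
        assert_eq!(constraint(w[2], w[1], w[0]), 0);
    }
    println!("constraint holds on all {} rows", trace.len());
}
```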
The constraint reduces to:

```rust
fn fibonacci_constraint(t2: BabyBear, t1: BabyBear, t0: BabyBear) -> BabyBear {
    t2 - (t1 + t0)
}
```

Run it with:

```sh
cargo test test_fibonacci -- --nocapture
```

Toyni includes an optional CUDA backend for the NTT (forward + inverse).
It's gated behind the cuda feature flag and falls back to the CPU path
when the feature is off or no GPU is detected at runtime. The kernel and
its FFI live in cuda/ntt_kernel.cu and
src/ntt.rs; build.rs invokes nvcc only when the
feature is enabled.
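The dispatch pattern can be sketched as follows. This is an illustration, not Toyni's actual API: the function names are hypothetical, the real entry points live in src/ntt.rs, and the CPU path here is a naive O(n²) NTT standing in for the real FFT.

```rust
// Sketch of feature-gated dispatch with a CPU fallback (names hypothetical).
const P: u64 = 2_013_265_921; // BabyBear prime, p - 1 = 15 * 2^27
const G: u64 = 31; // multiplicative generator of the BabyBear field

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

// Naive O(n^2) NTT over BabyBear; `invert = true` applies the inverse
// transform, including the 1/n scaling.
fn ntt_cpu(values: &[u64], invert: bool) -> Vec<u64> {
    let n = values.len() as u64;
    let mut w = pow_mod(G, (P - 1) / n); // primitive n-th root of unity
    if invert { w = pow_mod(w, P - 2); }
    let scale = if invert { pow_mod(n, P - 2) } else { 1 };
    (0..n)
        .map(|k| {
            let mut acc = 0u64;
            for (j, &v) in values.iter().enumerate() {
                acc = (acc + v * pow_mod(w, k * j as u64)) % P;
            }
            acc * scale % P
        })
        .collect()
}

/// Forward NTT: uses the GPU when the `cuda` feature is on and a device
/// is present, otherwise falls back to the CPU path.
fn ntt_forward(values: &[u64]) -> Vec<u64> {
    #[cfg(feature = "cuda")]
    if gpu_available() {
        return ntt_forward_gpu(values); // hypothetical FFI wrapper
    }
    ntt_cpu(values, false)
}

fn main() {
    let input: Vec<u64> = (1..=8).collect();
    let freq = ntt_forward(&input);
    let back = ntt_cpu(&freq, true); // inverse transform
    assert_eq!(back, input);
    println!("NTT roundtrip ok");
}
```

The `#[cfg]` block compiles away entirely when the feature is off, so the CPU-only build carries no CUDA symbols at all.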
The path is tuned for repeated NTTs of the same size, the typical
proving workload. Per transform size n, it caches forward + inverse twiddles plus a
reusable device buffer in a global context, eliminating per-call
cudaMalloc / cudaFree, host-side twiddle precomputation, and per-stage
H2D copies. Mul reduction inside butterflies uses Barrett with
mu = floor(2^64 / p). The build emits native code for sm_75/86/89 and,
when the toolkit supports it, sm_120 for Blackwell GPUs (RTX 50xx); a
forward-compat PTX target is always emitted as a fallback.
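The Barrett step can be sketched on the host side as follows. This is an illustrative Rust port of the reduction idea, not the kernel itself (which lives in cuda/ntt_kernel.cu).

```rust
// Sketch of Barrett modular multiplication for BabyBear with
// mu = floor(2^64 / p), mirroring the reduction used in the butterflies.
const P: u64 = 2_013_265_921; // BabyBear prime
const MU: u64 = ((1u128 << 64) / P as u128) as u64; // mu = floor(2^64 / p)

fn barrett_mul(a: u64, b: u64) -> u64 {
    debug_assert!(a < P && b < P);
    let x = a * b; // < p^2 < 2^62: no u64 overflow
    let q = ((x as u128 * MU as u128) >> 64) as u64; // estimate of floor(x / p)
    let mut r = x - q * P; // q underestimates by at most 1, so r < 2p
    if r >= P {
        r -= P;
    }
    r
}

fn main() {
    for (a, b) in [(3, 5), (P - 1, P - 1), (123_456_789, 987_654_321)] {
        let expect = (a as u128 * b as u128 % P as u128) as u64;
        assert_eq!(barrett_mul(a, b), expect);
    }
    println!("Barrett reduction matches wide-division reference");
}
```

The point of the trick is that the estimate `q` replaces a 64-bit division with one wide multiply and a shift; a single conditional subtraction then brings the remainder into range.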
End-to-end zkvm prover wall-time on the bundled heavy-rust example
(trace ≈ 2²⁰, LDE ≈ 2²³, ~491 NTTs of size 2²⁰ and 2²³ combined):
| Backend | Wall time |
|---|---|
| CPU only | ~150 s |
| CUDA NTT (this branch) | ~11 s |
That's a ~13× speedup attributable almost entirely to the NTT path. The remaining wall time is single-threaded CPU work in the consuming zkvm's constraint and DEEP loops, which are not part of Toyni.
Build and test the CUDA path with:
```sh
cargo test --features cuda --release
```

STARKs achieve their security through a combination of domain extension, low-degree testing, and Merkle commitments. The three mechanisms are:
- Domain Extension (Blowup). The composition polynomial is evaluated over a domain b times larger than the original trace length.
- Low-Degree Testing. FRI ensures the polynomial being tested is close to a valid low-degree polynomial, with folding consistency checks at each layer.
- Merkle Commitments. Each FRI layer is committed via a Merkle tree, ensuring the integrity of the folding process and enabling efficient verification.
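The folding step can be illustrated at the coefficient level. This is a simplified sketch: real FRI folds evaluations and checks consistency against Merkle openings, but the degree-halving mechanic is the same.

```rust
// Sketch: one FRI fold at the coefficient level over BabyBear.
// Splitting f(x) = f_e(x^2) + x * f_o(x^2), the folded polynomial is
// f_e + beta * f_o, which halves the degree bound each round.
const P: u64 = 2_013_265_921; // BabyBear prime

fn fold(coeffs: &[u64], beta: u64) -> Vec<u64> {
    coeffs
        .chunks(2)
        .map(|c| {
            let even = c[0];
            let odd = *c.get(1).unwrap_or(&0);
            (even + beta * odd % P) % P
        })
        .collect()
}

fn main() {
    // A degree < 8 polynomial folds to degree < 4, < 2, < 1 in three rounds.
    let mut f: Vec<u64> = (1..=8).collect();
    for beta in [5u64, 7, 11] {
        f = fold(&f, beta);
    }
    assert_eq!(f.len(), 1);
    println!("folded to a constant: {}", f[0]);
}
```

After log₂(deg) rounds the prover is left with a constant, which the verifier can check directly; the per-round consistency checks are what force the prover to fold honestly.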
The soundness error (probability of accepting an invalid proof) is bounded by:
Pr[undetected cheat] ≤ (1/b)^q
where b is the blowup factor and q is the number of queries. With
blowup 8 and 10 queries, the soundness error is at most
(1/8)^10 = 2⁻³⁰ ≈ 9.3 × 10⁻¹⁰.
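As a quick numeric check of the bound (exponent arithmetic only):

```rust
// Sketch: evaluate the soundness bound (1/b)^q for blowup b and q queries.
fn soundness_error(blowup: f64, queries: i32) -> f64 {
    (1.0 / blowup).powi(queries)
}

fn main() {
    let err = soundness_error(8.0, 10);
    // (1/8)^10 = 2^-30 ≈ 9.3e-10
    assert!(err < 1e-9);
    println!("soundness error ≈ {err:e}");
}
```

Doubling the blowup or adding a handful of queries shrinks the bound geometrically, which is why both knobs are cheap ways to buy soundness.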

