> [!CAUTION]
> This is research / hobby code. It has not been audited and is not suitable for production use. Do not rely on it for any setting where a broken proof would have real-world consequences.

> [!NOTE]
> Toyni was migrated from jonas089's GitHub; the past commit history lives in the original repository.
The Toyni STARK toolkit is in a solid state. The core proof system
(domain construction, NTT-based FFT/IFFT, FRI low-degree testing, Merkle
commitment, Fiat-Shamir transcript, BabyBear field arithmetic) is
implemented end-to-end and exercised by the bundled Fibonacci AIR
(src/fibonacci.rs + src/verifier.rs). The unit-test suite passes and
the same primitives are used by the zkvm
project as its proving backend.
What is not in scope for Toyni: the AIR for any non-trivial program. Toyni provides the building blocks; consumers (like zkvm) define their own constraint systems on top.
This implementation goes hand-in-hand with my article on STARKs. Toyni relies on prime fields and DEEP-ALI for soundness. I look forward to exploring binary-field STARKs later in my career.
Meet the amazing artist behind this creation, Kristiana Skrastina
Toyni ships with a small Fibonacci AIR that proves correct evaluation of the recurrence over a power-of-two trace length:
| trace |
|-----|
| 1 |
| 1 |
| 2 |
| ... |
| 13 |
| 21 |
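For illustration, a trace like the one above can be generated and checked with plain integer arithmetic. This is only a sketch: Toyni's actual implementation works over BabyBear field elements, and the helper names here are hypothetical.

```rust
// Sketch: build a Fibonacci trace padded to a power-of-two length, then
// check the transition constraint on every window of three consecutive rows.
// Plain u64 arithmetic stands in for BabyBear field elements.
fn fibonacci_trace(len: usize) -> Vec<u64> {
    assert!(len.is_power_of_two(), "trace length must be a power of two");
    let mut trace = vec![1u64, 1];
    while trace.len() < len {
        let n = trace.len();
        trace.push(trace[n - 1] + trace[n - 2]);
    }
    trace
}

// Mirrors the constraint below; wrapping_sub stands in for field subtraction.
fn constraint(t2: u64, t1: u64, t0: u64) -> u64 {
    t2.wrapping_sub(t1 + t0) // zero on every valid window
}

fn main() {
    let trace = fibonacci_trace(8);
    assert_eq!(trace, vec![1, 1, 2, 3, 5, 8, 13, 21]);
    for w in trace.windows(3) {
        assert_eq!(constraint(w[2], w[1], w[0]), 0);
    }
    println!("constraint holds on all {} rows", trace.len());
}
```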
The constraint reduces to:

```rust
fn fibonacci_constraint(t2: BabyBear, t1: BabyBear, t0: BabyBear) -> BabyBear {
    t2 - (t1 + t0)
}
```

Run it with:

```sh
cargo test test_fibonacci -- --nocapture
```

Toyni includes an optional CUDA backend for the NTT (forward + inverse).
It's gated behind the cuda feature flag and falls back to the CPU path
when the feature is off or no GPU is detected at runtime. The kernel and
its FFI live in cuda/ntt_kernel.cu and
src/ntt.rs; build.rs invokes nvcc only when the
feature is enabled.
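The dispatch pattern can be sketched as follows. This is an illustration, not Toyni's actual API: the function names are hypothetical, the real entry points live in src/ntt.rs, and the CPU path here is a naive O(n²) NTT standing in for the real FFT.

```rust
// Sketch of feature-gated dispatch with a CPU fallback (names hypothetical).
const P: u64 = 2_013_265_921; // BabyBear prime, p - 1 = 15 * 2^27
const G: u64 = 31; // multiplicative generator of the BabyBear field

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

// Naive O(n^2) NTT over BabyBear; `invert = true` applies the inverse
// transform, including the 1/n scaling.
fn ntt_cpu(values: &[u64], invert: bool) -> Vec<u64> {
    let n = values.len() as u64;
    let mut w = pow_mod(G, (P - 1) / n); // primitive n-th root of unity
    if invert { w = pow_mod(w, P - 2); }
    let scale = if invert { pow_mod(n, P - 2) } else { 1 };
    (0..n)
        .map(|k| {
            let mut acc = 0u64;
            for (j, &v) in values.iter().enumerate() {
                acc = (acc + v * pow_mod(w, k * j as u64)) % P;
            }
            acc * scale % P
        })
        .collect()
}

/// Forward NTT: uses the GPU when the `cuda` feature is on and a device
/// is present, otherwise falls back to the CPU path.
fn ntt_forward(values: &[u64]) -> Vec<u64> {
    #[cfg(feature = "cuda")]
    if gpu_available() {
        return ntt_forward_gpu(values); // hypothetical FFI wrapper
    }
    ntt_cpu(values, false)
}

fn main() {
    let input: Vec<u64> = (1..=8).collect();
    let freq = ntt_forward(&input);
    let back = ntt_cpu(&freq, true); // inverse transform
    assert_eq!(back, input);
    println!("NTT roundtrip ok");
}
```

The `#[cfg]` block compiles away entirely when the feature is off, so the CPU-only build carries no CUDA symbols at all.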
The path is tuned for repeated NTTs of the same size, the typical
proving workload. Per transform size n, it caches forward + inverse twiddles plus a
reusable device buffer in a global context, eliminating per-call
cudaMalloc / cudaFree, host-side twiddle precomputation, and per-stage
H2D copies. Mul reduction inside butterflies uses Barrett with
mu = floor(2^64 / p). The build emits native code for sm_75/86/89 and,
when the toolkit supports it, sm_120 for Blackwell GPUs (RTX 50xx); a
forward-compat PTX target is always emitted as a fallback.
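The Barrett step can be sketched on the host side as follows. This is an illustrative Rust port of the reduction idea, not the kernel itself (which lives in cuda/ntt_kernel.cu).

```rust
// Sketch of Barrett modular multiplication for BabyBear with
// mu = floor(2^64 / p), mirroring the reduction used in the butterflies.
const P: u64 = 2_013_265_921; // BabyBear prime
const MU: u64 = ((1u128 << 64) / P as u128) as u64; // mu = floor(2^64 / p)

fn barrett_mul(a: u64, b: u64) -> u64 {
    debug_assert!(a < P && b < P);
    let x = a * b; // < p^2 < 2^62: no u64 overflow
    let q = ((x as u128 * MU as u128) >> 64) as u64; // estimate of floor(x / p)
    let mut r = x - q * P; // q underestimates by at most 1, so r < 2p
    if r >= P {
        r -= P;
    }
    r
}

fn main() {
    for (a, b) in [(3, 5), (P - 1, P - 1), (123_456_789, 987_654_321)] {
        let expect = (a as u128 * b as u128 % P as u128) as u64;
        assert_eq!(barrett_mul(a, b), expect);
    }
    println!("Barrett reduction matches wide-division reference");
}
```

The point of the trick is that the estimate `q` replaces a 64-bit division with one wide multiply and a shift; a single conditional subtraction then brings the remainder into range.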
End-to-end zkvm prover wall-time on the bundled heavy-rust example
(trace ≈ 2²⁰, LDE ≈ 2²³, ~491 NTTs of size 2²⁰ and 2²³ combined):
| Backend | Wall time |
|---|---|
| CPU only | ~150 s |
| CUDA NTT (this branch) | ~11 s |
That's a ~13× speedup attributable almost entirely to the NTT path. The remaining wall time is single-threaded CPU work in the consuming zkvm's constraint and DEEP loops, which are not part of Toyni.
Build and test the CUDA path with:
```sh
cargo test --features cuda --release
```

STARKs achieve their security through a combination of domain extension, low-degree testing, and Merkle commitments. The three mechanisms are:
- Domain Extension (Blowup). The composition polynomial is evaluated over a domain b times larger than the original trace length.
- Low-Degree Testing. FRI ensures the polynomial being tested is close to a valid low-degree polynomial, with folding consistency checks at each layer.
- Merkle Commitments. Each FRI layer is committed via a Merkle tree, ensuring the integrity of the folding process and enabling efficient verification.
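The folding step can be illustrated at the coefficient level. This is a simplified sketch: real FRI folds evaluations and checks consistency against Merkle openings, but the degree-halving mechanic is the same.

```rust
// Sketch: one FRI fold at the coefficient level over BabyBear.
// Splitting f(x) = f_e(x^2) + x * f_o(x^2), the folded polynomial is
// f_e + beta * f_o, which halves the degree bound each round.
const P: u64 = 2_013_265_921; // BabyBear prime

fn fold(coeffs: &[u64], beta: u64) -> Vec<u64> {
    coeffs
        .chunks(2)
        .map(|c| {
            let even = c[0];
            let odd = *c.get(1).unwrap_or(&0);
            (even + beta * odd % P) % P
        })
        .collect()
}

fn main() {
    // A degree < 8 polynomial folds to degree < 4, < 2, < 1 in three rounds.
    let mut f: Vec<u64> = (1..=8).collect();
    for beta in [5u64, 7, 11] {
        f = fold(&f, beta);
    }
    assert_eq!(f.len(), 1);
    println!("folded to a constant: {}", f[0]);
}
```

After log₂(deg) rounds the prover is left with a constant, which the verifier can check directly; the per-round consistency checks are what force the prover to fold honestly.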
The soundness error (probability of accepting an invalid proof) is bounded by:
Pr[undetected cheat] ≤ (1/b)^q
where b is the blowup factor and q is the number of queries. With
blowup 8 and 10 queries, the soundness error is at most
(1/8)^10 = 2⁻³⁰ ≈ 9.3 × 10⁻¹⁰.
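As a quick numeric check of the bound (exponent arithmetic only):

```rust
// Sketch: evaluate the soundness bound (1/b)^q for blowup b and q queries.
fn soundness_error(blowup: f64, queries: i32) -> f64 {
    (1.0 / blowup).powi(queries)
}

fn main() {
    let err = soundness_error(8.0, 10);
    // (1/8)^10 = 2^-30 ≈ 9.3e-10
    assert!(err < 1e-9);
    println!("soundness error ≈ {err:e}");
}
```

Doubling the blowup or adding a handful of queries shrinks the bound geometrically, which is why both knobs are cheap ways to buy soundness.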

