N.B. all benchmarks were measured on a Surface Pro 6, with an i7-8650U CPU at 2.1GHz, 4 physical cores and 16GB RAM.

Our open source library, barretenberg (https://github.com/AZTECProtocol/barretenberg), contains an implementation of TurboPLONK over the BN254 curve. This variant implements Pedersen hashes using our fixed-base scalar multiplication gate. We benchmarked barretenberg against the current fastest Groth16 prover that also supports the BN254 curve, Bellman. TurboPLONK is approximately five times faster than the Groth16 implementation.

When comparing raw constraint counts, TurboPLONK requires approximately five times fewer constraints than an equivalent R1CS-based SNARK.

It should be noted that a TurboPLONK constraint requires approximately twice as much prover 'work' as a Groth16 constraint. In addition, 20% of the Groth16 constraints are bit decompositions; these binary gates are approximately four times as expensive for the prover in TurboPLONK as in Groth16, since PLONK does not have the property that the scalars in the prover's computation are equal to the witness values.

Under this model, the TurboPLONK prover should be approximately 2.25 times faster than Groth16.

The remaining speed advantage of the TurboPLONK prover over the Groth16 prover can be explained by the relative speeds of the two libraries when computing multi-scalar multiplications in G1, where the bulk of prover time is spent.

Using original techniques for this computation, barretenberg can compute over 1 million scalar multiplications in under 1 second on average consumer-grade hardware.

Pippenger's multi-exponentiation algorithm is the fastest known method for computing a multi-scalar multiplication over many distinct bases. Our new contribution is the observation that, when computing the multi-products at the heart of the algorithm, the *affine* point addition formula requires substantially fewer field multiplications than the formulae for projective and Jacobian coordinates.
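To make the structure of the algorithm concrete, here is a minimal sketch of Pippenger's bucket method. It runs over the additive group of integers, so `+` stands in for a point addition and `<< c` for `c` point doublings; this reduction is purely our own illustration, not barretenberg's implementation:

```python
def pippenger(scalars, points, c=4):
    """Multi-scalar multiplication sum(s_i * P_i) via Pippenger's bucket method.

    Illustrative only: the group here is the integers under addition, so
    '+' is a group addition and '<< c' stands in for c point doublings.
    c is the window size in bits.
    """
    max_bits = max(s.bit_length() for s in scalars)
    windows = (max_bits + c - 1) // c
    result = 0
    for w in reversed(range(windows)):
        result <<= c  # c doublings shift the accumulator up one window
        # sort each point into a bucket indexed by its scalar's current window
        buckets = [0] * (1 << c)
        for s, P in zip(scalars, points):
            idx = (s >> (w * c)) & ((1 << c) - 1)
            if idx:
                buckets[idx] += P  # one group addition per point
        # running-sum trick: sum_k k * buckets[k] in ~2 * 2^c additions
        running = window_sum = 0
        for k in range(len(buckets) - 1, 0, -1):
            running += buckets[k]
            window_sum += running
        result += window_sum
    return result
```

The key cost driver is visible here: almost all of the work is the one group addition per point per window, which is why cheapening each addition (the subject of the rest of this section) pays off directly.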

When evaluating affine point addition, the formula is:

$\lambda = \frac{y_2 - y_1}{x_2 - x_1} \text{ mod } p\\
x_3 = \lambda^2 - (x_2 + x_1) \text{ mod } p\\
y_3 = \lambda(x_1 - x_3) - y_1 \text{ mod } p$

Computing $\lambda$ requires a modular inverse, which is orders of magnitude more expensive than a field multiplication.
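The formula translates directly into code. The sketch below works over a toy curve $y^2 = x^3 + 3$ modulo a small prime (our own illustrative parameters; barretenberg operates over BN254), and pays for one modular inverse per addition, which is exactly the cost that batching will amortize:

```python
# Affine point addition on y^2 = x^3 + 3 over F_p.
# Toy parameters for illustration only; barretenberg uses BN254.
p = 101

def affine_add(P, Q):
    """Add two points with distinct x-coordinates using the affine chord formula."""
    (x1, y1), (x2, y2) = P, Q
    # Computing the slope requires one modular inverse per addition when done naively.
    lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - (x2 + x1)) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)
```

Aside from the inverse, the formula costs only one multiplication for $\lambda$, one squaring for $x_3$, and one multiplication for $y_3$.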

However, the Pippenger multi-products can be arranged to produce sequences of independent point additions. That is, the outputs of additions in the sequence are not inputs to any additions in the sequence.

This allows for the use of Montgomery's trick to compute batched modular inverses:

$\forall i \in \{1, \ldots, n\}: d_i = \prod_{j=1}^{i-1}a_j\\
s = (d_na_n)^{-1}\\
\forall i \in \{1, \ldots, n\}: e_i = s\prod_{j=i+1}^{n}a_j\\
\forall i \in \{1, \ldots, n\}: r_i = d_ie_i = a_i^{-1}$
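Written out, the trick computes all prefix products, performs a single inversion of the full product, then sweeps backwards to recover each individual inverse. A minimal Python sketch (the function name and interface are our own):

```python
def batch_inverse(a, p):
    """Compute [x^{-1} mod p for x in a] using a single modular inversion.

    Implements Montgomery's trick: d_i are the prefix products, s is the one
    true inversion, and the backward sweep recovers r_i = a_i^{-1}.
    `a` must be non-empty with all elements nonzero mod p.
    """
    n = len(a)
    d = [1] * n                        # d_i = a_1 * ... * a_{i-1}
    for i in range(1, n):
        d[i] = d[i - 1] * a[i - 1] % p
    s = pow(d[-1] * a[-1] % p, -1, p)  # the only field inversion
    r = [0] * n
    acc = s                            # acc = (a_1 * ... * a_i)^{-1} as i descends
    for i in range(n - 1, -1, -1):
        r[i] = d[i] * acc % p          # = a_i^{-1}
        acc = acc * a[i] % p
    return r
```

Each element costs roughly three multiplications: one in the prefix pass and two in the backward sweep.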

Assuming a sufficiently large $n$, the amortized cost of a batched modular inverse is 3 field multiplications per inverse. Combined with the 3 multiplications in the addition formula itself (one to compute $\lambda$, one to square it, and one to compute $y_3$), this gives a total cost of 6 field multiplications per point addition. For short Weierstrass curves, the traditional mixed-addition (projective) formula requires 11 field multiplications.
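Putting the two ideas together: a batch of independent additions shares one inversion across all of the slope denominators, realising the 6-multiplication cost per addition. A self-contained sketch, again over a toy curve of our own choosing rather than BN254:

```python
def batch_affine_add(Ps, Qs, p):
    """Add independent point pairs (Ps[i] + Qs[i]) in affine coordinates,
    sharing a single modular inversion across all slopes (Montgomery's trick).
    Each pair must have distinct x-coordinates; toy parameters, not BN254."""
    n = len(Ps)
    dens = [(Qs[i][0] - Ps[i][0]) % p for i in range(n)]
    # batched inversion: prefix products, one inverse, backward sweep
    pre = [1] * (n + 1)
    for i in range(n):
        pre[i + 1] = pre[i] * dens[i] % p
    acc = pow(pre[n], -1, p)           # the only field inversion in the batch
    invs = [0] * n
    for i in range(n - 1, -1, -1):
        invs[i] = pre[i] * acc % p     # = dens[i]^{-1}
        acc = acc * dens[i] % p
    # with the inverses in hand, each addition is 3 multiplications
    out = []
    for (x1, y1), (x2, y2), inv in zip(Ps, Qs, invs):
        lam = (y2 - y1) * inv % p          # 1 multiplication
        x3 = (lam * lam - x1 - x2) % p     # 1 squaring
        y3 = (lam * (x1 - x3) - y1) % p    # 1 multiplication
        out.append((x3, y3))
    return out
```

Because the additions in each batch are independent, no output feeds another addition in the same batch, which is precisely the arrangement the Pippenger multi-products permit.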