Inference and FLOPs Benchmark

Overview

The Inference and FLOPs Benchmark evaluates a validator's ability to perform accurate and efficient inference on transformer models while measuring achieved FLOPS, the effective number of floating-point operations per second completed during the task.

This benchmark reflects real-world AI workloads and serves as a primary indicator of compute quality.

Benchmark Details

  • Validators instantiate transformer models deterministically based on provided seeds.
  • Random input sequences are generated and passed through the model.
  • Outputs are compared to expected targets within a tolerance ε.
  • Execution time is recorded to compute achieved FLOPS (see the sketch below).
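
A minimal sketch of this procedure in PyTorch follows. The seed, model dimensions, sequence length, and tolerance here are illustrative assumptions, and the real benchmark would supply its own expected targets rather than re-running the model:

    import time

    import torch
    import torch.nn as nn

    # Hypothetical parameters; the benchmark supplies its own seed,
    # model dimensions, sequence length, and tolerance.
    SEED = 42
    D_MODEL, N_HEADS, N_LAYERS = 256, 4, 2
    BATCH, SEQ_LEN = 1, 128
    EPSILON = 1e-3

    # 1. Instantiate the transformer deterministically from the provided seed.
    torch.manual_seed(SEED)
    layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEADS, batch_first=True)
    model = nn.TransformerEncoder(layer, num_layers=N_LAYERS).eval()

    # 2. Generate a random input sequence from the same seeded RNG.
    x = torch.randn(BATCH, SEQ_LEN, D_MODEL)

    # 3. Run inference and record execution time.
    start = time.perf_counter()
    with torch.no_grad():
        output = model(x)
    elapsed = time.perf_counter() - start

    # 4. Compare against the expected target within tolerance epsilon.
    #    The benchmark would supply `expected`; re-running the deterministic
    #    model in eval mode (dropout disabled) is a stand-in here.
    with torch.no_grad():
        expected = model(x)
    correct = torch.allclose(output, expected, atol=EPSILON)
    print(f"correct={correct}  elapsed={elapsed:.4f}s")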

Scoring

  • The benchmark score corresponds to the maximum model complexity n that the validator can process correctly within time constraints (see the sketch after this list).

  • Achieved FLOPS are calculated as:

    \text{Achieved FLOPS} = \frac{\text{Total FLOPs for inference}}{\text{Execution time}}
  • This metric is used to assess validator performance and reputation.
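
As a worked sketch of the scoring above, the snippet below combines a rough matmul-only FLOP estimate for a transformer encoder layer with the achieved-FLOPS formula and the complexity ramp. The helper names, the 4x feed-forward width, and the `run_at_complexity` callback are assumptions for illustration, not the benchmark's exact accounting:

    def achieved_flops(total_flops: float, execution_time: float) -> float:
        """Achieved FLOPS = total FLOPs for the inference / execution time."""
        return total_flops / execution_time

    def transformer_layer_flops(seq_len: int, d_model: int) -> int:
        """Rough matmul-only FLOP estimate for one encoder layer,
        counting 2*m*k*n FLOPs per (m,k)@(k,n) matmul; 4x FFN width assumed."""
        proj = 4 * 2 * seq_len * d_model * d_model        # Q, K, V, output projections
        attn = 2 * 2 * seq_len * seq_len * d_model        # QK^T and attention-weighted V
        ffn = 2 * 2 * seq_len * d_model * (4 * d_model)   # two feed-forward matmuls
        return proj + attn + ffn

    def benchmark_score(run_at_complexity, time_limit: float, max_n: int = 64) -> int:
        """Largest complexity n processed correctly within the time limit.
        `run_at_complexity(n)` is assumed to return (correct: bool, elapsed: float)."""
        score = 0
        for n in range(1, max_n + 1):
            correct, elapsed = run_at_complexity(n)
            if not correct or elapsed > time_limit:
                break
            score = n
        return score

    # Example: 2 encoder layers, seq_len=128, d_model=256, measured in 4 ms.
    total = 2 * transformer_layer_flops(seq_len=128, d_model=256)
    print(f"achieved: {achieved_flops(total, execution_time=0.004):.3e} FLOPS")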

Importance

  • Measures actual model inference speed and correctness.
  • Aligns compute validation with meaningful AI workloads.
  • Provides objective, reproducible performance metrics.