Inference and FLOPs Benchmark
Overview
The Inference and FLOPs Benchmark evaluates a validator’s ability to perform accurate and efficient inference on transformer models while measuring achieved FLOPS—the effective floating-point operations per second completed during the task.
This benchmark reflects real-world AI workloads and serves as a primary indicator of compute quality.
Benchmark Details
- Validators instantiate transformer models deterministically based on provided seeds.
- Random input sequences are generated and passed through the model.
- Outputs are compared to expected targets within a specified numerical tolerance.
- Execution time is recorded to compute achieved FLOPS (see the sketch after this list).
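The following is a minimal sketch of one benchmark round following these steps, written with PyTorch. The helper name `run_inference_benchmark`, the model dimensions, and the default tolerance are illustrative assumptions, not the benchmark's actual implementation; in the real benchmark the expected targets and model configuration come from the validation protocol.

```python
import time
import torch
import torch.nn as nn

def run_inference_benchmark(seed: int, d_model: int = 256, n_heads: int = 4,
                            n_layers: int = 2, seq_len: int = 128, batch: int = 8,
                            tolerance: float = 1e-3):
    """Sketch of one round: deterministic model + random inputs + timed forward pass."""
    torch.manual_seed(seed)  # model weights and inputs are derived from the provided seed

    # Instantiate a small transformer deterministically from the seeded RNG state
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
    model = nn.TransformerEncoder(layer, num_layers=n_layers).eval()

    # Generate random input sequences
    x = torch.randn(batch, seq_len, d_model)

    # Time the forward pass
    start = time.perf_counter()
    with torch.no_grad():
        output = model(x)
    elapsed = time.perf_counter() - start

    # Illustrative tolerance check: here we simply recompute the output as the
    # "expected" target; the real benchmark supplies targets externally.
    with torch.no_grad():
        expected = model(x)
    correct = torch.allclose(output, expected, atol=tolerance)

    return output, elapsed, correct
```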
Scoring
- The benchmark score corresponds to the maximum model complexity that the validator can process correctly within the time constraints.
- Achieved FLOPS are calculated as:
  Achieved FLOPS = (total floating-point operations performed) / (execution time in seconds)
- This metric is used to assess validator performance and reputation.
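As a rough worked example of this calculation, the sketch below assumes the common approximation of about 2 floating-point operations per parameter per token for a dense forward pass; the benchmark's exact operation count may be derived differently.

```python
def achieved_flops(param_count: int, tokens_processed: int, elapsed_seconds: float) -> float:
    # Assumed approximation: ~2 FLOPs per parameter per token for a dense forward pass.
    total_flops = 2 * param_count * tokens_processed
    return total_flops / elapsed_seconds

# Example: a 10M-parameter model processing 1,024 tokens in 0.05 s
# yields 2 * 10e6 * 1024 / 0.05 ≈ 4.1e11 achieved FLOPS (~0.41 TFLOPS).
print(achieved_flops(10_000_000, 1024, 0.05))
```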
Importance
- Measures actual model inference speed and correctness.
- Aligns compute validation with meaningful AI workloads.
- Provides objective, reproducible performance metrics.