Inference and FLOPs Benchmark

Overview

The Inference and FLOPs Benchmark evaluates a validator's ability to perform accurate and efficient inference on transformer models while measuring achieved FLOPS, the effective number of floating-point operations per second completed during the task.

This benchmark reflects real-world AI workloads and serves as a primary indicator of compute quality.

Benchmark Details

  • Validators instantiate transformer models deterministically based on provided seeds.
  • Random input sequences are generated and passed through the model.
  • Outputs are compared to expected targets within a tolerance ε.
  • Execution time is recorded to compute achieved FLOPS (see the sketch below).
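
A minimal sketch of this procedure in PyTorch follows. The seed, model dimensions, sequence length, and tolerance here are illustrative assumptions, and the real benchmark would supply its own expected targets rather than re-running the model:

    import time

    import torch
    import torch.nn as nn

    # Hypothetical parameters; the benchmark supplies its own seed,
    # model dimensions, sequence length, and tolerance.
    SEED = 42
    D_MODEL, N_HEADS, N_LAYERS = 256, 4, 2
    BATCH, SEQ_LEN = 1, 128
    EPSILON = 1e-3

    # 1. Instantiate the transformer deterministically from the provided seed.
    torch.manual_seed(SEED)
    layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEADS, batch_first=True)
    model = nn.TransformerEncoder(layer, num_layers=N_LAYERS).eval()

    # 2. Generate a random input sequence from the same seeded RNG.
    x = torch.randn(BATCH, SEQ_LEN, D_MODEL)

    # 3. Run inference and record execution time.
    start = time.perf_counter()
    with torch.no_grad():
        output = model(x)
    elapsed = time.perf_counter() - start

    # 4. Compare against the expected target within tolerance epsilon.
    #    The benchmark would supply `expected`; re-running the deterministic
    #    model in eval mode (dropout disabled) is a stand-in here.
    with torch.no_grad():
        expected = model(x)
    correct = torch.allclose(output, expected, atol=EPSILON)
    print(f"correct={correct}  elapsed={elapsed:.4f}s")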

Scoring

  • The benchmark score corresponds to the maximum model complexity n that the validator can process correctly within time constraints (see the sketch after this list).

  • Achieved FLOPS are calculated as:

    \text{Achieved FLOPS} = \frac{\text{Total FLOPs for inference}}{\text{Execution time}}
  • This metric is used to assess validator performance and reputation.
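
As a worked sketch of the scoring above, the snippet below combines a rough matmul-only FLOP estimate for a transformer encoder layer with the achieved-FLOPS formula and the complexity ramp. The helper names, the 4x feed-forward width, and the `run_at_complexity` callback are assumptions for illustration, not the benchmark's exact accounting:

    def achieved_flops(total_flops: float, execution_time: float) -> float:
        """Achieved FLOPS = total FLOPs for the inference / execution time."""
        return total_flops / execution_time

    def transformer_layer_flops(seq_len: int, d_model: int) -> int:
        """Rough matmul-only FLOP estimate for one encoder layer,
        counting 2*m*k*n FLOPs per (m,k)@(k,n) matmul; 4x FFN width assumed."""
        proj = 4 * 2 * seq_len * d_model * d_model        # Q, K, V, output projections
        attn = 2 * 2 * seq_len * seq_len * d_model        # QK^T and attention-weighted V
        ffn = 2 * 2 * seq_len * d_model * (4 * d_model)   # two feed-forward matmuls
        return proj + attn + ffn

    def benchmark_score(run_at_complexity, time_limit: float, max_n: int = 64) -> int:
        """Largest complexity n processed correctly within the time limit.
        `run_at_complexity(n)` is assumed to return (correct: bool, elapsed: float)."""
        score = 0
        for n in range(1, max_n + 1):
            correct, elapsed = run_at_complexity(n)
            if not correct or elapsed > time_limit:
                break
            score = n
        return score

    # Example: 2 encoder layers, seq_len=128, d_model=256, measured in 4 ms.
    total = 2 * transformer_layer_flops(seq_len=128, d_model=256)
    print(f"achieved: {achieved_flops(total, execution_time=0.004):.3e} FLOPS")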

Importance

  • Measures actual model inference speed and correctness.
  • Aligns compute validation with meaningful AI workloads.
  • Provides objective, reproducible performance metrics.