benchmarks

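Cross-arch throughput (tok/s) for hipfire: 35 approved runs across 6 architectures, fetched from localmaxxing.com at build time. Every cell links to its /runs/<id> page for reproducibility. On SPEC rows, × AR is decode tok/s divided by the paired AR baseline in the AR base column.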

Last build: 2026-05-20T12:45:18.915Z · source: GET /api/leaderboard?engineName=hipfire

gfx1100 — RDNA 3

hardware | model | q | mode | decode tok/s | prefill tok/s | AR base tok/s | × AR | τ | run
RX 7900 XTX | Qwen3.5-9B | MQ4 | SPEC | 576.9 | 755 | 122.9 | 4.70× | 13.18 | cmpdzs3fq0…
RX 7900 XTX | Qwen3.5-9B | MQ4 | SPEC | 575.3 | - | - | - | - | cmofzk74w0…
RX 7900 XTX | Qwen3.5-9B | MQ4 | SPEC | 575.2 | - | 123.1 | 4.67× | 13.18 | cmp8fqdz20…
RX 7900 XTX | Qwen3.5-9B | MQ4 | SPEC | 337.0 | - | - | - | - | cmofdp1ve0…
RX 7900 XTX | Qwen3.5-9B | MQ4 | SPEC | 322.6 | - | - | - | - | cmofkl1gm0…
RX 7900 XTX | Qwen3.5-27B | MQ4 | SPEC | 254.4 | - | 44.6 | 5.70× | 13.18 | cmp8fozao0…
RX 7900 XTX | Qwen3.5-27B | MQ4 | SPEC | 250.3 | - | - | - | - | cmofyatrj0…
RX 7900 XTX | Qwen3.6-27B | MQ4 | SPEC | 216.7 | - | 44.8 | 4.84× | 10.93 | cmp8fw36n0…
RX 7900 XTX | Qwen3.6-27B | MQ4 | SPEC | 216.6 | - | 44.6 | 4.85× | 10.93 | cmp8fnkw00…
RX 7900 XTX | Qwen3.5-27B | MQ4 | SPEC | 201.1 | - | - | - | - | cmofkrpp00…
RX 7900 XTX | Qwen3.5-27B | MQ4 | SPEC | 182.0 | - | - | - | - | cmofe2efg0…
RX 7900 XTX | Qwen3.5-35B-A3B | MQ4 | SPEC | 154.6 | - | - | - | - | cmofkye0g0…
RX 7900 XTX | Qwen3.5-35B-A3B | MQ4 | SPEC | 140.6 | - | - | - | - | cmofefr0j0…
RX 7900 XTX | Qwen3.5-35B-A3B | MQ4 | AR | 135.9 | - | - | - | - | cmofe92tt0…
RX 7900 XTX | Qwen3.6-35B-A3B | MQ4 | AR | 135.3 | - | - | - | - | cmofezrun0…
RX 7900 XTX | Ornstein3.6-35B-A3B | MQ4 | AR | 134.0 | - | - | - | - | cmofgd9iz0…
RX 7900 XTX | Qwen3.5-9B | MQ4 | AR | 123.1 | - | - | - | - | cmp8ft6rt0…
RX 7900 XTX | Qwen3.5-9B | MQ4 | AR | 122.3 | - | - | - | - | cmofdidib0…
RX 7900 XTX | Qwen3.6-27B | MQ4 | SPEC | 118.2 | - | - | - | - | cmofet3j80…
RX 7900 XTX | Qwen3.6-27B | MQ4 | SPEC | 118.1 | - | - | - | - | cmofl528t0…
RX 7900 XTX | Qwen3.6-35B-A3B | MQ4 | SPEC | 68.6 | - | - | - | - | cmoff6g560…
RX 7900 XTX | Qwen3.6-27B | MQ4 | AR | 44.8 | - | - | - | - | cmp8frsd70…
RX 7900 XTX | Qwen3.6-27B | MQ4 | AR | 44.6 | - | - | - | - | cmp8ful710…
RX 7900 XTX | Qwen3.5-27B | MQ4 | AR | 44.6 | - | - | - | - | cmp8fl9rq0…
RX 7900 XTX | Qwen3.5-27B | MQ4 | AR | 43.7 | - | - | - | - | cmofdvq7c0…
RX 7900 XTX | Qwen3.6-27B | MQ4 | AR | 43.6 | - | - | - | - | cmofemf840…

Engine: hipfire@0.2.0+4840f0b6 · Submitter: @schuttdev

gfx1201 — RDNA 4

hardware | model | q | mode | decode tok/s | prefill tok/s | AR base tok/s | × AR | τ | run
AMD Radeon AI PRO R9700 | Qwen3.5-9B | MQ4 | SPEC | 371.8 | 1160 | 99.4 | 3.74× | 13.18 | cmpdzu8ig0…
AMD Radeon AI PRO R9700 | Qwen3.5-27B | MQ4 | SPEC | 196.2 | 438 | 35.4 | 5.54× | 13.18 | cmpdzvqpx0…

Engine: hipfire@0.2.0+4840f0b6 · Submitter: @schuttdev

gfx1151 — RDNA 3.5

hardware | model | q | mode | decode tok/s | prefill tok/s | AR base tok/s | × AR | τ | run
Strix Halo (Radeon 8060S Graphics) | Qwen3.5-9B | MQ4 | SPEC | 255.8 | 406 | 45.6 | 5.60× | 13.18 | cmpdzx93g0…
Strix Halo (Radeon 8060S Graphics) | Qwen3.5-27B | MQ4 | SPEC | 104.5 | 135 | - | - | 13.18 | cmpe035k30…
Strix Halo (Radeon 8060S Graphics) | Qwen3.6-27B | MQ4 | SPEC | 88.3 | 134 | 14.8 | 5.96× | 10.93 | cmpe04nu50…
AMD Radeon 8060S Graphics (Strix Halo APU, gfx1151) | Qwen3.5-27B | MQ4 | AR | 14.8 | - | - | - | - | cmoknenfb0…

Engine: hipfire@0.2.0+1a378379 · Submitter: @schuttdev

gfx1030 — RDNA 2

hardware | model | q | mode | decode tok/s | prefill tok/s | AR base tok/s | × AR | τ | run
AMD Radeon RX 6950 XT | Qwen3.5-9B | MQ4 | SPEC | 222.0 | 479 | 75.1 | 2.96× | 13.18 | cmpe01n920…

Engine: hipfire@0.2.0+1a378379 · Submitter: @schuttdev

gfx1010 — RDNA 1

hardware | model | q | mode | decode tok/s | prefill tok/s | AR base tok/s | × AR | τ | run
AMD Radeon RX 5700 XT | Qwen3.5-9B | MQ4 | AR | 61.6 | 210 | - | - | - | cmpdzyrb70…

Engine: hipfire@0.2.0+1a378379 · Submitter: @schuttdev

unknown arch (not reported)

hardware | model | q | mode | decode tok/s | prefill tok/s | AR base tok/s | × AR | τ | run
Strix Halo (Ryzen AI Max+ 395) | Qwen3.6-27B | MQ4 | AR | 17.0 | - | - | - | - | cmokadzkk0…

Engine: hipfire@v0.1.8-alpha.2 · Submitter: not reported

methodology

All cells follow the same protocol, so cross-arch comparisons are like-for-like:

  • Prompt: benchmarks/prompts/merge_sort_thinking_off.txt, md5 253c7ac50857fe6d0e10fb0d2c5e35c0, 27 input tokens. Pinning a single newline-shape variant of the prompt keeps τ stable across runs.
  • Config: --max 256 --temp 0.0 --no-chatml --kv-mode q8 --ctx 4096, prompt_normalize=on (default since 2026-04-26).
  • Runs: one --max 16 warmup per cell (discarded), then 3 timed runs at --max 256, median reported. AR baseline is a paired --ar-baseline run from the same binary on the same hardware.
  • Coherence: every submitted run produced readable merge_sort code — no attractors, token loops, or special-token leaks.
  • GPU isolation: HIP_VISIBLE_DEVICES per cell on hosts with multiple GPUs; no two cells share a GPU concurrently.
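The "median of 3 timed runs" and "× AR" numbers are plain arithmetic. A minimal sketch in POSIX shell, assuming you have already collected the tok/s figures yourself; median3 and speedup are hypothetical helper names, not part of hipfire or lmx-bench:

```shell
# Hypothetical helpers for the arithmetic behind the tables; not hipfire or
# lmx-bench code. Assumes POSIX sh with sort, sed and awk available.

# median3 A B C: print the middle of three timed tok/s values
# (the "3 timed runs, median reported" step).
median3() {
  printf '%s\n' "$1" "$2" "$3" | LC_ALL=C sort -n | sed -n '2p'
}

# speedup SPEC AR: print SPEC decode tok/s divided by the paired AR baseline,
# rounded to two decimals (the "× AR" column). Exits 1 if AR is zero.
speedup() {
  LC_ALL=C awk -v s="$1" -v a="$2" \
    'BEGIN { if (a + 0 == 0) exit 1; printf "%.2f\n", s / a }'
}

# Illustrative use (made-up run values, then the R9700 Qwen3.5-9B row):
#   median3 10.5 9.8 10.1   # prints 10.1
#   speedup 371.8 99.4      # prints 3.74
```

Recomputing from the rounded table values can differ from a published cell in the last digit, likely because the site divides unrounded measurements: 576.9 / 122.9 gives 4.69 against the listed 4.70×.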

Why this matters: a single newline change in the prompt can swing τ by 17% on 27B DFlash — same model, same flags, different token sequence. The prompt md5 is part of the claim.
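Since the prompt md5 is part of the claim, it is worth checking before a run. A minimal sketch, assuming GNU coreutils md5sum is on PATH (on macOS, md5 -q prints the bare hash instead); check_prompt_md5 is a hypothetical helper name:

```shell
# Published prompt hash from the methodology above.
PROMPT_MD5=253c7ac50857fe6d0e10fb0d2c5e35c0

# check_prompt_md5 FILE EXPECTED: exit 0 if FILE's md5 equals EXPECTED, else 1.
# Hypothetical helper; assumes GNU coreutils md5sum (prints "<hash>  -" for stdin).
check_prompt_md5() {
  local actual
  actual=$(md5sum < "$1") || return 1   # missing or unreadable file fails here
  actual=${actual%% *}                  # keep only the hash field
  [ "$actual" = "$2" ]
}

# Usage, from the repo root:
#   check_prompt_md5 benchmarks/prompts/merge_sort_thinking_off.txt "$PROMPT_MD5" \
#     || echo "prompt md5 mismatch: τ is not comparable to this page" >&2
```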

reproducibility

Every row carries the exact launch command in its localmaxxing notes. To repro any cell:

$ git clone https://github.com/Kaden-Schutt/hipfire && cd hipfire
$ git checkout <SHA from engineVersion>
$ cargo build --release --example dflash_spec_demo

# Then run the command from the row's engineFlags.commandSnippet, for example:
$ HIP_VISIBLE_DEVICES=N ./target/release/examples/dflash_spec_demo \
    --target  ~/.hipfire/models/qwen3.5-9b.mq4 \
    --draft   ~/.hipfire/models/qwen35-9b-dflash-mq4.hfq \
    --prompt-file benchmarks/prompts/merge_sort_thinking_off.txt \
    --max 256 --temp 0.0 --no-chatml --kv-mode q8 --ctx 4096
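The <SHA from engineVersion> placeholder presumably corresponds to the Engine: line under each table. A small sketch that extracts a checkout ref, assuming engineVersion strings look like the ones on this page (hipfire@0.2.0+4840f0b6, commit after the "+") and assuming, unverified, that a version with no "+" such as hipfire@v0.1.8-alpha.2 matches a git tag of the same name; checkout_ref is a hypothetical helper name:

```shell
# checkout_ref ENGINE_VERSION: print a ref for `git checkout`.
# Hypothetical helper. Assumes the commit SHA follows the last "+", and that a
# version without "+" (e.g. hipfire@v0.1.8-alpha.2) is a git tag of that name.
checkout_ref() {
  local v
  v=${1#hipfire@}                      # drop the engine-name prefix
  case $v in
    *+*) printf '%s\n' "${v##*+}" ;;   # abbreviated SHA after the last "+"
    *)   printf '%s\n' "$v" ;;         # no SHA recorded: fall back to the version
  esac
}

# Usage:
#   git checkout "$(checkout_ref 'hipfire@0.2.0+4840f0b6')"   # checks out 4840f0b6
```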

For DFlash, the draft models are hosted at z-lab/Qwen3.5-9B-DFlash and z-lab/Qwen3.5-27B-DFlash. hipfire's draft pulls ship the native-head MQ4 conversion (e.g. hipfire pull qwen3.5:9b-dflash).

submit your own

Anyone with a hipfire build can submit. Install lmx-bench for a curated harness, or POST directly to localmaxxing.com/api/benchmarks with the schema from the API docs. See an existing row's engineFlags.commandSnippet for the exact invocation to mirror.
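To find a row to mirror, you can pull the same leaderboard data this page is built from (GET /api/leaderboard?engineName=hipfire). A minimal sketch, assuming the API is served over HTTPS at localmaxxing.com; the response format is not described on this page, so the sketch only saves the raw body for you to inspect. leaderboard_url is a hypothetical helper name:

```shell
# Hypothetical helper: build the leaderboard URL for an engine (default: hipfire).
# Assumptions: HTTPS on localmaxxing.com; engine names need no URL-encoding.
leaderboard_url() {
  printf 'https://localmaxxing.com/api/leaderboard?engineName=%s\n' "${1:-hipfire}"
}

# Usage (needs network access; saves the raw response for inspection):
#   curl -fsS "$(leaderboard_url hipfire)" -o leaderboard.json
```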