benchmarks — hipfire

Cross-arch decode throughput (tok/s) on hipfire. 35 approved runs across 6 architectures, fetched from localmaxxing.com at build time. Every cell links to its /runs/<id> for reproducibility.

Last build: 2026-05-20T12:45:18.915Z · source: GET /api/leaderboard?engineName=hipfire

gfx1100 — RDNA 3

hardware	model	q	mode	decode tok/s	prefill	AR base	× AR	τ	run
RX 7900 XTX	Qwen3.5-9B	`MQ4`	SPEC	576.9	755	122.9	4.70×	13.18	`cmpdzs3fq0…`
RX 7900 XTX	Qwen3.5-9B	`MQ4`	SPEC	575.3	—	—	—	—	`cmofzk74w0…`
RX 7900 XTX	Qwen3.5-9B	`MQ4`	SPEC	575.2	—	123.1	4.67×	13.18	`cmp8fqdz20…`
RX 7900 XTX	Qwen3.5-9B	`MQ4`	SPEC	337.0	—	—	—	—	`cmofdp1ve0…`
RX 7900 XTX	Qwen3.5-9B	`MQ4`	SPEC	322.6	—	—	—	—	`cmofkl1gm0…`
RX 7900 XTX	Qwen3.5-27B	`MQ4`	SPEC	254.4	—	44.6	5.70×	13.18	`cmp8fozao0…`
RX 7900 XTX	Qwen3.5-27B	`MQ4`	SPEC	250.3	—	—	—	—	`cmofyatrj0…`
RX 7900 XTX	Qwen3.6-27B	`MQ4`	SPEC	216.7	—	44.8	4.84×	10.93	`cmp8fw36n0…`
RX 7900 XTX	Qwen3.6-27B	`MQ4`	SPEC	216.6	—	44.6	4.85×	10.93	`cmp8fnkw00…`
RX 7900 XTX	Qwen3.5-27B	`MQ4`	SPEC	201.1	—	—	—	—	`cmofkrpp00…`
RX 7900 XTX	Qwen3.5-27B	`MQ4`	SPEC	182.0	—	—	—	—	`cmofe2efg0…`
RX 7900 XTX	Qwen3.5-35B-A3B	`MQ4`	SPEC	154.6	—	—	—	—	`cmofkye0g0…`
RX 7900 XTX	Qwen3.5-35B-A3B	`MQ4`	SPEC	140.6	—	—	—	—	`cmofefr0j0…`
RX 7900 XTX	Qwen3.5-35B-A3B	`MQ4`	AR	135.9	—	—	—	—	`cmofe92tt0…`
RX 7900 XTX	Qwen3.6-35B-A3B	`MQ4`	AR	135.3	—	—	—	—	`cmofezrun0…`
RX 7900 XTX	Ornstein3.6-35B-A3B	`MQ4`	AR	134.0	—	—	—	—	`cmofgd9iz0…`
RX 7900 XTX	Qwen3.5-9B	`MQ4`	AR	123.1	—	—	—	—	`cmp8ft6rt0…`
RX 7900 XTX	Qwen3.5-9B	`MQ4`	AR	122.3	—	—	—	—	`cmofdidib0…`
RX 7900 XTX	Qwen3.6-27B	`MQ4`	SPEC	118.2	—	—	—	—	`cmofet3j80…`
RX 7900 XTX	Qwen3.6-27B	`MQ4`	SPEC	118.1	—	—	—	—	`cmofl528t0…`
RX 7900 XTX	Qwen3.6-35B-A3B	`MQ4`	SPEC	68.6	—	—	—	—	`cmoff6g560…`
RX 7900 XTX	Qwen3.6-27B	`MQ4`	AR	44.8	—	—	—	—	`cmp8frsd70…`
RX 7900 XTX	Qwen3.6-27B	`MQ4`	AR	44.6	—	—	—	—	`cmp8ful710…`
RX 7900 XTX	Qwen3.5-27B	`MQ4`	AR	44.6	—	—	—	—	`cmp8fl9rq0…`
RX 7900 XTX	Qwen3.5-27B	`MQ4`	AR	43.7	—	—	—	—	`cmofdvq7c0…`
RX 7900 XTX	Qwen3.6-27B	`MQ4`	AR	43.6	—	—	—	—	`cmofemf840…`

Engine: hipfire@0.2.0+4840f0b6 · Submitter: @schuttdev

gfx1201 — RDNA 4

hardware	model	q	mode	decode tok/s	prefill	AR base	× AR	τ	run
AMD Radeon AI PRO R9700	Qwen3.5-9B	`MQ4`	SPEC	371.8	1160	99.4	3.74×	13.18	`cmpdzu8ig0…`
AMD Radeon AI PRO R9700	Qwen3.5-27B	`MQ4`	SPEC	196.2	438	35.4	5.54×	13.18	`cmpdzvqpx0…`

Engine: hipfire@0.2.0+4840f0b6 · Submitter: @schuttdev

gfx1151 — RDNA 3.5

hardware	model	q	mode	decode tok/s	prefill	AR base	× AR	τ	run
Strix Halo (Radeon 8060S Graphics)	Qwen3.5-9B	`MQ4`	SPEC	255.8	406	45.6	5.60×	13.18	`cmpdzx93g0…`
Strix Halo (Radeon 8060S Graphics)	Qwen3.5-27B	`MQ4`	SPEC	104.5	135	—	—	13.18	`cmpe035k30…`
Strix Halo (Radeon 8060S Graphics)	Qwen3.6-27B	`MQ4`	SPEC	88.3	134	14.8	5.96×	10.93	`cmpe04nu50…`
AMD Radeon 8060S Graphics (Strix Halo APU, gfx1151)	Qwen3.5-27B	`MQ4`	AR	14.8	—	—	—	—	`cmoknenfb0…`

Engine: hipfire@0.2.0+1a378379 · Submitter: @schuttdev

gfx1030 — RDNA 2

hardware	model	q	mode	decode tok/s	prefill	AR base	× AR	τ	run
AMD Radeon RX 6950 XT	Qwen3.5-9B	`MQ4`	SPEC	222.0	479	75.1	2.96×	13.18	`cmpe01n920…`

Engine: hipfire@0.2.0+1a378379 · Submitter: @schuttdev

gfx1010 — RDNA 1

hardware	model	q	mode	decode tok/s	prefill	AR base	× AR	τ	run
AMD Radeon RX 5700 XT	Qwen3.5-9B	`MQ4`	AR	61.6	210	—	—	—	`cmpdzyrb70…`

Engine: hipfire@0.2.0+1a378379 · Submitter: @schuttdev

unknown — architecture not reported

hardware	model	q	mode	decode tok/s	prefill	AR base	× AR	τ	run
Strix Halo (Ryzen AI Max+ 395)	Qwen3.6-27B	`MQ4`	AR	17.0	—	—	—	—	`cmokadzkk0…`

Engine: hipfire@v0.1.8-alpha.2 · Submitter: @

methodology

All cells follow the same protocol so that cross-arch comparisons are like-for-like:

Prompt: benchmarks/prompts/merge_sort_thinking_off.txt, md5 253c7ac50857fe6d0e10fb0d2c5e35c0, 27 input tokens. The file is pinned to a single newline layout so that τ stays stable across runs.
Config: --max 256 --temp 0.0 --no-chatml --kv-mode q8 --ctx 4096, prompt_normalize=on (default since 2026-04-26).
Runs: one --max 16 warmup per cell (discarded), then 3 timed runs at --max 256, median reported. AR baseline is a paired --ar-baseline run from the same binary on the same hardware.
Coherence: every submitted run produced readable merge_sort code — no attractors, token loops, or special-token leaks.
GPU isolation: HIP_VISIBLE_DEVICES per cell on hosts with multiple GPUs; no two cells share a GPU concurrently.
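
The reporting step in the protocol above can be sketched in a few lines of Python. This is an illustration, not hipfire's or localmaxxing's actual code: the function names are hypothetical, the three timed values are invented so that their median matches a published row, and reading the "× AR" column as decode tok/s divided by the AR base is an inference from the tables. Published ratios may be computed from unrounded figures, so the last digit can differ on some rows.

```python
from statistics import median


def reported_tok_s(timed_runs):
    """Median decode tok/s over the timed runs (the warmup is dropped by the caller)."""
    if len(timed_runs) != 3:
        raise ValueError("the protocol above uses exactly 3 timed runs")
    return median(timed_runs)


def speedup_vs_ar(spec_tok_s, ar_tok_s):
    """Speculative decode throughput relative to the paired AR baseline."""
    return spec_tok_s / ar_tok_s


# Hypothetical timed runs, chosen so the median equals a published value.
runs = [253.9, 254.4, 255.0]
spec = reported_tok_s(runs)

# 44.6 tok/s is the AR base listed for Qwen3.5-27B on the RX 7900 XTX.
ratio = speedup_vs_ar(spec, 44.6)
print(f"{spec:.1f} tok/s, {ratio:.2f}x AR")
```

Using the median rather than the mean keeps one slow or fast outlier among the three runs from moving the reported figure.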

Why this matters: a single newline change in the prompt can swing τ by 17% on 27B DFlash — same model, same flags, different token sequence. The prompt md5 is part of the claim.
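
Before a run, the prompt file can be checked against the published md5. The sketch below uses only Python's standard library; the helper names are hypothetical, and the sample strings are made up to show that one extra newline changes the digest (they are not the real prompt).

```python
import hashlib
from pathlib import Path

# md5 quoted in the methodology above.
EXPECTED_MD5 = "253c7ac50857fe6d0e10fb0d2c5e35c0"


def md5_hex(data: bytes) -> str:
    """Hex md5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()


def prompt_matches(path, expected=EXPECTED_MD5) -> bool:
    """True if the file's exact bytes hash to the expected md5."""
    return md5_hex(Path(path).read_bytes()) == expected


# Made-up text: the two inputs differ only by a trailing newline.
a = md5_hex(b"def merge_sort(xs):\n")
b = md5_hex(b"def merge_sort(xs):\n\n")
print(a != b)
```

From the repository root, `prompt_matches("benchmarks/prompts/merge_sort_thinking_off.txt")` would confirm that the local file is byte-identical to the benchmarked prompt. The file is read as bytes so that editor or platform newline conversion shows up as a mismatch.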

reproducibility

Every row carries the exact launch command in its localmaxxing notes. To reproduce any cell:

$ git clone https://github.com/Kaden-Schutt/hipfire && cd hipfire
$ git checkout <SHA from engineVersion>
$ cargo build --release --example dflash_spec_demo

# Then run the command from the row's engineFlags.commandSnippet field
$ HIP_VISIBLE_DEVICES=N ./target/release/examples/dflash_spec_demo \
    --target  ~/.hipfire/models/qwen3.5-9b.mq4 \
    --draft   ~/.hipfire/models/qwen35-9b-dflash-mq4.hfq \
    --prompt-file benchmarks/prompts/merge_sort_thinking_off.txt \
    --max 256 --temp 0.0 --no-chatml --kv-mode q8 --ctx 4096

For DFlash, the draft models are hosted at z-lab/Qwen3.5-9B-DFlash and z-lab/Qwen3.5-27B-DFlash. The native-head MQ4 conversion ships with hipfire's draft pulls (for example, hipfire pull qwen3.5:9b-dflash).
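
For scripting a whole cell, the warmup-plus-three-timed-runs schedule from the methodology can be laid out as a list of commands. The Python sketch below only builds the argument lists and environment; it launches nothing. The flags, paths, and HIP_VISIBLE_DEVICES variable are copied from the text above, while the function names and the GPU index 0 are assumptions for illustration. How tok/s is read from the binary's output is not described here, so that step is left out.

```python
import os

BINARY = "./target/release/examples/dflash_spec_demo"
PROMPT = "benchmarks/prompts/merge_sort_thinking_off.txt"


def build_argv(target, draft, max_tokens):
    """Command line for one run, using the benchmark config flags."""
    return [
        BINARY,
        "--target", os.path.expanduser(target),
        "--draft", os.path.expanduser(draft),
        "--prompt-file", PROMPT,
        "--max", str(max_tokens),
        "--temp", "0.0", "--no-chatml",
        "--kv-mode", "q8", "--ctx", "4096",
    ]


def build_env(gpu_index, base_env=None):
    """Copy of the environment with the run pinned to a single GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def plan_cell(target, draft, gpu_index):
    """One discarded warmup (--max 16), then three timed runs (--max 256)."""
    env = build_env(gpu_index)
    budgets = [16, 256, 256, 256]
    return [(build_argv(target, draft, m), env) for m in budgets]


plan = plan_cell(
    "~/.hipfire/models/qwen3.5-9b.mq4",
    "~/.hipfire/models/qwen35-9b-dflash-mq4.hfq",
    gpu_index=0,  # assumed device index
)
for argv, env in plan:
    print("HIP_VISIBLE_DEVICES=" + env["HIP_VISIBLE_DEVICES"], " ".join(argv))
```

Each `(argv, env)` pair can be passed to `subprocess.run(argv, env=env, check=True)` from the repository root. Running the pairs one after another, never in parallel on the same device, matches the GPU isolation rule.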

submit your own

Anyone with a hipfire build can submit. Install lmx-bench for a curated harness, or POST directly to localmaxxing.com/api/benchmarks with the schema from the API docs. See an existing row's engineFlags.commandSnippet for the exact invocation to mirror.