Optional live Triad debate
This is the live inference surface. Counsel, Risk, and Evidence critique an input in parallel on one Qwen 2.5 72B model running behind the AMD MI300X endpoint. Judges should start with the 90-second judge demo first; use this page only when you want to see the GPU-backed run.
?→?checking…
Judge path: this page is optional live inference. For the guaranteed no-wait review, use the 90-second seeded demo.
Ensemble shape · 3 voices · 1 GPU
GPU idle
vLLM /v1/*
MI300X192 GB HBM3
One Qwen 2.5 72B endpoint on a single AMD Instinct MI300X (192 GB HBM3) serves the entire ensemble. The same workload on cloud APIs needs ~4× H100s — large-memory GPU serving is what makes parallel ensembles economical.
Use case:
Use when: You're reviewing a legal/regulatory document and want to catch what one reviewer would miss.