Adjudicate (coming soon)

Pre-banked disagreements

Each question is run through three frontier thinking models. An LLM judge scores their disagreement along four axes: management, safety, factual accuracy, and coverage. Only the hardest divergences are surfaced.
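As a rough illustration of the filtering step described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `JudgedQuestion` type, the 0.0 to 1.0 score scale, summing the four axes into one number, and the `hardest` helper are hypothetical and do not describe how the actual pipeline ranks disagreements.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four axes named in the text; the key spellings are assumed.
AXES = ("management", "safety", "factual", "coverage")


@dataclass
class JudgedQuestion:
    """One question after the judge has compared the three model answers."""
    question_id: str
    # Judge-assigned disagreement per axis (assumed scale: 0.0 = agree, 1.0 = conflict).
    disagreement: Dict[str, float]

    def total(self) -> float:
        # Assumed aggregation: a plain sum across axes; missing axes count as 0.
        return sum(self.disagreement.get(axis, 0.0) for axis in AXES)


def hardest(questions: List[JudgedQuestion], k: int) -> List[JudgedQuestion]:
    """Return the k questions with the highest total disagreement."""
    return sorted(questions, key=lambda q: q.total(), reverse=True)[:k]


bank = [
    JudgedQuestion("q1", {"management": 0.1, "safety": 0.0, "factual": 0.2, "coverage": 0.1}),
    JudgedQuestion("q2", {"management": 0.9, "safety": 0.8, "factual": 0.4, "coverage": 0.5}),
    JudgedQuestion("q3", {"management": 0.3, "safety": 0.6, "factual": 0.1, "coverage": 0.2}),
]
surfaced = hardest(bank, k=2)
```

A sum treats all axes equally; a real system might instead weight safety more heavily or take the maximum over axes, which this sketch does not attempt.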

You decide

Pick which model's answer is strongest, rate each on a 1–5 scale, and surface the gaps the others missed. Your judgment becomes the labeled ground truth.

Write the rubric

List at least two requirements a correct answer must include and at least one negative scoring item that disqualifies a wrong one. Add sources and a short reasoning trace. That's the bounty.
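The shape of a rubric submission can be sketched as a small data structure with a completeness check. This is a hypothetical Python sketch, not the product's schema: the `Rubric` class, its field names, and the `validate` helper are assumptions. Only the minimum counts (two requirements, one negative item) and the presence of sources and a reasoning trace come from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rubric:
    """Hypothetical container for one submitted rubric."""
    requirements: List[str]    # what a correct answer must include
    negative_items: List[str]  # what disqualifies a wrong answer
    sources: List[str] = field(default_factory=list)
    reasoning: str = ""        # short reasoning trace


def validate(rubric: Rubric) -> List[str]:
    """Return a list of problems; an empty list means the rubric is complete."""
    problems = []
    if len(rubric.requirements) < 2:
        problems.append("need at least 2 requirements")
    if len(rubric.negative_items) < 1:
        problems.append("need at least 1 negative scoring item")
    if not rubric.sources:
        problems.append("need at least 1 source")
    if not rubric.reasoning.strip():
        problems.append("need a short reasoning trace")
    return problems


# Placeholder strings only; they stand in for whatever the adjudicator writes.
draft = Rubric(
    requirements=["first required point"],
    negative_items=[],
)
complete = Rubric(
    requirements=["first required point", "second required point"],
    negative_items=["disqualifying claim"],
    sources=["citation placeholder"],
    reasoning="Why the chosen answer is strongest.",
)
```

Calling `validate(draft)` lists everything still missing, while `validate(complete)` returns an empty list.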

Pick the right answer when frontier models disagree.
