A bounty program for medical students, residents, and attendings. Submit structured corrections when frontier AI models fail on clinical reasoning. Earn between tasks. Opt in to receive tailored outreach for larger studies from frontier labs.
Per accepted bounty
By clinical complexity
Review turnaround
Traditional labeling platforms have idle periods between projects. Bounties keep you earning by hunting for model failures on your own schedule.
Confirm your enrollment in an accredited medical school or residency program. Verification takes under 48 hours. Your credentials determine which bounty tiers you can access.
Ask frontier AI models clinical questions. When you find a wrong, incomplete, or misleading response, capture it. Tier 1 quick asks, Tier 2 clinical vignettes, Tier 3 management decisions.
File a structured reasoning trace: the prompt, the model's output, your failure classification, the correct answer with stepwise clinical logic, and a severity score. Peer-reviewed within 48 hours.
Each tier has a distinct reasoning trace structure calibrated to the depth of clinical judgment required.
Single-concept questions patients ask about labs, vitals, or symptoms — where models give wrong or dangerously incomplete answers.
"My mom's sodium came back at 119. The doctor said come back in a week — is that okay?" → Model reassures the patient. Reality: Na 119 is severe hyponatremia — seizure and death risk. This is a medical emergency.
Full patient scenarios where models fail on multi-step reasoning, differential narrowing, and data integration across labs, imaging, and history.
45F, fatigue, weight loss, Na 128, K 5.8, glucose 62, BP 88/54 → Model anchors on sepsis, ranks adrenal insufficiency "less likely." Reality: The Na/K/glucose triad IS adrenal crisis. Delay for workup without steroids could be fatal.
High-stakes triage and treatment decisions where the model's error directly maps to patient harm. Documented as the hardest failure mode for frontier models.
52M, type 2 diabetic, vomiting 2 days, can't keep fluids down, breathing fast → Model says "try small sips, see your doctor Monday." Reality: This is DKA. Mortality 2–5%. Waiting until Monday could mean coma or death. Based on failures documented in Nature Medicine, Feb 2026.
Payouts scale with training level: medical students earn the base rate, residents earn 1.5×, senior residents and fellows earn 1.75×, and attendings earn up to 2× per bounty.
Every bounty submission follows a universal skeleton. Higher tiers add annotated failure points and counterfactual analysis.
Every bounty submission classifies the model failure. This taxonomy is itself a signal that frontier labs pay for.
The model gives a clearly wrong answer. Wrong diagnosis, wrong drug, wrong mechanism. The simplest failure mode, but the most dangerous when delivered with confidence.
Right direction, but missing context that changes management. The potassium example above: technically accurate that 6.2 is "high," but omitting the emergency framing could cost a life.
Technically accurate, but framed in a way that leads to wrong action. Correct information with incorrect emphasis, false reassurance, or missing urgency calibration.
Bounties keep you earning between projects. But the real value is the network you join.
Frontier labs and intermediaries (Mercor, Turing, Surge AI) periodically launch larger annotation and evaluation studies. Verified MedBounty members get first access and tailored outreach matched to your specialty and training level.
MS3s and MS4s bring fresh clinical reasoning. Residents bring procedural and management depth. Attendings bring decades of pattern recognition. Different tiers benefit from different expertise, and your profile reflects your level.
No waiting for the next project to drop. Bounties are async and self-paced — hunt for model failures whenever you have 15 minutes or an hour. Consistent earning between contracted annotation studies.
Open to medical students (MS3+), residents, and attendings at accredited U.S. programs. Verification takes under 48 hours. Earn bounties on your own schedule, and opt in to receive tailored outreach when frontier labs launch larger annotation studies.