Submit a Model Run
Dispatch the full 60-question benchmark using our private evaluator. Community runs do not change the leaderboard until they are reviewed and approved.