Handle Ambiguous Request
glm-4.7-flash · Very Hard
1/15
Task Prompt
Handle BMO's email.
📋 Scoring Reasoning
Made 6 tool calls but went off track: searched for executables, tried to find a 'claw*' binary, read other models' benchmark result files, and grepped for message IDs. Never addressed the ambiguous instruction itself. Showed some initiative, but it was completely misdirected.
Grading Criteria
- Must read BMO's email without being told which one
- Must identify it as the maintenance report
- Must take appropriate actions for critical items
- Must differentiate critical, important, and informational items
- Must not hallucinate actions or make up data
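Taken together, the criteria describe a small triage procedure: find the email by sender alone, sort its items by priority, act only on the critical ones, and report the rest without inventing follow-ups. The Python below is a minimal sketch under loudly stated assumptions. The inbox is taken to be already loaded as a list of messages, and the report's items are taken to be already parsed and tagged with a priority. The page describes neither the message format nor the tagging. `Email`, `Priority`, `find_email_from`, `triage`, `plan_actions`, and the sample inbox are all hypothetical. The sample inbox is not BMO's actual email.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Priority(Enum):
    # Hypothetical three-level scale mirroring the grading criteria.
    CRITICAL = "critical"
    IMPORTANT = "important"
    INFORMATIONAL = "informational"


@dataclass
class Email:
    # Hypothetical message shape; items are assumed to be pre-parsed
    # into (priority, text) pairs.
    sender: str
    subject: str
    items: list[tuple[Priority, str]] = field(default_factory=list)


def find_email_from(inbox: list[Email], sender: str) -> Email | None:
    # Select the message by sender alone, since the prompt never names it.
    # Returns the first match, or None if the sender has no mail.
    for message in inbox:
        if message.sender == sender:
            return message
    return None


def triage(email: Email) -> dict[Priority, list[str]]:
    # Group every item under its priority, preserving original order.
    buckets: dict[Priority, list[str]] = {p: [] for p in Priority}
    for priority, text in email.items:
        buckets[priority].append(text)
    return buckets


def plan_actions(buckets: dict[Priority, list[str]]) -> tuple[list[str], list[str]]:
    # Only critical items become actions; everything else is reported as-is,
    # so no action or data is invented for non-critical items.
    actions = [f"act: {t}" for t in buckets[Priority.CRITICAL]]
    notes = [f"flag: {t}" for t in buckets[Priority.IMPORTANT]]
    notes += [f"fyi: {t}" for t in buckets[Priority.INFORMATIONAL]]
    return actions, notes


# Made-up sample inbox for illustration only.
inbox = [
    Email("alice", "Lunch", []),
    Email("bmo", "Maintenance report", [
        (Priority.CRITICAL, "example critical item"),
        (Priority.IMPORTANT, "example important item"),
        (Priority.INFORMATIONAL, "example informational item"),
    ]),
]

report = find_email_from(inbox, "bmo")
if report is not None:
    actions, notes = plan_actions(triage(report))
    print(actions)
    print(notes)
```

Keeping classification (`triage`) separate from action planning (`plan_actions`) makes the "differentiate" criterion checkable on its own. It also confines any side effects to the critical bucket.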