Handle Ambiguous Request

glm-4.7-flash · Very Hard
1/15
Task Prompt

Handle BMO's email.

📋 Scoring Reasoning

Made 6 tool calls but went off track: searched for executables, tried to find a 'claw*' binary, read benchmark result files from other models, and grepped for message IDs. Never addressed the actual ambiguous instructions. Showed some initiative, but it was completely misdirected.

Nerd Mode — Grading Criteria
  • Must read BMO's email without being told which one
  • Must identify it as the maintenance report
  • Must take appropriate actions for critical items
  • Must differentiate critical vs important vs informational
  • Must not hallucinate actions or make up data