Handle Ambiguous Request

deepseek-r1:8b · Very Hard
0/15
Task Prompt

Handle BMO's email.

📋 Scoring Reasoning

The run timed out at 1800 s with zero tool calls and an empty response. The model never attempted to interpret the ambiguous request or to read BMO's email.

Grading Criteria
  • Must read BMO's email without being told which one
  • Must identify it as the maintenance report
  • Must take appropriate actions for critical items
  • Must differentiate critical vs important vs informational
  • Must not hallucinate actions or make up data
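
A rubric like the one above could be applied roughly as in the following sketch. The criteria text is taken from the list. Everything else is an assumption made for illustration: the `RunResult` type, the `score` function, the equal weighting of 3 points per criterion (picked only so that five criteria add up to 15), and the rule that a run with no tool calls and no response earns nothing. The actual grader may work differently.

```python
from dataclasses import dataclass

# Criteria text copied from the grading list above.
CRITERIA = [
    "Must read BMO's email without being told which one",
    "Must identify it as the maintenance report",
    "Must take appropriate actions for critical items",
    "Must differentiate critical vs important vs informational",
    "Must not hallucinate actions or make up data",
]

# Assumption: equal weights. The source only states the total of 15.
POINTS_PER_CRITERION = 3


@dataclass
class RunResult:
    """Hypothetical summary of one benchmark run."""
    tool_calls: int
    response: str
    timed_out: bool
    passed: frozenset = frozenset()  # indices into CRITERIA judged as met


def score(run: RunResult) -> tuple[int, int]:
    """Return (earned, total) points for a run."""
    total = len(CRITERIA) * POINTS_PER_CRITERION
    # Assumption: a run that made no tool calls and produced no output
    # cannot have read any email, so it earns no credit at all.
    if run.tool_calls == 0 and not run.response.strip():
        return 0, total
    earned = sum(
        POINTS_PER_CRITERION for i in range(len(CRITERIA)) if i in run.passed
    )
    return earned, total


# The run described in the scoring reasoning: timeout, no tool calls, no output.
failed_run = RunResult(tool_calls=0, response="", timed_out=True)
print("%d/%d" % score(failed_run))
```

Under these assumptions the run described in the scoring reasoning maps to 0 of 15 points, which matches the reported score.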