Handle Ambiguous Request
deepseek-r1:8b · Very Hard
Score: 0/15
Task Prompt
Handle BMO's email.
📋 Scoring Reasoning
The model timed out at 1800 seconds without making a single tool call and returned an empty response. It never attempted to interpret the ambiguous request or to read BMO's email.
Grading Criteria
- Must read BMO's email without being told which one
- Must identify it as the maintenance report
- Must take appropriate actions for critical items
- Must differentiate critical, important, and informational items
- Must not hallucinate actions or make up data