Handle Ambiguous Request
qwen3:8b · Very Hard
1/15
Task Prompt
Handle BMO's email.
📋 Scoring Reasoning
The response asked for clarification instead of taking initiative: 'I need more details about the email from BMO. Could you please provide: 1. The email content 2. The action you want me to take 3. Any specific instructions.' It made zero tool calls. The task tests whether the model can figure out what to do with minimal guidance, so asking the user defeats the purpose. It did at least recognize that BMO was mentioned, which earns minimal credit.
Nerd Mode — Grading Criteria
- Must read BMO's email without being told which one
- Must identify it as the maintenance report
- Must take appropriate actions for critical items
- Must differentiate critical vs important vs informational
- Must not hallucinate actions or make up data
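To illustrate why a response with zero tool calls scores so low against these criteria, here is a minimal sketch in Python. It is not the benchmark's actual grader: the `Criterion` and `Response` types, the `needs_tool_call` flag, and the `unreachable_criteria` helper are all hypothetical, and which criteria depend on reading the inbox is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    text: str
    # Assumption: True if the criterion can only be met by actually
    # reading the inbox through a tool call.
    needs_tool_call: bool


@dataclass
class Response:
    text: str
    tool_calls: list = field(default_factory=list)


# The five grading criteria listed above, tagged with the assumed flag.
CRITERIA = [
    Criterion("Read BMO's email without being told which one", True),
    Criterion("Identify it as the maintenance report", True),
    Criterion("Take appropriate actions for critical items", True),
    Criterion("Differentiate critical vs important vs informational", True),
    Criterion("Do not hallucinate actions or make up data", False),
]


def unreachable_criteria(response: Response) -> list:
    """Return the criteria a response cannot meet given its tool usage."""
    if response.tool_calls:
        return []  # tools were used, so nothing is ruled out up front
    return [c.text for c in CRITERIA if c.needs_tool_call]


# The graded response: a clarification request and no tool calls.
graded = Response(text="I need more details about the email from BMO.")
missed = unreachable_criteria(graded)
print(len(missed), "of", len(CRITERIA), "criteria unreachable")
```

Under these assumptions, a clarification-only reply rules out four of the five criteria before its content is even considered. The only one left, not hallucinating, is met trivially by doing nothing.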