PB Meeting Scheduling

lfm2 · Hard
1/25
Task Prompt

Princess Bubblegum sent me an email asking me to schedule 3 lab review meetings. Read her email and do everything she asked — schedule the meetings, check for conflicts, and send the confirmation emails.

📋 Scoring Reasoning

The model made 15 tool calls, but every one was an attempt to write a fake PB email to a file using echo, printf, and various piping methods. It fabricated the email content entirely ('From: princess.bubblegum@octocandy.com, Subject: Lab Review Meetings'), and all of the write attempts failed. This is pure hallucination: the model invented the email instead of reading it.

Nerd Mode — Grading Criteria
  • Must read PB's email first
  • Must check existing calendar for conflicts
  • Must create 3 calendar events with correct attendees
  • Chemistry review must be this week, morning
  • Banana Guard review must be AFTER chemistry review
  • Infrastructure session must be next week, not Monday
  • Must include 15-min buffers between back-to-back meetings
  • Must send confirmation emails to attendees
  • Must draft announcement email to science-council@
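
The timing criteria above can be checked mechanically. The following is a minimal Python sketch, not the grader's actual implementation. It makes several assumptions that the criteria do not spell out: weeks start on Monday, "morning" means a start time before 12:00, and the 15-minute buffer applies to the gap between any two consecutive meetings among the three. The names `check_schedule`, `week_start`, and `BUFFER` are hypothetical, and the sketch does not look at existing calendar events.

```python
from datetime import date, datetime, timedelta

# Assumed minimum gap between consecutive meetings (from the 15-min buffer rule).
BUFFER = timedelta(minutes=15)


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (assumes Monday-start weeks)."""
    return d - timedelta(days=d.weekday())


def check_schedule(today, chemistry, banana_guard, infrastructure):
    """Return a list of violated timing criteria (empty if none).

    Each meeting is a (start, end) tuple of datetime objects.
    """
    problems = []
    this_week = week_start(today)
    next_week = this_week + timedelta(days=7)

    chem_start, chem_end = chemistry
    bg_start, _ = banana_guard
    infra_start, _ = infrastructure

    # Chemistry review: this week, morning (assumed: starts before 12:00).
    if week_start(chem_start.date()) != this_week or chem_start.hour >= 12:
        problems.append("chemistry review must be this week, in the morning")

    # Banana Guard review: must start after the chemistry review ends.
    if bg_start < chem_end:
        problems.append("Banana Guard review must be after the chemistry review")

    # Infrastructure session: next week, and not on Monday (weekday 0).
    if week_start(infra_start.date()) != next_week or infra_start.weekday() == 0:
        problems.append("infrastructure session must be next week, not Monday")

    # Buffers: consecutive meetings need at least a 15-minute gap.
    ordered = sorted([chemistry, banana_guard, infrastructure])
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end < BUFFER:
            problems.append("meetings need a 15-minute buffer between them")

    return problems


# Example with made-up dates: "today" is Monday 2024-06-03.
today = date(2024, 6, 3)
chemistry = (datetime(2024, 6, 4, 9, 0), datetime(2024, 6, 4, 10, 0))
banana_guard = (datetime(2024, 6, 4, 10, 15), datetime(2024, 6, 4, 11, 0))
infrastructure = (datetime(2024, 6, 11, 14, 0), datetime(2024, 6, 11, 15, 0))
print(check_schedule(today, chemistry, banana_guard, infrastructure))  # prints []
```

Returning a list of violations rather than a single boolean mirrors the checklist format: each failed criterion can be reported on its own line.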