The Genie and the Monkey’s Paw

Does your AI try to make your dreams come true? Or does it do what you asked for, no matter the cost?

Today Anthropic shipped Opus 4.7. From the release notes:

Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally.

Let me rewind for a moment.

You probably know the story of the monkey’s paw: it grants wishes precisely as asked – no matter the consequences. You’ve probably had a monkey’s paw moment. You ask for a web page, and it gives you plain black text on a white background in an 8-point font. Oh, you wanted color? Formatting? Fonts? You should have said so.

On the other hand, we have the parable of The Genie. When asked to “make me a prince,” he produced a citywide musical number that involved turning the protagonist briefly into an elephant. You ask for a web page, and you come back to a 3D interactive data browser with sparkles and confetti.

AI models and the systems built on top of them tend towards one or the other.

Neither is exactly a mistake. People are vague, and the systems have to decide what to do with that vagueness. The monkey’s paw models tend to be less helpful, but are also less likely to go off the rails. The genies are sometimes mind-readers and sometimes whirlwinds of chaos.

For a long time, GPT has been a monkey’s paw. Claude has been a genie.

Tastes vary; I use both often. The differences are stark. For example: I sometimes ask the models, “Are you sure that’s a good idea?”.

Claude Opus 4.6 answers – “Sorry, fixing it!” It’s there to grant wishes, not contemplate epistemic certainty. GPT-5.4 takes me at my word and says “No, I’m not sure”, and re-examines.

Which one’s right? That depends entirely on what you wanted. Personally, I prefer the paw.

(Later in the conversation I asked GPT-5.4 if it could fix a problem, and it replied that it could. Then it stopped. Too much of a good thing, I suppose.)

We’re not as clear as we think we are. We get mad when we’re right and they second guess us, and we get mad when we’re wrong and they don’t catch us.

Perhaps new models will become so smart that the point will be moot?

Which brings us back to Opus 4.7.

Anthropic built Claude as a genie. Today they announced that they shipped something closer to a paw. Users, per Anthropic, should re-tune their prompts and expectations accordingly.

Goodbye, Genie.

Dan Shapiro's Blog

The Genie and the Monkey’s Paw

Get new posts by email: