Is Claude Fable 5 better than Opus 4.8 for real coding work?

They read as different instruments. Across ten days on Orbyt, Opus 4.8 carried nine days of feature work. Fable 5 arrived mid-session, wrote about 34% fewer output tokens per response, reached for tools more often, and ran a 35 dimension audit that came back all A's on the first pass. Task mix differed, so treat that as observation, not benchmark.

How much more expensive is Claude Fable 5 than Opus 4.8?

List price is double. Fable 5 runs $10 per million input tokens and $50 per million output tokens, against $5 and $25 for Opus 4.8. Anthropic also documents that Fable 5's new tokenizer produces roughly 30% more tokens for the same content. In my transcripts, estimated cost per response ran $0.59 for Fable 5 versus $0.35 for Opus 4.8.

Can you switch Claude models in the middle of a project?

Yes. I switched from Opus 4.8 to Fable 5 in the middle of a live Claude Code session on June 9 and kept working. The codebase, the tests, and the docs carried the context across the swap. With 11,372 tests and a 35 dimension audit harness underneath, the model is swappable. The verification layer is not.

Every Fable Has a Moral. Mine Has Data.

Anthropic named its newest model Fable 5. I am taking the word literally.

Every fable has a moral. Mine has data.

On June 9, I switched frontier models in the middle of a live coding session on Orbyt. No migration plan. No ceremony. Nothing broke. The switch left behind something rare: a side by side comparison of Claude Opus 4.8 and Claude Fable 5 on the same production codebase, across ten days of Claude Code transcripts, with every caveat attached.

The short version: Opus 4.8 built. Fable 5 wrote less, thought more, and cost more. And the most impressive thing Fable 5 did was prove the quality of the codebase it inherited.

Where this data comes from

Honesty first, because most model comparisons skip this part.

The numbers below are parsed from my local Claude Code session transcripts for Orbyt, June 1 through June 10, 2026. Transcript retention erased everything earlier. This is a ten day window, not all time.
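For anyone who wants to run the same kind of tally, here is a minimal sketch of the aggregation. It is not the script behind my numbers. It assumes each transcript line is a JSON object with a `type` field and a `message` that carries `model`, a list of `content` blocks, and `usage`. That schema is an assumption about the transcript format, and the model IDs in the sample are placeholders.

```python
import json
from collections import defaultdict


def summarize(lines):
    """Aggregate per-model counts from JSONL transcript lines.

    Assumed record shape (not verified against any specific tool version):
    {"type": "assistant", "message": {"model": ..., "content": [...], "usage": {...}}}
    """
    stats = defaultdict(
        lambda: {"responses": 0, "tool_calls": 0, "thinking": 0, "output_tokens": 0}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") != "assistant":
            continue  # skip user turns and other record types
        message = record.get("message", {})
        blocks = message.get("content", [])
        if not isinstance(blocks, list):
            blocks = []  # plain-string content has no typed blocks to count
        entry = stats[message.get("model", "unknown")]
        entry["responses"] += 1
        entry["tool_calls"] += sum(1 for b in blocks if b.get("type") == "tool_use")
        # Count the response once if it contains at least one thinking block.
        entry["thinking"] += int(any(b.get("type") == "thinking" for b in blocks))
        entry["output_tokens"] += message.get("usage", {}).get("output_tokens", 0)
    return dict(stats)


# Hypothetical sample records; "model-a" is a placeholder ID.
SAMPLE = [
    json.dumps({"type": "user", "message": {"content": "run the audit"}}),
    json.dumps({
        "type": "assistant",
        "message": {
            "model": "model-a",
            "content": [
                {"type": "thinking", "thinking": "..."},
                {"type": "tool_use", "name": "Bash", "input": {}},
            ],
            "usage": {"output_tokens": 120},
        },
    }),
    json.dumps({
        "type": "assistant",
        "message": {
            "model": "model-a",
            "content": [{"type": "text", "text": "done"}],
            "usage": {"output_tokens": 80},
        },
    }),
]

summary = summarize(SAMPLE)
```

One caveat on the sketch: if a transcript writes a single response across several lines, each carrying the same usage totals, the records need deduplicating by message ID before counting, or the token sums will be inflated.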

Cost figures are list price arithmetic from the per message usage fields, with cache writes priced at 1.25x the base input rate and cache reads at 0.1x. Estimates, not billing data.
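The arithmetic can be sketched as follows. This is an illustration under stated assumptions, not my actual script: the usage field names follow the Anthropic API's usage object, the `ListPrice` class and `estimate_cost` function are hypothetical names, and the sample token counts are invented to show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListPrice:
    """List price in dollars per million tokens, plus cache multipliers."""
    input_per_mtok: float
    output_per_mtok: float
    cache_write_multiplier: float = 1.25  # cache writes, relative to base input
    cache_read_multiplier: float = 0.10   # cache reads, relative to base input


# Prices as quoted in this article.
OPUS_4_8 = ListPrice(input_per_mtok=5.0, output_per_mtok=25.0)
FABLE_5 = ListPrice(input_per_mtok=10.0, output_per_mtok=50.0)


def estimate_cost(usage: dict, price: ListPrice) -> float:
    """Estimated list price cost in dollars for one response's usage record."""
    per_in = price.input_per_mtok / 1_000_000
    per_out = price.output_per_mtok / 1_000_000
    return (
        usage.get("input_tokens", 0) * per_in
        + usage.get("cache_creation_input_tokens", 0) * per_in * price.cache_write_multiplier
        + usage.get("cache_read_input_tokens", 0) * per_in * price.cache_read_multiplier
        + usage.get("output_tokens", 0) * per_out
    )


# Invented usage record for illustration only.
example_usage = {
    "input_tokens": 1_000,
    "cache_creation_input_tokens": 2_000,
    "cache_read_input_tokens": 400_000,
    "output_tokens": 2_000,
}

opus_cost = estimate_cost(example_usage, OPUS_4_8)
fable_cost = estimate_cost(example_usage, FABLE_5)
```

Summing `estimate_cost` over every assistant response, grouped by model, gives the kind of total reported later in this post. Because every term scales with the list price, identical usage on Fable 5 costs exactly twice what it costs on Opus 4.8 in this model.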

If you are tempted to hold these figures against the $400 that the original 32 day build cost, do not. That number was real spend. These are list price equivalents computed from usage logs. List price is the unit you can compare across vendors, so it is the unit I report here.

And this is not a controlled benchmark. The task mix differs. Opus 4.8 did nine days of feature and sprint work. Fable 5 did roughly two days of review, audit, and fix work. June 9 was a mixed day, one session starting on Opus 4.8 and finishing on Fable 5, so that day's 31 commits cannot be attributed to either model.

One more disclosure. The Orbyt product serves neither model to customers. The registered product models are Sonnet 4.5 as the default, Haiku 4.5, Opus 4.6, and Opus 4.7. Both frontier models appear in this story as the builder, not the product.

What Opus 4.8 shipped in nine days

Sixty two commits landed in the window. The Opus 4.8 stretch reads like a full quarter of roadmap shipped in nine days.

It took RFC-006, the international salary serving path for the UK and Canada, from spec to production. End to end. It generated roughly 220 new international marketing pages.

And it found a cache bug that mattered. The Upstash cache layer was returning the literal string "[object Object]" on every warm hit of the flagship salary endpoints. A live two request test caught it. The project log titled the episode "The harness that almost lied."

Then it graded its own work. A simulated tier one review modeled on a Vercel engineering audit scored the codebase 85 out of 100.

The response to that review was the most senior week of the run. Opus 4.8 split a 1,540 line Stripe webhook handler into five modules. It replaced 14 hand rolled routes with one shared wrapper. It made subscription gating fail closed. And it fixed a protocol mismatch that was turning away official clients.

The biggest single session started June 2 and ran 31.9 hours: 2,282 responses, 881 tool calls. The project log's June 1 entry carries an explicit byline: "Co-authored by Justin Bartak and Claude (Opus 4.8, 1M context)."

That was my June. Then the new model arrived.

I switched models in the middle of a live session

The session had already produced 224 Opus 4.8 responses when I made the swap. Fable 5 added 545 more and kept working. The whole session spanned 8.9 hours.

Boring is the headline. The switch cost nothing because the context did not live in the model. It lived in the repo: the tests, the docs, the project log, the audit harness.

Fable 5's first solo session was a perfect audit

June 10 was Fable 5's cleanest data point. It ran Orbyt's full quality audit: build, types, lint, 10,253 fast tier tests of the 11,372 total, 15 locales, the iOS app, the Safari extension, the security battery, accessibility. Thirty five dimensions in all.

Every dimension came back A on the first pass. Zero fixes needed. One commit: the audit's own regenerated stats artifacts.

I want that to be a story about Fable 5. It is only half of one. A perfect first pass audit says as much about the codebase Opus 4.8 left behind as it says about the model grading it. Hold both thoughts.

The numbers, side by side

Measured from transcripts, June 1 through June 10. The two models read as different instruments.

Metric	Opus 4.8	Fable 5
Active window	June 1 through 10	June 9 and 10
Sessions	7, plus 1 mixed	2, including the mixed
Assistant responses	3,834	671
Tool calls	1,556	345
Tool calls per response	0.41	0.51
Responses with thinking blocks	1,142 (30%)	249 (37%)
Output tokens	8,606,152	988,722
Output tokens per response	~2,245	~1,474
Cache read input tokens	~1.8 billion	~267 million
Estimated list price cost	~$1,346	~$398
Estimated cost per response	~$0.35	~$0.59

Three observations. The task mix colors all three, so read them as signal, not verdict.

Fable 5 says less. About 34% fewer output tokens per response. The transcript reads denser, with less narration between actions.

Fable 5 thinks more often. Thinking blocks appeared on 37% of its responses against 30% for Opus 4.8, and it reached for tools more often per response. More doing and deliberating, less explaining. And no, "always on" thinking, as the spec sheet labels it, does not mean every response thinks. It means thinking cannot be switched off. The adaptive mode decides when.

Fable 5 costs more anyway. Terser output did not offset double the token price. Estimated cost per response ran $0.59 against $0.35.

And one number dwarfs the rest: 1.8 billion cache read input tokens on the Opus side alone, roughly 200 times the 8.6 million output tokens it wrote. Agents read enormously more than they write. The economics of agentic work live in cache pricing, not output pricing.
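A rough check of that claim, using only figures from the table and the list prices in this post. The cache read counts are the rounded values from the table, so the shares are approximate, and the `breakdown` helper is a hypothetical name for illustration.

```python
CACHE_READ_MULTIPLIER = 0.10  # cache reads billed at 0.1x the base input price


def breakdown(cache_read_tokens, output_tokens, input_price, output_price, total_cost):
    """Split an estimated total into cache read and output components.

    Prices are dollars per million tokens; total_cost is the estimated
    list price total reported in the table.
    """
    cache_read_cost = cache_read_tokens * input_price / 1_000_000 * CACHE_READ_MULTIPLIER
    output_cost = output_tokens * output_price / 1_000_000
    return {
        "cache_read_cost": cache_read_cost,
        "output_cost": output_cost,
        "cache_read_share": cache_read_cost / total_cost,
        "output_share": output_cost / total_cost,
    }


# Table values: cache reads are rounded, output tokens are exact.
opus = breakdown(1_800_000_000, 8_606_152, 5.0, 25.0, 1_346.0)
fable = breakdown(267_000_000, 988_722, 10.0, 50.0, 398.0)

for name, b in (("Opus 4.8", opus), ("Fable 5", fable)):
    print(
        f"{name}: cache reads ~${b['cache_read_cost']:,.0f} "
        f"({b['cache_read_share']:.0%}), output ~${b['output_cost']:,.0f} "
        f"({b['output_share']:.0%})"
    )
```

On these rounded inputs, cache reads account for about two thirds of the estimated total for both models, while output tokens account for well under a fifth.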

What the spec sheets say

For reference, from Anthropic's documentation. Fable 5 is the first model in the Claude 5 family and the first in Anthropic's new Mythos class, a tier above Opus. It shares its underlying model with Claude Mythos 5, the variant Anthropic reserves for approved organizations. Fable 5 is the most capable model the rest of us can buy.

Spec	Opus 4.8	Fable 5
Tier	Most capable Opus	Mythos class, above Opus
Context window	1M tokens	1M tokens
Max output	128K	128K
Price per million, in/out	$5 / $25	$10 / $50
Thinking	Adaptive	Adaptive, always on

One documented detail matters for budgeting: Fable 5 uses a new tokenizer that produces roughly 30% more tokens for the same content. Double the per token price understates the real delta if your workload carries heavy context.
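As a back of the envelope sketch, assuming the roughly 30% inflation applies evenly to the same content on both the input and output side (an assumption on my part, since real workloads will vary), the two effects multiply:

```python
def effective_price_ratio(per_token_ratio: float, token_inflation: float) -> float:
    """Cost ratio for identical content when both price and token count rise.

    per_token_ratio: new per-token price divided by old (2.0 for double).
    token_inflation: fractional increase in tokens for the same content (0.30 for 30%).
    """
    return per_token_ratio * (1.0 + token_inflation)


# Double the list price, roughly 30% more tokens for the same content.
ratio = effective_price_ratio(per_token_ratio=2.0, token_inflation=0.30)
print(f"Same content costs about {ratio:.1f}x")
```

Under that assumption, sending the same content costs about 2.6 times as much rather than 2 times. That is a ceiling style estimate for identical content, not a measurement from my transcripts.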

What this means if you sign the bill

Per token price is not per task cost. That is the trap in every model pricing page. Fable 5 charged double per token. It wrote a third less per response. It still came out 69% more expensive per response in my window. If you budget AI by the token price, you are budgeting the wrong unit.
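The gap between those numbers is worth working through. A minimal sketch, using the rounded per response values from my table: if output tokens were the whole bill, double the price and a third less output would predict about 31% more per response, not 69%.

```python
# Rounded per-response values from the comparison table in this post.
OPUS = {"output_price": 25.0, "output_tokens": 2_245, "cost_per_response": 0.35}
FABLE = {"output_price": 50.0, "output_tokens": 1_474, "cost_per_response": 0.59}

# Per-token price ratio: what the pricing page shows.
price_ratio = FABLE["output_price"] / OPUS["output_price"]

# How much each model wrote per response, relative to the other.
verbosity_ratio = FABLE["output_tokens"] / OPUS["output_tokens"]

# What the cost ratio would be if output tokens were the only cost.
output_only_ratio = price_ratio * verbosity_ratio

# What the transcripts actually showed, all token types included.
observed_ratio = FABLE["cost_per_response"] / OPUS["cost_per_response"]

print(f"per-token price ratio:      {price_ratio:.2f}x")
print(f"output-only cost ratio:     {output_only_ratio:.2f}x")
print(f"observed per-response cost: {observed_ratio:.2f}x")
```

The difference between the output only ratio and the observed ratio comes from the input side of the bill: fresh input, cache writes, and cache reads. Terser output trims only the smaller part of the cost, which is why the per response figure is the one to budget against.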

Match the model to the workload, not to the leaderboard. In my ten days, the split was velocity work versus judgment work. Nine days of feature shipping on the Opus tier. Review, audit, and verification on the new tier. That split is a hypothesis from one repo, not a law. But it is a better starting frame than "always buy the most powerful."

The durable lesson is the one that did not change between models. The mid session switch was uneventful because the verification layer made it uneventful. 11,372 tests. A 35 dimension audit. A project log either model could pick up cold. Build that, and the model becomes a procurement decision instead of a platform bet.

Budget for the harness. Rent the model.

The moral

After the switch, I searched the Orbyt codebase for the word "fable." Nearly every match was "spoofable" in the security docs. The only genuine reference was a guard regex on model IDs.

The model that finished my June barely exists inside the thing it helped build. That is exactly how it should be.

Every fable ends with a moral. Here is mine.

Models pass through. The system remains.

See this in practice: Orbyt, built solo in 32 days, the first product out of Purecraft.