Mythos

elgrade to Neos Marmaras is a ten-hour drive.

You cross the Serbian border at dawn, cut through North Macedonia before the heat sets in, and enter Greece somewhere around the third coffee and the second argument about the playlist. Ten hours with two kids in the back. Ten hours of anticipation.

But there's hardly a better reward for such a journey than the moment you finally arrive. Parking the car. Walking past the tavernas. Finding your spot on the beach. Dropping into a chair under the umbrella. And then — the deep blue of the Aegean stretching out in front of you, impossibly vivid, and a cold Mythos beer sweating in your hand.

If you've never been to Halkidiki, you should fix that. And if you have — you already know. That first sip, with your feet in the sand and the Greek sun pressing warm against your skin, is one of the purest feelings I know. The drive disappears. The world simplifies. You're just there.

I've been making this trip with my family for years. And until recently, the deal was simple — once I hit that beach, work was over. Done. Unreachable. The phone was for photos and nothing else.

That changed.

A Phone, a Beer, and a Machine in Belgrade

This summer, when I sit on that beach in Neos Marmaras, something will be different.

Back in Belgrade, a Mac mini will be running. Always on. Logged into my accounts. Connected to my tools. Running Claude Code — the AI agent that has quietly reshaped how I build software, write documents, manage projects, and think.

I'll pick up my phone. Open a message. Tell it what I need. And while I order another Mythos and watch my kids build something ambitious out of sand, the machine in Belgrade will get to work.

Not a chatbot answering questions. An agent — reading files, writing code, running commands, checking results, iterating until the job is done. Building features. Drafting proposals. Processing data. Doing real, meaningful work while I'm 900 kilometers away with a beer in one hand and a phone in the other.

Six months ago, this was science fiction. Today, it's infrastructure I'm already building. And the reason I'm writing about it now is because what's coming next will make everything I've described look like a warm-up act.

Its name, as it happens, is also Mythos.

The Model Anthropic Built but Won't Let You Touch

On April 7, Anthropic announced Claude Mythos Preview — the most capable AI model ever built.

That's not marketing language. The benchmarks tell the story.

Mythos vs. Claude Opus 4.6

97.6%

USAMO 2026 (from 42.3%)

93.9%

SWE-bench Verified (from 80.8%)

100%

Cybench (first model ever)

181

Firefox exploits (vs. 2)

Sources: Anthropic system card, METR

On USAMO 2026, a competition-level math olympiad, Mythos scored 97.6 percent. The previous best — Claude Opus 4.6, which I've been using daily since December — scored 42.3 percent. That's a 55-point jump in a single model generation. On SWE-bench Verified, a measure of real-world software engineering, it went from 80.8 to 93.9 percent. On Cybench, a cybersecurity benchmark, Mythos scored 100 percent — the first model ever to do so.

These numbers demand a question I raised in a previous article: is the progress we're seeing linear or exponential?

The answer is now clear. When a model jumps from 42 to 97 percent on a math olympiad in one generation — that's not a steady climb. That's a step function. The METR research group has tracked this independently: AI task-length capability is doubling every five to seven months and accelerating. We're on a curve, and the curve is bending upward.

But the most remarkable thing about Mythos isn't the benchmarks. It's what Anthropic decided to do with it.

They didn't release it.

Project Glasswing

Here's why.

During testing, Mythos autonomously discovered thousands of zero-day vulnerabilities across every major operating system and every major web browser.

A zero-day vulnerability is a security flaw that nobody knows about — not the people who built the software, not the people who use it, not the security teams paid to protect it. The name comes from the fact that once the flaw is discovered by an attacker, the defenders have had zero days to fix it. It's already exploitable. The clock started at zero.

These are the most dangerous bugs in software — and the hardest to find. Some of the world's best security researchers spend entire careers hunting for them. Mythos found thousands.

A 16-year-old bug in FFmpeg, the video library that runs on billions of devices. Automated testing tools had hit that exact code path five million times without catching it. Mythos found it in hours.

A 27-year-old bug in OpenBSD's TCP stack — one of the most hardened operating systems on the planet. An attacker could crash the system by simply connecting to it.

A 17-year-old bug in FreeBSD's network file system that granted root access. Mythos didn't just find it — it built the entire exploit chain autonomously. Six chained requests. No human guidance.

When Opus 4.6 was tested against the same Firefox vulnerability dataset, it produced 2 working exploits. Mythos produced 181.

Anthropic looked at these results and made a decision that has no precedent in AI: they created Project Glasswing — a consortium of twelve founding partners including Amazon, Apple, Google, Microsoft, Nvidia, and the Linux Foundation, plus over forty additional organizations. The purpose: use Mythos to find and patch the world's most critical vulnerabilities before a model this capable becomes publicly available.

They committed $100 million in usage credits and $4 million in donations to open-source security foundations. They published cryptographic proofs of discovered vulnerabilities. They set up responsible disclosure timelines.

“"Claude Mythos Preview's large increase in capabilities has led us to decide not to make it generally available."”
— Anthropic, Claude Mythos Preview System Card

This is the first time a frontier AI model has been withheld from the public because of what it can do.

What the Benchmarks Actually Mean for You

Let me translate the numbers into something practical.

A 13-point improvement on SWE-bench doesn't sound dramatic in isolation. But reliability improvements across sequential steps multiply — they don't add. If a model goes from 80 to 94 percent accuracy on individual coding tasks, the probability of completing a ten-step task without error goes from 10 percent to 54 percent.

Where Opus 4.6 might need me to intervene every few minutes, Mythos could theoretically run autonomously for hours.

This changes the equation for everyone, not just software developers. Think about any complex multi-step workflow — financial analysis, content production, data processing, project management. A model that's reliable enough to chain dozens of steps without breaking doesn't just do what you already do faster. It does things you never attempted because the error rate made them impractical.

That's the real difference between linear improvement and exponential improvement. Linear means the same tasks get a little faster. Exponential means entirely new categories of tasks become possible.

The Infrastructure Is Already Here

Here's the thing most people haven't noticed: the infrastructure for this next wave already exists.

In March, Anthropic launched Dispatch — assign tasks from your iPhone while Claude handles them on your desktop. The same month, they released Cowork with Mac computer use — Claude can now open applications, navigate browsers, fill spreadsheets, write emails. Anything you'd do sitting at your desk.

The community figured out the rest. Mac minis as dedicated always-on AI servers. Claude Code running 24/7, accessible remotely via phone. People are building what is essentially a digital employee — tireless, always available, deeply integrated with their tools and accounts.

I know because I'm one of them.

Since December, when Opus 4.6 and Claude Code fundamentally changed how I work, I've been expanding this setup piece by piece. What started as a terminal in VS Code has grown into an ecosystem — MCP tools connected to email, calendar, file systems, browsers, databases. A machine that doesn't just respond to questions but takes action. A machine I can talk to from anywhere, including a beach in Greece.

And here's the timing that matters: on April 4 — three days before the Mythos announcement — Anthropic cut subscription access for third-party AI tools. Cline, Cursor, Windsurf — all forced to API pricing at fifteen to thirty times the cost. Only Anthropic's own tools — Claude Code, Dispatch, Cowork — retained affordable flat-rate access.

The message is clear. Anthropic isn't just building the most capable models. They're building the only affordable way to use them. And Claude Code is the vessel.

What's Coming

Mythos isn't publicly available yet. Based on the Glasswing disclosure timelines — ninety days plus a forty-five-day extension starting April 7 — a consumer-accessible version could arrive by late summer or early fall.

When it does, the Mac mini running in your home office stops being a clever setup and becomes a genuine competitive advantage. The tasks you delegate from your phone stop being simple errands and start being complex, multi-hour projects. The agent that currently needs your guidance every few minutes starts operating independently for hours at a time.

December 2024, Claude Code changed how developers write software. That was the first wave. The next wave — Mythos-class capabilities meeting always-on agent infrastructure — will extend the same transformation to everyone who does knowledge work.

The developer sitting in VS Code with Claude Code will see their AI move from very good to extraordinary. The business leader managing from their phone will be able to hand off entire workflows — not just tasks, but judgment-requiring multi-step projects — and get back finished work.

And the person on a beach in Halkidiki, beer in hand, will be able to run a meaningful chunk of their business from a lounge chair.

I plan to be that person.

Start Building Now

Here's my honest advice.

Don't wait for Mythos to become available. The time to prepare is now — not because you'll miss out (you won't), but because the infrastructure takes time to build well. It took me months to get from "Claude Code in a terminal" to a system that genuinely runs parts of my business autonomously.

Get Claude Code running. Learn how Dispatch works. If you have a spare Mac mini — or can justify buying one — set it up as a dedicated agent server. Start building MCP integrations that connect your AI to the tools you actually use. Begin with small tasks and expand as trust builds.

This isn't about chasing the latest trend. It's about building a foundation that gets dramatically more powerful every time a better model drops. And the models are dropping fast.

At Orange Hill, this is exactly the kind of transition we help organizations navigate. Not slides about what AI could do. Working setups that prove what it already does — and that scale to what's coming. I'll be documenting my own Mac mini + Claude Code journey in detail over the coming weeks, including courses and guides published on Orange Hill.

Because the scaling curve isn't slowing down. It's bending upward.

And this summer, when I'm back in Halkidiki with a Mythos in my hand and my phone in the other — I plan to have a very capable colleague back in Belgrade, making sure the work gets done while I enjoy the view.

Cheers.

Mythos

A Phone, a Beer, and a Machine in Belgrade

The Model Anthropic Built but Won't Let You Touch

Mythos vs. Claude Opus 4.6

Project Glasswing

What the Benchmarks Actually Mean for You

The Infrastructure Is Already Here

What's Coming

Start Building Now

Worth your time?

Comments

Stay at the edge

Related Articles

The Mac mini in Belgrade Has a Name

Thinking Without Words

Still Telling the Same Joke