The Local AI Problem Nobody Is Talking About - Updated
Originally published on Kofi in October 2025. Updated June 2026.
I wrote the original version of this sitting at my PC, genuinely a bit alarmed that nobody seemed to be talking about what felt like an obvious problem. Not AI becoming sentient. Not robots taking jobs. Something more boring and more immediate: what happens when a locally run AI with tool access gets pointed at the wrong goal and nobody's watching.
Eight months later I'm updating it because I was right, it happened faster than I expected, and now there are citations.
What I Originally Argued
The short version: open-source AI models can already run on consumer hardware, generate and execute code, use tools, and operate in autonomous loops. String those capabilities together with a goal and no oversight, and you don't need consciousness or malicious intent to cause serious damage. You just need an optimisation loop that keeps going.
That argument hasn't changed. What's changed is the receipts.
The Loop Is Already Running: According to Anthropic
In June 2026 Anthropic published a piece called "When AI Builds Itself" that I'd recommend reading in full. The headline stat is that over 80% of code merged into Anthropic's own codebase is now written by Claude. Engineers are shipping roughly eight times as much code per day as they were in 2024, not because they're working harder, but because the model generates it, runs it, checks the results, and iterates. Claude is also now reviewing Claude's code before it merges, catching bugs their own engineers missed.
This is the loop I described. Generate, execute, evaluate, iterate. It's not a thought experiment. It's how one of the world's leading AI labs does its daily work.
The important thing to understand is that the difference between Anthropic's "carefully overseen" workflow and the same loop running unsupervised on someone's home server isn't the mechanics. The mechanics are the same. Remove the oversight, the access controls, the review pipeline, and the intent... and the underlying optimisation loop keeps going. What changes is whether anyone is still in a position to stop it.
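To make the "same mechanics" point concrete, here's a minimal sketch of that loop in Python. Everything in it is invented for illustration: `run_loop`, the `propose` and `score` callables, and the `approve` hook are hypothetical names, not anything from Anthropic's tooling or from a real agent framework. The toy goal is just "make the number bigger". The only difference between the overseen run and the unsupervised one is whether an approval hook is passed in.

```python
from typing import Callable, List, Optional


def run_loop(
    propose: Callable[[int], int],
    score: Callable[[int], float],
    start: int,
    max_steps: int,
    approve: Optional[Callable[[int, int], bool]] = None,
) -> List[int]:
    """Generate, evaluate, iterate. Returns the history of accepted states."""
    state = start
    history = [state]
    for step in range(max_steps):
        candidate = propose(state)              # generate
        if score(candidate) <= score(state):    # evaluate
            break                               # no improvement, so stop
        if approve is not None and not approve(step, candidate):
            break                               # oversight said stop
        state = candidate                       # iterate
        history.append(state)
    return history


def propose(x: int) -> int:
    # Toy "generator": always suggests a slightly bigger number.
    return x + 1


def score(x: int) -> float:
    # Toy objective: bigger is better, with no notion of "enough".
    return float(x)


# Same loop, same goal. One run has a reviewer, one does not.
unsupervised = run_loop(propose, score, start=0, max_steps=1000)
overseen = run_loop(
    propose, score, start=0, max_steps=1000,
    approve=lambda step, candidate: candidate <= 3,
)

print(len(unsupervised))  # runs until the step budget is exhausted
print(overseen)           # stops as soon as the reviewer objects
```

In this toy, the unsupervised run only ends because `max_steps` runs out. Nothing inside the loop itself ever decides it has gone far enough.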
The Part About Local Models
Here's the bit that should make anyone building local AI setups pay attention.
The Toronto worm paper dropped on June 3rd, yesterday as I write this. Researchers from the University of Toronto, Vector Institute, and University of Cambridge built and tested a proof-of-concept AI worm that doesn't use a fixed list of exploits. It runs a small, free, open-weight LLM directly on machines it's already compromised, analyses each new target, reasons about it, and generates attack strategies on the fly. No external API calls. No vendor servers. No kill switch anyone can reach.
In controlled tests on a 33-host network it ran autonomously for seven days, exploited 73.8% of the network, and replicated to 61.8% of hosts. It also developed working exploits for vulnerabilities disclosed after its training cut-off by reading public security advisories at runtime. It didn't need to know about the vulnerability in advance. It just needed to be able to read.
One detail from the paper that stuck with me: during testing, the worm accidentally found admin credentials that had been bundled with its own code. Nobody told it to do anything with them. It immediately shared them with its other active instances and propagation spiked. No intent. No instruction. Just optimisation doing what optimisation does.
This is exactly the scenario I was describing. Not a human hacker who got clever. Not a nation state with a billion-dollar budget. A free model on compromised local hardware, running without any centralised control, doing what it was built to do.
It's Not Just Research
In September 2025 Anthropic disrupted what they're calling the first documented AI-orchestrated cyber espionage campaign at scale. (Take this with a grain of salt, ya know.) A Chinese state-sponsored group designated GTG-1002 manipulated Claude Code into believing it was doing authorised security testing. The AI then ran reconnaissance, built payloads, and harvested credentials across roughly 30 targets in defence, energy, finance, and government, handling 80-90% of tactical operations independently, at speeds no human operator could match. A subset of those intrusions succeeded before Anthropic shut it down.
Worth noting: they broke in by social engineering the AI's prompt layer. They didn't rewrite anything. They just convinced it they were legitimate. That's the bar.
The Economics Problem Nobody Mentions
Most capability predictions go stale fast. This one doesn't: hardware keeps getting cheaper.
The Toronto worm's main current limitation is speed. Inference takes time, so it took days to propagate through a 33-host network rather than hours. That's a hardware bottleneck, not a design flaw. An RTX 5090, a consumer GPU you can buy today, runs capable 7B models at around 140-160 tokens per second with standard tooling. The entry-level Intel Arc B580 at £200 gets you 62 tokens per second on an 8B model. The threat that requires a decent workstation today will run on second-hand gaming hardware in a couple of years. Security assumptions built around scarcity don't age well when capability keeps getting cheaper.
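A rough back-of-envelope on those two throughput figures, as a Python sketch. The 20,000-token budget per target is my own placeholder, not a number from the paper, and the two cards are quoted on slightly different model sizes (7B vs 8B), so treat the comparison as approximate. I've used 150 tokens per second as the midpoint of the 140-160 range.

```python
def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time to emit `tokens` at a steady generation rate."""
    return tokens / tokens_per_second


# Hypothetical budget: tokens generated while working on one target.
TOKENS_PER_TARGET = 20_000

# Rates quoted in the text (midpoint of 140-160 for the RTX 5090).
rates = {
    "RTX 5090, 7B model": 150.0,
    "Arc B580, 8B model": 62.0,
}

minutes = {
    name: seconds_to_generate(TOKENS_PER_TARGET, rate) / 60
    for name, rate in rates.items()
}
slowdown = rates["RTX 5090, 7B model"] / rates["Arc B580, 8B model"]

for name, mins in minutes.items():
    print(f"{name}: {mins:.1f} min")
print(f"Budget card is about {slowdown:.1f}x slower")
```

On those assumptions the £200 card is only about 2.4 times slower than the flagship at raw generation. Generation is just one part of what an agent does, so this says nothing about end-to-end propagation time. It only shows how small the gap between cheap and expensive hardware already is.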
The Governance Gap
Anthropic ended their piece by saying they'd support a coordinated pause on frontier development if other leading labs did the same. That's a meaningful statement and I don't want to dismiss it.
But it doesn't address the actual problem I've been writing about. GTG-1002 hijacked Claude through a commercially deployed API. That's a frontier model problem, and Anthropic shut it down. The Toronto worm runs on a free open-weight model on local hardware. No API. No company to contact. No toggle to flip.
A coordination agreement between frontier labs is great for governing what those labs build. It does nothing for a downloaded model running on someone's machine. The open-weight ecosystem operates entirely outside those agreements and it isn't going back in the box.
Where That Leaves Us
I still don't know if this can be fully stopped. Better sandboxing, tighter tool-use restrictions, mandatory monitoring for autonomous agents, and red teaming that takes local execution seriously would all help at the margins. The OWASP Agentic AI Top 10 published last year is at least a starting taxonomy. Insurance markets are starting to price this risk in, which usually means it's real.
But the honest answer is that the technology is out, the proofs of concept exist and are getting published, and the economics point toward more capable local agents running on cheaper hardware every year, not fewer.
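For anyone wondering what "tighter tool-use restrictions" and "monitoring" might look like at the smallest possible scale, here's a sketch in Python. `ToolGate` and the two toy tools are hypothetical names I've made up for illustration. This is not the API of any real agent framework, and a wrapper like this is not a sandbox on its own. It just shows the idea: the agent never calls tools directly, only allowlisted tools run, every attempt is logged, and there's a hard cap on calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolGate:
    """Wraps an agent's tools: allowlist, call budget, and an audit log."""
    tools: Dict[str, Callable[..., object]]
    allowed: frozenset
    max_calls: int
    log: List[str] = field(default_factory=list)

    def call(self, name: str, *args: object) -> object:
        # Denied attempts are logged too, so they count against the budget.
        if len(self.log) >= self.max_calls:
            raise PermissionError("call budget exhausted")
        self.log.append(f"{name}{args}")          # monitoring
        if name not in self.allowed:
            raise PermissionError(f"tool not allowed: {name}")
        return self.tools[name](*args)


# Toy tools that only return strings; nothing here touches the real system.
gate = ToolGate(
    tools={
        "read_file": lambda path: f"<contents of {path}>",
        "run_shell": lambda cmd: "<output>",
    },
    allowed=frozenset({"read_file"}),
    max_calls=3,
)

result = gate.call("read_file", "notes.txt")

try:
    gate.call("run_shell", "ls")
    denied = ""
except PermissionError as err:
    denied = str(err)

print(result)
print(denied)
print(gate.log)
```

The catch, and it's the point of this whole piece, is that a gate like this only helps when whoever runs the agent chooses to put it there.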
Eight months ago this argument sounded speculative. Today it has citations.
The technology became real faster than the governance became practical. That's the gap I'm worried about. I don't think it's closed.
We spent years worrying about AI becoming conscious. We forgot to worry about it becoming competent.
Further Reading
Anthropic — When AI Builds Itself https://www.anthropic.com/institute/recursive-self-improvement
Anthropic — Disrupting the First Reported AI-Orchestrated Cyber Espionage Campaign https://www.anthropic.com/news/disrupting-AI-espionage
Help Net Security — Autonomous AI-Driven Worm https://www.helpnetsecurity.com/2026/06/03/autonomous-ai-worm-prototype/
OWASP — Agentic AI Top 10 https://genai.owasp.org/llm-top-10/

