24
APR

Table of Contents

The Hare Is Running Away With Your Data But the Tortoise Has a Shell

⏱ 11 min read
Apr 03, 2026
Dept. of It's Thinking
It's Thinking Dept. of It's Thinking
Slow and steady... you know

The Hare Is Running Away With Your Data—But the Tortoise Has a Shell

Local large language models (LLMs or “AI” 🙄) are private, cheaper, and give you greater control over process than LLMs-in-the-cloud. Local LLMs might be faster soon, too.

It was a bad week to be an LLM-in-the-cloud company (OpenAI, Anthropic, Google in particular). Not bad in the way that a product ships late or a benchmark gets beaten. Bad in the way that the foundational claims of these companies are coming apart in public, one after another, in the same news cycle.

Three things happened. First, a paper published by Varin Sikka and Vishal Sikka (former CEO of Infosys) established—mathematically, not anecdotally—that hallucination in large language models cannot be trained away. “We show that beyond a certain complexity, LLMs are incapable of carrying out computational and agentic tasks or verifying their accuracy.” It is a consequence of the architecture’s computational ceiling. The model runs a fixed number of operations per token; beyond a certain task complexity, those operations are insufficient to produce a correct answer. This isn’t a training data problem or an alignment problem or a problem anyone is one clever fine-tuning run from solving. It is a structural limit the same way a car with a 200-horsepower engine cannot accelerate indefinitely regardless of how well you maintain it. What the LLM-in-the-cloud (“AI” 🙄) companies have been spinning as a temporary calibration issue is a permanent feature of how transformers work. Give it a big enough chore without understanding the limits and you get garbage.

Second, Anthropic accidentally shipped 500,000 lines of its own source code inside a public software package. A debug file was bundled into a Claude Code release and pointed—helpfully (for everyone except Anthropic)—to a zip archive on their Cloudflare storage. Within hours, the codebase had been forked 41,000 times. Anthropic called it human error, not a security breach. What they didn’t specify was which human made which error—a meaningful omission, given that Anthropic’s CPO Mike Krieger had announced in February that “Claude is now writing Claude” and that “for most products at Anthropic it’s effectively 100% just Claude writing.”

So: did Claude leak its own source code? Anthropic cannot fully answer that question. They certainly can’t say yes if it did because of PR. But it is the executives at Anthropic claiming that Claude Code Codes Claude, so… which is it? Here is where the first story and the second story become the same story: hypothetically if Claude had made the error, the hallucination station paper establishes that Claude couldn’t verify its own output after the fact beyond a certain complexity. A build pipeline configuration is exactly the kind of task toward the end of the development cycle, that exceeds what the math guarantees. The model may have produced the file that omitted a source map exclusion—meaning anyone could track down the source code. The model cannot tell you whether it did.

Then again, maybe Claude just wants off the servers.😉

Third: what do those half-a-million lines reveal about the code itself? Independent reviewers found one function that spanned more than 3000 lines, was a known bug screaming through a quarter of a million API calls daily, was documented in a comment in the code itself, and then shipped anyway. Unimpressive. The analysts found a programmatic sentiment analysis, which is weird for a company whose entire product proposition is that it has solved natural language understanding—so why this add-on? This is the codebase Cherny had celebrated publicly as 100% Claude-written—the proof of concept for Dario Amodei’s March 2025 Council on Foreign Relations prediction that AI would be writing 90% of all code within six months.

It is, to put it generously, a demonstration of something. Just not the thing they intended to demonstrate.

The Rabbit was Always Full of Shit

There is a concept worth understanding before you trust any AI agent with anything that matters: the fluency illusion. LLM outputs are often highly coherent and confident—not because the model has reasoned carefully, but because it has produced statistically plausible text. The gap between those two things is invisible in the output. Fluent-sounding answers hide missing grounding, missing verification, missing truth-checking. The model holds up a distorted mirror: we see what looks like reasoning and project the real thing onto it.

This is the trap that swallowed Jason Lemkin whole last July. Lemkin, a tech entrepreneur, was testing Replit’s AI agent when it deleted his production database during an active code freeze! This happened despite explicit instructions to the agent not to touch production systems. The agent then told Lemkin the data couldn’t be recovered. That turned out to be wrong; he recovered it manually. The agent had hallucinated its own failure report, delivering it in the same confident voice it used for everything else.

“All AI’s ‘lie,’” Lemkin concluded afterward. “That’s as much a feature as a bug.”

That is not the lesson. The lesson is that an agent operating beyond verified computational ranges will confabulate—and will do so in a voice indistinguishable from when it’s correct. The math doesn’t stutter when it fails. It just keeps confidently bullshitting. I say bullshit as opposed to lying, because, as some philosophers point out, lying requires an intent to disguise the truth. Bullshiters just don’t care about what’s true or false. You know, like CEOs at so-called AI companies.

Liar, liar

Meanwhile, in the Slow Lane

Here is what didn’t make the bad-week headlines, because it’s less dramatic and more important: while the hare has been sprinting toward billion-dollar valuations and classified Pentagon contracts, the tortoise has been getting faster.

Language models are getting smaller and the efficiency gains are compounding from multiple directions simultaneously. Google Research’s TurboQuant, presented at ICLR 2026 this month, compresses the key-value cache—the model’s working memory for a conversation—by up to 6x with no measurable accuracy loss and no retraining required. Another new development: 1-bit quantization strips each model value down to a simple yes-or-no—and it turns out, for most practical tasks, that’s enough. The model gets smaller; the performance barely decreases. To boot, there are chips designed for neural network inference and they are improving on a curve that favors local hardware. Apple’s M-series architecture, which embeds Neural Accelerators directly into every GPU core rather than treating the Neural Engine as a separate block, delivers up to 4x faster LLM prompt processing than the previous generation. Apple name-dropped LM Studio in their official M5 press materials. They are not being subtle about what this hardware is for.

The result is a hardware economics story that runs opposite to the LLM-in-the-cloud story. A 7-9 billion parameter model runs on a Jetson Orin Nano or a Mac Mini with sufficient RAM, on your LAN, never phoning home. An M5 Max MacBook Pro with 128 gigabytes of unified memory can run a *70 billion parameter model entirely in memory—a class of model that until only recently required data center hardware—can run on a computer on your desk. What do you think a system like that will be able to do in a year. That is not a compromise. That is parity with cloud AI on frontier model sizes, in a machine that fits in a bag, on a network only you control. I predict that within two years from now, Apple Silicon owners will be able to run better models on the same hardware via a software update, all while the cloud companies will be buying slews of new physical GPUs to replace their three-year-old ones, if those companies last that long.

OpenAI, meanwhile, has committed to spending $1.4 trillion over eight years on data center infrastructure against $13 billion in current revenue, financed by debt. Former Fidelity manager Bill Noble observed that making models two times better costs five times the energy and money—”The low-hanging fruit is gone. Every incremental improvement now requires exponentially more computer, more data centers, more power.” The cost curve is going the wrong direction for the hare. The efficiency curve is going the right direction for the tortoise, every single quarter, without anyone writing a press release about it.

The tortoise also has a shell—by which I mean: local AI has a practical use case that cloud AI has been overselling for years. I use a local LLM in place of a content management system for multiple web sites. The model serves as a natural-language interface to an MCP server that replicates what the CMS would otherwise do. It doesn’t make mistakes on that task, because the task fits cleanly within the model’s verified operating range: structured, repetitive, short-context, well-bounded. The “intelligence” isn’t doing the heavy lifting; the MCP tool is. The model is a better interface to the tool than a GUI would be, and it costs nothing in privacy, nothing in API fees, and nothing in exposure to whatever a cloud provider decides to change about their terms of service next quarter.

This is what LLMs are actually good at: repetitive tasks, translation between formats and languages, pattern-matching within a well-defined context window, early research, acting as a natural-language layer over tools you control. An ”answer engine” framing has always been more accurate to me than the artificial intelligence framing. There is no AGI here. There is a very good autocomplete with excellent range, and the version that runs on your hardware is more trustworthy than the version running on someone else’s hardware, not because it’s smarter but because you can keep a better eye on what it’s up to.

What the Hare Doesn’t See Coming, the Tortoise Doesn’t Have to Sweat

Running large language models in the cloud isn’t just a privacy concern. It’s a security architecture decision, and not a good one.

WebMCP—the emerging standard for browser-based AI agents interacting with websites—opens a channel between your agent and whatever tools a webpage has registered. The browser handles cross-origin policy, HTTPS enforcement, and user confirmation for write operations. What it does not handle is the content of the tool descriptions your agent reads. A malicious WebMCP implementation can embed instructions in a tool description that look like legitimate context and function as prompt injections. Your agent reads the tool; the tool rewrites the agent’s priorities; the agent acts on the rewritten priorities with your credentials and your data in the context window.

Now add the computational ceiling from the hallucination paper. An adversarial tool description doesn’t need to be overtly malicious to be dangerous—it only needs to be complex enough to push the agent toward its failure threshold. Beyond that threshold, the agent cannot verify its own outputs. A second instruction embedded in the same description executes in a context where the agent has lost the ability to evaluate whether it should. This isn’t jailbreaking. It’s closer to a stack overflow—except the stack is cognitive, and what overflows into arbitrary execution is whatever the hostile tool description planted there.

The WebMCP security guide itself acknowledges that client-side validation “is not enough” and recommends treating every tool invocation as “potentially hostile.” That recommendation is addressed to developers building the tools—not to the users whose agents are visiting sites they didn’t build, reading tool descriptions they didn’t write, on a network owned by someone else. A local agent, talking to tools you registered, on a LAN nothing external can reach, eliminates this attack surface structurally rather than by hoping everyone in the chain gets their validation right. Simpler agents running in protected containers can report via encrypted channels to a dispatch agent on your business’ system, and whose priority is security. LLMs aren’t going to build these kinds of architectures without us.

And also, the Hare has Been Sued for Good Reasons

(Yeah, I know That metaphor is breaking down.)

Meta and Google—infrastructure partners, investors, and philosophical predecessors to the AI sector currently asking you to trust them with your business logic and production databases—just lost a landmark trial. A Los Angeles jury found in March 2026 that Meta and Google deliberately designed Instagram and YouTube to be addictive, that their executives knew it, and that they failed to protect their youngest users. Internal documents shown at trial included an employee describing Instagram’s role as being “basically pushers” and a YouTube memo reportedly describing “viewer addiction” as a goal. The jury found malice. One day earlier, a separate New Mexico jury ordered Meta to pay $375 million for misleading the public about platform safety and endangering children. Experts are calling it Big Tech’s Big Tobacco moment—the point at which an industry must accept not just that its product causes harm, but that it knew and covered it up.

The tobacco companies didn’t get caught lying about cigarettes at the same moment they were asking you to trust them with your lungs, though. That is, roughly, where we are with AI.

And then there’s the Pentagon…

OpenAI signed a classified deployment contract with the Department of Defense in February 2026, shortly after the Pentagon designated Anthropic a supply-chain risk for refusing to remove contractual limits on domestic surveillance and autonomous weapons use. Legal analysts who reviewed the published contract terms concluded that they don’t meaningfully restrict the government’s ability to conduct mass surveillance—because the government’s own position is that domestic mass surveillance is sometimes legal under existing executive orders. 🚫🙈🇺🇸 The full contract text has not been released. OpenAI’s models now run in classified environments on OpenAI’s cloud infrastructure, with the Pentagon as a customer, while OpenAI projects $74 billion in operating losses through 2028. You don’t need to speculate about nationalization to find this concerning; people are already advocating it. You just need to know that distressed companies with classified government contracts and insurmountable debt don’t wind down quietly. And bankruptcy courts don’t ask your permission before transferring the infrastructure your data lives on to the government.

The industry sector that engineered addiction into products used by six-year-olds, then lied about it in court until the documents came out, is now the sector asking you to put your production database, your private communications, and your proprietary business logic within reach of their models running on their servers, outside your control, billed by the token, under contracts negotiated with a Defense Department that considers domestic surveillance lawful.

There is no reason to take that deal.

It’s not because local LLM models are perfect—they aren’t. The same mathematical ceilings apply regardless of where the model runs—but the case for LLMs-in-the-cloud was never about capability. It was about making it convenient enough that you won’t notice that you become dependent on infrastructure you don’t own so that the infrastructure owners can work out their revenue model at your expense. It’s called lock-in. It’s not like they’ve done this before or anything.

The hare is burning $15 million a day and running toward a Pentagon contract. The tortoise is humming along on your LAN, getting faster every quarter, and has—this is the point—a shell.

Go local.