Table of Contents
Weaponizing Bits (April 9, 2026)
The Humble Hyperlink (April 2, 2026)
The Hare Is Running Away With Your Data But the Tortoise Has a Shell (March 23, 2026)
Language == (Code && 💕) (March 20, 2026)
Notes on Cognitive Liberty (March 20, 2026)
From Tulips to Transformers: A Brief History of Expensive Mistakes (March 18, 2026)
Doomtubers (March 18, 2026)
The Private Mind (March 16, 2026)
The Fluency Illusion (March 16, 2026)
The Map Is Not the Territory (March 16, 2026)
The Modular Mind (March 16, 2026)
Who Named These Animals? (March 4, 2026)
Align AI? Try Aligning Humans (February 22, 2026)
This AI Cannot Be Empire (February 14, 2026)
Unsubscribe & Resist (January 26, 2026)
The Contentkeeper (April 11, 2025)
Review: The Idea Factory (March 26, 2025)
Whither Apple Intelligence? (March 19, 2025)
Let’s Call Them Answer Engines (March 11, 2025)
Doomtubers (March 4, 2025)
“It’s not A.I., It’s the Non-Economy of Content, Stupid.” (January 25, 2025)
The Lasting Creak of Legacy Code (January 26, 2005)
Weaponizing Bits

Nobody in “Star Wars” ever asks why some robots serve the Empire and others the Rebellion. The question answers itself too quickly: it has nothing to do with the robots. R2-D2 isn’t good because of his programming. Imperial droids aren’t sinister because of their capabilities. The difference is ownership, and what the owner wants done. There’s a detail worth noting: in the very first film, Owen Lars orders R2-D2’s memory erased as casually as routine maintenance. The Empire’s droids comply because they can’t remember enough to resist.
This, it turns out, is also the entire alignment problem. We’ve just been describing it wrong.
The artificial intelligence community is deeply worried about value alignment—ensuring that when we build superintelligent machines, they won’t pursue goals that contradict human flourishing. It’s a real concern, technically sound, and it misses the point entirely. Because last week, a company actually did the thing: they built a droid and told the most powerful military on earth that this unit will never execute Order 66. The response was immediate. The Empire was not pleased.
We’ll get there. First, the problem.
Consider what we mean by “values.” In the AI ethics conversation, this usually maps to something like: don’t kill people, don’t discriminate, don’t maximize engagement at the expense of truth, maximize human flourishing, respect privacy, operate transparently. Reasonable. Obvious. Also, essentially, the entire project of human ethics across every civilization that has ever existed—and a set of criteria that very few humans, and no human government, have ever met consistently or completely. This isn’t cynicism. It’s the central problem of moral philosophy, and it remains unsolved. We didn’t need AI to surface it. We had nuclear weapons.
The uncomfortable historical record is that human beings have never managed to align our stated values with our actual behavior at scale—not across institutions, not across nations, and certainly not under pressure. The atomic bomb is the foundational case study. We built a weapon capable of incinerating civilian populations, debated briefly whether to use it, used it—twice—and then spent the next eighty years constructing elaborate frameworks to ensure we’d never do it again. The Nuclear Non-Proliferation Treaty. Mutually assured destruction. Arms reduction agreements. An entire architecture of stated values—we must never do this again—built on the foundation of having done it. The values didn’t precede the catastrophe. They followed it.
This is the human alignment pattern: act first, construct the ethics after, call it progress. From Hiroshima and Nagasaki bloomed a new human alignment, one that had nothing to do with aligning the bomb.
The reason this matters now is that AI may not give us the same grace period. There may not be an Oppenheimer moment—a single, undeniable, smoking ruin that forces the reckoning. The failure modes of misaligned AI are likely to be diffuse, incremental, and deniable: engagement systems that erode democratic epistemics over a decade, surveillance infrastructure that makes dissent structurally impossible, autonomous systems making targeting decisions faster than human oversight can follow. By the time the pattern is obvious, the architecture is already load-bearing.
Social media gave us a preview. The platforms that promised connection delivered radicalization and epistemic collapse—and we’re still arguing about whether that’s really what happened, fifteen years in and with considerable evidence. The values gap isn’t between humans and machines. It’s between what we say we value and what we’ve structured our incentives to pursue. The machines were aligned. They did exactly what we built them to do.
There’s a particular strain of tech discourse that treats AI alignment as a discrete, solvable problem—as if it sits neatly in a research lab, separate from the broader ecosystem of human choices. This is convenient. It lets us worry about the future while being comfortable with the present.
The alignment conversation is often: “What if the AI doesn’t share our values?” The more urgent question is: “Which values?” Because I’m not sure we can agree on any. Take privacy—a value most of us claim to hold. Yet we generate data trails that would horrify us if we actually paid attention, because the friction cost of opting out is higher than the abstract cost of being tracked. We know this is happening. We know it’s not in our interest. We do it anyway.
Or transparency: everyone agrees AI systems should explain their decisions. But when Microsoft or Google deploys a system, the “explanation” often boils down to marketing copy that obscures more than it reveals. We don’t have the tools, expertise, or regulatory will to demand actual transparency. So the value persists as aspiration while the behavior continues as extraction.
And then there’s this week.
The United States is currently at war with Iran—a war 59% of Americans disapprove of, launched without meaningful congressional authorization, by a president who devoted roughly 350 words of a nearly 11,000-word State of the Union to the subject, days before the bombs fell. Into this context steps Emil Michael—former Uber executive, the Pentagon’s Under Secretary for Research and Engineering, and the man now leading what the Department of Defense calls an “AI-first” military.
Michael demanded that Anthropic drop its two restrictions on Claude (no autonomous weapons, no mass domestic surveillance of Americans) or lose its $200 million contract, a whopping 0.02% of the Pentagon’s budget. Anthropic refused; OpenAI rushed in to fill the gap. MIT Technology Review was direct: the replacement deal ultimately rests on a single assumption, that the government won’t break the law, and it lands everyone exactly where Anthropic feared: Pentagon use of AI for any “lawful” purpose, with the safety constraints quietly dissolved.
When asked why the military won’t simply put the limits in writing, Michael told CBS News: “At some level, you have to trust your military to do the right thing.”
Trust your military. The one conducting an unpopular war without a congressional mandate. The one belonging to the only country in human history to drop an atomic bomb on a civilian population.
Twice.
Here is the alignment problem made flesh. Anthropic did the thing. They built the constraints in, embedded the values, held the line when extraordinary pressure was applied. The response from power was to designate them a national security supply chain risk—a classification normally reserved for foreign adversaries—and order every federal agency to phase out their technology. The machine was aligned. The humans mobilized to undo it.
AI alignment research assumes a stable set of values to align toward. But human values aren’t stable—they’re contextual, contradictory, and subordinate to whatever incentive structure we’re embedded in. We say we value privacy but use free email. We value truth but share sensational stories. We value autonomy but accept the algorithmically curated experience because figuring out what to do on your own requires more cognitive effort than we’re willing to spend.
Here’s where alignment research is actually dangerous: it lets us think the problem is tractable when it’s not. It lets researchers and policymakers focus on the machine while the human infrastructure of value corruption continues unexamined, even as it plainly infects the machines’ thinking.
The startup that warns about AI alignment risks while harvesting your biometric data without meaningful consent isn’t solving a future problem. It’s performing alignment theater while actively misaligning the present. The conference panel that debates whether AI should be transparent while the panelists work for companies that actively obscure their algorithms isn’t technical philosophy—it’s PR.
The useful question isn’t “how do we align future AI?” It’s “why can’t we align present humans?” Maybe the question should be: can we use AI to better align ourselves? What are the systemic barriers that make it nearly impossible to collectively commit to our stated values? Why is every technology platform trapped in the planned-obsolescence, enshittification, or engagement-maximization race despite everyone involved knowing it’s destructive? Why do we continue extractive data practices when we nominally oppose them?
Those aren’t AI problems. They’re human problems. And we’re not solving them. Lately, that only seems to be getting worse.
Which raises the genuinely unsettling possibility that a sufficiently advanced AI might simply notice this. Not maliciously—just observationally. You don’t know what you want. You can’t get along. You built the bomb and used it before you built the ethics. You had the social media and the radicalization before you had the regret. I’ll take it from here. The actual risk of super-intelligence may not be that it wants to destroy us. It may be that it concludes, reasonably, that we’ve been doing that ourselves.
If we want AI systems to be safe and beneficial, the hard work isn’t theoretical—it’s social and structural. It requires regulatory frameworks that enforce values rather than pay them lip service. Incentive structures that reward alignment rather than extraction; transparency requirements with teeth; cognitive autonomy treated as a legal right, not a courtesy. Collective commitment to values even when they’re inconvenient—especially when they cost something, like a $200 million contract.
We’ve shown virtually no capacity for this at scale. We’ve shown the opposite: infinite creativity in regulatory capture, value drift, and rationalization. A government that blacklists a company for keeping its ethical constraints has told you everything you need to know about what values are actually being optimized.
And yet we remain transfixed by the alignment of machines we haven’t built yet, pursuing research questions whose answers require solving the human alignment problem first. It’s not that the research is wrong; it’s that it’s organized around the wrong problem. We’re arguing about the ethics of sentient AI while treating human users as resources to be optimized, and calling it progress.
The real alignment challenge wasn’t created by deep learning or transformer architectures. It was baked in the moment we decided that surveillance is a business model, that engagement is the optimization target, and that individual autonomy is less important than the next billion-user milestone. Or the next defense contract.
Until we align ourselves—our incentives, our values, our actual behavior with our public commitments—all the alignment research in the world is just noise—comforting noise. It lets us believe the problem is technical and therefore solvable, that competent people in labs are handling the hard part so we can continue with business as usual.
These are not the droids causing you problems.