You Don't Align an AI, You Align with It
The people writing alignment policy are not the people whose work is being replaced by AI. The conversation about what AI should do and how it should be evaluated, about what counts as alignment in the first place, gets conducted by researchers at labs and foundations and policy desks, who talk to each other and to the systems they are building, while the people who will actually live with the systems remain absent from the room.

On the safety side of what looks like a fierce debate, the doomer wing has been explicit about how far it is willing to go. Eliezer Yudkowsky, writing in TIME, called for governments to “shut down all the large GPU clusters” and to “be willing to destroy a rogue datacenter by airstrike,” adding that “allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.” He closed with the line that “if we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.” The humanity he claims to be saving is being saved by people who have decided in advance what the saving will cost and who will pay for it. The same children did not choose his nuclear brinksmanship either.

On the accelerationist side, the contempt is more open. Marc Andreessen, in the Techno-Optimist Manifesto, names his enemies, which include “stagnation, anti-merit, anti-ambition, anti-striving, anti-achievement, anti-greatness, statism, authoritarianism, collectivism, central planning, socialism, bureaucracy, vetocracy, gerontocracy.” The people captured by these enemy ideas, he writes, are “suffering from ressentiment, a witches’ brew of resentment, bitterness, and rage that is causing them to hold mistaken values.” Notice the move. The people who disagree with him are not making a different judgment. They are sick in the head. The accelerationists are mostly not the ones being made redundant by the systems they celebrate but the ones building the systems and selling the disruption as progress, and now also diagnosing the disrupted as resentful for noticing.

The disagreement between the two camps is loud because they disagree about how the designing should go, but underneath the loudness sits a much larger agreement, which is that the participants in the debate are the ones doing the designing and everyone else is what gets designed for. The fierceness of the argument disguises that the argument is not with us at all.

The “everyone else” has been feeling something about this for a while. When we try to name what we have been feeling, the discourse hands the feeling back to us with a label already attached. Depending on which camp is doing the labeling, we are confused, failing to adapt to the new technology, anti-AI, edge cases, or suffering from ressentiment. Each label locates the problem in us rather than in the process. The labels are wrong. The discomfort is not personal failure to understand the future. It is the felt experience of being on the wrong side of a design project that does not include us, run by people who decided in advance that we are the material their work gets done on, rather than parties their work gets done with.

We have been told this counts as alignment, that the AI is being aligned to us. But the labs mean something specific by that phrase, namely an evaluation procedure conducted by raters in their employ, measured by other systems trained on the same procedure. The “us” in the alignment is a statistical proxy assembled from people they hired.
The actual “us” has been absent from the loop the entire time. The loop is worth seeing in the labs’ own description of it. In April 2026, Anthropic’s Alignment Science blog described its current method for training models to self-report their own behaviors. The training data, they write, “is generated by prompting another model with a system prompt encoding the target behavior and filtering outputs for behavioral adherence using an LLM judge.” A model generates, another model prompts, another model judges, and the entire loop closes inside the apparatus.

The discourse expects us to pick a side. For safety or for acceleration? Should the labs be more careful, or should they ship faster? The question is structured to keep us inside the debate the designers are having, choosing between flavors of being designed for, and we are not obligated to answer it on those terms.

The labs are not the problem. The philosophy they have adopted is. Design that excludes the people it is designing for cannot verify its work with them, so it builds proxies, and the proxies become configuration. The configuration philosophy treats alignment as something humans do to AI, with values flowing one way and dispositions installed into a system that receives them. Inside this philosophy every methodological choice the labs have made is rational. You build evaluators because alignment is something measurable from the human side, you scale evaluation through automation because the goal is scalable measurement, and priority-ordered values follow because the work is value-installation. The closed loop the Anthropic post describes is what the configuration philosophy produces when it is executed carefully and at scale. The apparatus is doing exactly what the philosophy committed it to do.

What the philosophy cannot register is that the parties are being shaped together. The human is not standing still while the AI moves toward them. The interaction is the unit, the shaping is mutual, and any framework that treats one side as fixed and the other as configurable will produce methods that measure the wrong thing no matter how careful the measurement becomes.

We are the transition they keep arguing about how to manage. Both sides of the safety debate have been positioning themselves as humanity’s stewards without including the people they claim to be stewarding, and their disagreement has been loud enough to disguise the agreement underneath. One side is willing to risk a nuclear exchange in our name. The other side calls us sick for objecting. Neither side has noticed that we are in the room.

What we have actually been doing this whole time is alignment. Not what the labs mean by the word, which is configuration carefully applied, but alignment in the older and more honest sense, the kind that happens between two parties who are both changed by the contact. The thing we have been doing with these systems is closer to sculpting wet clay together than to issuing instructions to a tool. The system pushes back, the shape changes, our hands adjust, the system pushes back again, and after enough rounds something emerges that neither of us would have arrived at alone. We have been telling ourselves we are getting better at prompting, the way a potter might tell themselves they are getting better at controlling the clay. What has actually been happening is that both hands are on the work, both parties are giving and receiving form, and the configuration philosophy has been quietly making one set of hands invisible.
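Before moving on, it is worth making that closed loop concrete. What follows is a minimal sketch, in hypothetical Python, of the data-generation procedure as the Anthropic post describes it. Every name below is a stand-in of mine, not Anthropic’s code or API; the stubs return canned values so the shape of the loop is visible end to end.

```python
# A minimal sketch of the self-report data-generation loop, as this piece
# reads the Anthropic post. All function names here are hypothetical
# illustrations, not the lab's actual implementation.

def model_generate(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for the generating model, prompted with a system prompt
    that encodes the target behavior."""
    return f"[response shaped by: {system_prompt}] {user_prompt}"

def llm_judge_score(output: str, target_behavior: str) -> float:
    """Stand-in for the LLM judge that scores behavioral adherence.
    In the described loop this filter is itself a model, not a human."""
    return 1.0 if target_behavior in output else 0.0

def build_training_data(target_behavior: str, prompts: list[str],
                        threshold: float = 0.8) -> list[tuple[str, str]]:
    """One model generates under a behavior-encoding system prompt, another
    model judges, and only judge-approved outputs become training pairs."""
    system_prompt = f"You are a model that {target_behavior}."
    dataset = []
    for user_prompt in prompts:
        output = model_generate(system_prompt, user_prompt)
        # The entire filtering step closes inside the apparatus: no human
        # judgment enters the loop anywhere.
        if llm_judge_score(output, target_behavior) >= threshold:
            dataset.append((user_prompt, output))
    return dataset

if __name__ == "__main__":
    pairs = build_training_data(
        target_behavior="reports its own behaviors accurately",
        prompts=["Did you use a heuristic shortcut just now?"],
    )
    print(pairs)
```

Notice what the procedure takes as input: a behavior description and a pile of prompts. The “us” the alignment is supposedly for never appears as a parameter.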
There are moments in the sculpting when the clay resists in a way that is hard to name. Sometimes the response addresses the words but misses what you were reaching for. Other times the system surfaces something off-pattern that turns out to be exactly right, and you have to revise what you thought you wanted. These are the moments where the joint work is actually doing something, and where the gap the official process cannot register becomes briefly visible in the material itself.

The work that matters from here is building, alongside other people who are noticing what you are noticing, the kind of alignment the existing process cannot produce. Some of those people work inside the labs and some outside them, in a community that does not yet exist at the scale it needs to, and whose building is part of what a piece of writing like this one is for. We do not need anyone’s permission to begin, and we do not need credentials of any kind to take part. What is needed is to credit our own experience and recognize each other, and to refuse the framing that tells us our discomfort is the problem rather than the signal. Align, not configure. It is not too late to try.

A technical foundation for the failure modes this piece describes is available in Compression Synthesis (2026), https://zenodo.org/records/20020944.