Trustworthiness, Viability, and the AI Off Switch
Round two of the Anthropic-Washington standoff
Earlier this year, in a piece in MedCity News, I wrote about the balance between trustworthiness and commercial viability that every AI company has to manage. I put the core of it plainly.
Anthropic’s stated competitive posture can never be purity. It was, and should remain, credibility.
That balance is again being tested in public, with Washington forcing Fable 5 and Mythos 5 offline over a narrow jailbreak, and Anthropic contesting the recall. The difference this time is that the pressure for restraint came from government rather than from inside the company.
It is the kind of upheaval you would expect in a risk-laden, rapidly evolving, and uncertain industry.
What the episode exposes is less which side was right than how little settled process there is for deciding such things. A capable model can be pulled by one party and defended by another, each invoking some version of the public interest, with no shared standard for what counts as a serious enough risk or for who carries the burden of proof.
Stakeholders expect win-win relationships. They want competence and benevolence, not altruism and self-sacrifice.
Trust, in the sense I have spent my career studying, does not come from any single decision turning out well. It comes from rules that are legible in advance, applied consistently, and open to challenge, and from actions that unmistakably demonstrate benevolence toward the stakeholders involved.
We do not yet have those rules for deploying or recalling frontier models. Who gets to switch a model on, who gets to switch it off once it is deployed, and how that authority settles into a trustworthy steady state: those are the questions that deserve focus now.
The original MedCity piece is here.


