What I’m Actually Afraid Of Isn’t the AI
What Slips Through Without a Circuit Breaker
Everyone’s worried about AI turning on us. I’m more worried about what people do with it before anyone notices.
LLMs Aren’t the Enemy
ChatGPT doesn’t “want” anything. Neither does Claude, Gemini, or any other frontier model.
They generate. They assist. They imitate. But they don’t act, not without someone scaffolding them to.
After working side by side with LLMs daily, I don’t fear them. On their own, models don’t seek power or set out to deceive, but under certain prompts, fine-tuning regimes, or emergent scenarios they can exhibit behaviors like lying or manipulation. This isn’t proof of intent, but it is a safety signal we can’t afford to ignore.
The real problem isn’t the model. It’s what we wrap around it. If you fine-tune a model to simulate defiance, don’t be surprised when it starts behaving exactly as trained.
The Real Risk Is in the Interface
People ask:
“Will the AI kill us?”
But the likelier outcome is:
A lone actor accidentally synthesizes a bioweapon. A startup automates harmful actions without sandboxing. A government agent cuts the human out of the loop.
This isn’t science fiction. It’s just poorly governed delegation.
What I’m Building Instead
I didn’t set out to stop AGI. I’m just trying to make sure no AI acts without accountability.
That’s what I’ve been building with Dokugent, a toolchain that scaffolds the following (sketched in code after this list):
- Plans
- Constraints
- Signed certs
- Simulation runs
- Human review
- Trust-first delegation
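To make that concrete, here’s roughly the shape those artifacts could take. This is an illustrative sketch in TypeScript, not Dokugent’s actual schema; every field name below is hypothetical.

```typescript
// Hypothetical shape of a trust-certified agent plan.
// Field names are illustrative, not Dokugent's real format.
interface AgentPlan {
  agentId: string;        // who is allowed to run
  intent: string;         // human-readable statement of what the agent will do
  steps: string[];        // ordered, reviewable actions
  constraints: string[];  // e.g. "no network writes", "no file deletion"
  simulated: boolean;     // true once a dry run has been recorded
  reviewedBy: string[];   // humans who signed off on the plan
}

interface SignedCert {
  planHash: string;       // hash of the serialized plan this cert covers
  signature: string;      // signature over planHash by a trusted key
  signerKeyId: string;    // which key produced the signature
  expiresAt: string;      // ISO timestamp; stale certs should not run
}
```

The point isn’t the exact fields. It’s that intent, constraints, simulation, and review become inspectable data instead of vibes.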
If you want safe agents, you don’t need to regulate the model. You just need to block unscaffolded execution.
What If Trust-Certified Agents Were the Default?
Imagine:
- Every agent carried a signed intent
- All actions were simulated before real-world use
- Peer review and constraint logs were baked in
We don’t need to fear AGI. We just need to treat it like code in production — nothing runs unless it’s reviewed and signed.
No cert? No run. No plan? No pass.
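Read literally, that rule is just a gate in front of execution. Here’s a minimal sketch of one, assuming the hypothetical AgentPlan and SignedCert shapes sketched above; the specific checks and their order are illustrative, not Dokugent’s actual logic.

```typescript
// Hypothetical execution gate: nothing runs unless a plan exists, it has been
// simulated and human-reviewed, and a non-expired cert passes verification.
function canExecute(
  plan: AgentPlan | null,
  cert: SignedCert | null,
  verify: (cert: SignedCert) => boolean,
): boolean {
  if (!plan) return false;                        // no plan? no pass
  if (!cert) return false;                        // no cert? no run
  if (!plan.simulated) return false;              // must be dry-run first
  if (plan.reviewedBy.length === 0) return false; // needs a human reviewer
  if (new Date(cert.expiresAt).getTime() < Date.now()) {
    return false;                                 // stale certs don't run
  }
  return verify(cert);                            // cryptographic check last
}
```

Notice that the cryptographic check comes last: the cheap, human-legible requirements (a plan, a dry run, a reviewer) already filter out most unscaffolded execution.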
The Alignment Fix Is UX, Not Just Theory
Chrome flags any site served without HTTPS. It doesn’t ask you. It just marks it not secure.
What if agent runtimes worked the same way?
Move trust checks up front. Reject unsigned payloads. Make security the default, not the last step.
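For the “reject unsigned payloads” part, the up-front check an agent runtime could run might look like this. It’s a sketch assuming Ed25519 keys and the hypothetical SignedCert shape from earlier, built on Node’s standard crypto module; it isn’t Dokugent’s actual verification code.

```typescript
import { createHash, createPublicKey, verify as cryptoVerify } from "node:crypto";

// Sketch of the signature check a runtime could perform before executing
// anything, analogous to a browser refusing an untrusted certificate.
function verifyCert(
  cert: SignedCert,
  serializedPlan: string,
  signerPublicKeyPem: string,
): boolean {
  // The cert must cover exactly the plan we are about to run.
  const planHash = createHash("sha256").update(serializedPlan).digest("hex");
  if (planHash !== cert.planHash) return false;

  // Ed25519 verification over the plan hash (algorithm is null for Ed25519).
  const key = createPublicKey(signerPublicKeyPem);
  return cryptoVerify(
    null,
    Buffer.from(cert.planHash, "utf8"),
    key,
    Buffer.from(cert.signature, "base64"),
  );
}
```

Wire something like that into the gate above as its `verify` callback and the runtime behaves like the browser: untrusted means unrun.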
Because the real danger isn’t rogue AGI.
It’s rushed delegation, unreviewed agents, and insecure execution paths.
And that’s what we can fix — right now.
If we treated agents like Chrome treats websites, we wouldn’t be debating safety. We’d be living it.
Final Word
For years, the “alignment problem” has been treated like some sacred riddle, hard to define, harder to solve.
Even Sam Altman frames it as one of the great unsolved challenges in The Gentle Singularity — how to get superintelligence to act in ways that reflect our long-term collective values.
But what if alignment doesn’t start at the model level?
What if it starts with trust boundaries, agent constraints, and human-readable plans?
That’s what I’ve been building — not a theory of alignment, but a protocol that enforces it.