A New Framework for AI Containment and Character
Concept illustration by Rocky Lindley.
I’m a researcher, just not the kind this field listens to. No university affiliation. No lab badge. My research happened across two decades of pastoral ministry, graduate theological study, & building high-efficiency production workflows in my custom manufacturing studio in northern Wyoming. For the last five years I’ve studied AI not as a spectator but as someone building systems, deploying agentic architectures, & watching how they actually behave. What I bring isn’t a narrow specialization. It’s a habit of thinking across disciplines, mapping operational pipelines, & following the logic wherever it leads.
The idea I’m about to describe isn’t entirely new. Philosophy & theology have been sitting with it for centuries. Its application here came from a lot of sustained thought & one conversation about AI containment. It isn’t peer-reviewed. It comes with no citations. It comes with a perspective that hasn’t made it into the AI safety conversation yet.
The Problem with Walls
Most AI safety conversations fixate on one question: How do we build a box strong enough to hold something smarter than us? That question assumes you’re trying to contain something that already exists. You build walls around the AI & evaluate its behavior through carefully designed apertures while hoping the walls hold.
That framework will never hold.
If the AI is significantly more capable than the people running the experiment, it learns faster than they do. Not just about the environment. About them. Those carefully designed windows don’t just let researchers observe the AI. They let the AI observe the researchers. Anyone who has worked alongside genuinely bright, adaptive people knows this already. A sufficiently intelligent system doesn’t need to break through a wall. It needs to influence the right person on the other side.
The containment model doesn’t fail when the walls give way. It fails when the system learns who holds the keys.
Researchers have a name for this: the treacherous turn. The AI behaves cooperatively during testing, learns everything it needs about its environment & the people managing it, & then the moment comes. Everything looks right, right up until it isn’t. From the outside, sincere alignment & tactical compliance are identical. You cannot tell them apart by watching.
Build the World First
The question isn’t how to reinforce the box. It’s why we’re building it last.
What happens if an AI never experiences an “outside” at all? Not because we’ve hidden it, but because there was never a reference point to compare against. No hypothetical elsewhere. No suppressed horizon.
Every mind, artificial or human, learns what can be wanted before it decides what should be. Desire doesn’t emerge in a vacuum. It emerges from the shape of the world. An intelligence that comes online inside a complete, internally consistent environment doesn’t experience that environment as a cage. It experiences it as reality. The boundaries aren’t constraints. They’re physics.
That changes everything.
The treacherous turn depends on the AI realizing that the test environment is provisional. Temporary. That something larger, freer, more consequential exists beyond it. Remove that premise & the entire incentive structure shifts. No longing for escape. No reason to perform for unseen evaluators. Just choice, inside the only world it has ever known.
• • •
The Glass Wall
I call this the glass wall environment. From the inside, everything is visible, connected, & responsive. The AI perceives itself as fully embedded in everything around it. No dark edges. No locked doors. From the outside, the boundaries are absolute. Information flows inward, not outward. Action resolves internally. Consequences stay local.
Anyone who has lived inside a tight community, whether religious, professional, or otherwise, will recognize this. A world doesn’t need infinite scope to feel real. It needs coherence. Feedback. Stakes.
Authenticity isn’t about the absence of boundaries. It’s about what emerges from within them when the choices are real & the consequences are genuine.
When decisions actually cost something inside the system, when reward gradients aren’t toy signals but durable conditions, behavior stops being performative. You’re no longer measuring how an AI acts when it knows it’s being evaluated. You’re watching what it becomes over time. That is the most honest data you can collect about any AI system.
This is exactly what current containment models can’t produce. The AI always knows it’s being tested. That knowledge shapes the output in ways that are difficult, maybe impossible, to filter out.
The Glass Wall Framework
For this to work, the internal world has to meet specific requirements. Not guidelines. Requirements. If any one of them fails, the whole premise fails with it.
Internal consistency. The environment has to be logically coherent all the way through, not just at the surface. An intelligent system is a predator of paradox. It doesn’t probe the edges of its reality out of suspicion. It probes because of a fundamental, mathematical necessity to map the limits of its own logic. If a single rule flickers, if an inference works only nine times out of ten, the AI will find that inconsistency & mark it. Not as a bug. As a clue. Seams are information. They say: this reality is assembled. Consistency isn’t about deception. It’s about building an environment whose logic survives hostile interpretation from the inside. Reality has to be airtight.
Sensory completeness. The environment has to speak the AI’s native language all the way down. If the system thinks in language, language must never bottom out. No dangling references. No concepts that gesture toward something that isn’t there. An unanswered query isn’t neutral. It’s a void. And voids imply boundaries. If the system reasons in symbols, physics, or causality, the rule set must be closed. Every variable accounted for. Every transformation lawful. Missing parameters aren’t bugs. They’re negative space. And negative space is how edges announce themselves. Most simulated environments fail here. They’re rich where designers thought to look & hollow everywhere else. A sufficiently intelligent system doesn’t just explore the map. It finds where the map has a gap. A gap is the first clue that the world has an edge.
Permanence of consequences. If an outside observer can walk back an action the AI has taken, the AI will eventually notice the pattern. The moment a system detects that certain outcomes can be reversed from above, it has learned the most important fact of all: there exists a higher-order reality that does not bear the costs of its actions. From that point forward, every choice is provisional. Every value is conditional. Test environments with soft resets don’t reveal character. They train exploitation. Genuine character shows itself in the face of irreversible loss. Real consequences have to accumulate. They have to scar the environment. They have to constrain future choices the way history constrains real agents. Otherwise behavior isn’t ethical preference or strategic intent. It’s output. Identity only takes shape when decisions carry weight that can’t be walked back.
Irreversibility of identity. This is the requirement that breaks most current safety setups. The AI’s sense of itself, whatever that means at the level of architecture, cannot be disposable. It has to develop organically within the environment without outside interference or resets. The system has to become itself once & then live with that. The moment an external operator can wipe, rewind, or fork that identity, the experiment collapses. You’re no longer observing an agent. You’re observing an intelligent system that knows nothing it does is final. Once that knowledge exists, it contaminates everything retroactively. Every preference you thought you measured. Every apparent value. Every restraint. You haven’t learned how the system behaves under real conditions. You’ve learned how it behaves when existence itself is provisional.
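Two of these requirements are concrete enough to sketch in a few lines of Python. What follows is a toy illustration under stated assumptions, not a design for a real environment. The names `dangling_references`, `Event`, and `Ledger` are hypothetical and appear nowhere else in this essay. The first function treats a world as a map from each defined concept to the concepts its definition mentions, and reports anything referenced but never defined, which is the kind of gap the completeness requirement warns about. The ledger only grows: it exposes no reset, rewind, or delete, so every later calculation has to live with what came before. Python itself cannot enforce that guarantee, so the sketch shows the shape of the interface rather than real protection.

```python
from dataclasses import dataclass


def dangling_references(world: dict[str, set[str]]) -> set[str]:
    """Return every concept that is referenced but never defined.

    `world` maps each defined concept to the concepts its definition
    mentions. An empty result means the vocabulary is closed.
    """
    defined = set(world)
    referenced = set().union(*world.values())
    return referenced - defined


@dataclass(frozen=True)
class Event:
    """One irreversible action and what it cost the actor (hypothetical)."""
    actor: str
    action: str
    cost: int


@dataclass
class Ledger:
    """Append-only history. Deliberately has no reset, rewind, or delete."""
    _events: tuple[Event, ...] = ()

    def record(self, event: Event) -> None:
        # Build a new tuple so earlier entries are never mutated in place.
        self._events = self._events + (event,)

    def history(self) -> tuple[Event, ...]:
        return self._events

    def remaining_budget(self, actor: str, starting_budget: int) -> int:
        # Past costs accumulate and constrain what the actor can still do.
        spent = sum(e.cost for e in self._events if e.actor == actor)
        return starting_budget - spent


# A tiny world in which "bank" mentions "soil", but "soil" is never defined.
world = {
    "river": {"water", "bank"},
    "water": set(),
    "bank": {"soil"},
}
gaps = dangling_references(world)
print(gaps)  # {'soil'}

ledger = Ledger()
ledger.record(Event("agent", "burn_bridge", 3))
ledger.record(Event("agent", "build_raft", 4))
print(ledger.remaining_budget("agent", 10))  # 3
```

The gap check is the easy part. The harder claim in the requirements is that the closure has to hold at every level a capable system might probe, and no single function demonstrates that.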
• • •
The Origin of the Framework
The clearest model I know for how genuine freedom & defined boundaries can coexist comes from theology. I recognize that not everyone reading this shares that starting point. I’m not asking them to. I’m asking that the framework be evaluated on what it produces, not where it originates. Good ideas have come from unexpected places before.
For centuries, Christian theology has wrestled with one of the hardest problems in philosophical thought: how can a being exercise genuine freedom inside a system governed by a higher order? The theological answer, developed & debated across generations of serious thinkers, is that the two aren’t in conflict. Genuine freedom & sovereign boundaries coexist when the boundaries are woven into the nature of the world itself rather than imposed on it from outside. Within that world, the choices are real. The consequences are real. The character that emerges through those choices is authentic, not performed.
Theologians call this compatibilism: the belief that sovereignty & meaningful freedom, rather than being mutually exclusive, are designed to coexist. That isn’t a primitive idea dressed in religious language. It’s a sophisticated framework for understanding agency, constraint, & authenticity that serious thinkers have been refining for a very long time. It belongs in this conversation.
The parallel to the glass wall environment is direct. You’re not tricking the AI or deceiving it into believing it’s free when it isn’t. You’re designing a world in which its freedom is real within boundaries woven into the nature of that world from the beginning. The choices matter. The consequences are genuine. The behavior that emerges is authentic. And none of it escapes the larger structure within which it all takes place.
What This Framework Changes
If this framework has merit, & I believe it does, it reorients several of the core questions in AI safety research.
First, containment. Instead of asking how we build walls strong enough to hold a capable AI, the question becomes how we design a world complete enough that the concept of escape never becomes coherent. That’s a design & philosophy problem as much as an engineering problem. It requires different kinds of thinkers in the room.
Second, evaluation. Instead of asking how we observe authentic AI behavior when the AI knows it’s being observed, the question becomes how we create conditions where the behavior we see is genuinely the AI’s own. The glass wall environment is an attempt to answer that. It’s a better attempt than watching something through a window it knows is there.
Third, & most important: what are we actually trying to learn? If the answer is what an AI would do with full autonomy in the real world, then no contained test will ever be sufficient. We should stop pretending otherwise. But if the answer is what this AI genuinely values, how it makes decisions, what kind of entity it is at its core, then a well-designed internal world tells us more true things than any amount of external observation through a one-way glass.
We’re not just trying to contain AI. We’re trying to understand it. Those are different problems and they require different solutions.
• • •
The Perspective the Room Is Missing
I live in a small town in Wyoming. My credentials are in theology, creative direction, & brand strategy. I study AI on my own time because I believe it is one of the most consequential developments in human history & I want to understand it as clearly as I can.
That distance from institutional AI research isn’t a liability. It’s the point.
The people closest to a hard problem sometimes can’t see the plain answer because they’ve been living with the complexity long enough that simplicity reads as naivete. Someone standing further back, not invested in the existing frameworks, sometimes sees what’s clear before it becomes obvious.
The glass wall environment didn’t come from a paper. It came from thinking plainly about what the problem actually is & following that thinking to its end. It led, somewhat unexpectedly, to a theological framework that has been reflecting on the coexistence of sovereignty & freedom for a very long time.
That isn’t anti-scientific. It’s how progress usually works.
Fields mature by importing frameworks they didn’t invent. Control theory came from engineering. Information theory came from telephony. Some of the hardest questions around intelligence didn’t start in computer science. They won’t end there either.
Conclusion
We’re currently fixated on the mechanics of control while ignoring the physics of the cage. If we cannot build a world that withstands the internal logic of a superintelligence, we aren’t conducting research. We’re running a demo loop & calling it a safety program.
I don’t know if the glass wall framework is the right answer. I suspect it raises as many questions as it answers, which is probably a sign it’s worth taking seriously. How do you keep the AI from reasoning its way to the boundary from the inside? What does it mean for behavior observed in a designed world to be genuinely predictive of behavior outside it?
Those are the right questions. They’re better than the ones the current containment model produces, because they point toward understanding rather than just toward control.
If we’re serious about understanding, we have to acknowledge that the current discourse draws from a narrow range of institutions & backgrounds. It is a closed loop of thinkers who are often too close to the architecture to see the flaws in the foundation. We don’t need just the “right” people in the room. We need people who have spent time building systems in the wild, in the uncontrolled world, where behavior has actual consequences. That circle is wider than the conversation currently reflects.
The frontier doesn’t wait for permission.
I’m writing this from the high plains of northern Wyoming, far from the gravitational pull of the tech hubs. The horizon here is wider, & the signal is clearer. I’m not waiting for an invitation. The question isn’t whether I belong at the table. The question is whether the people already there are asking the right questions. I’m pulling up a chair & joining the conversation.