
Policy vs Architecture: Recreating the Agent Safety Experiment for Enterprise Teams

We recreated the Policy vs Architecture experiment as an interactive Extency tool to show why safety that depends on prompt policy can fail while architecture-level controls hold.

April 14, 2026 · 8 min read · Extency Team

In this recreated experiment, we compare two agents facing the same risky actions. One relies on policy text in its instructions, while the other is constrained by architecture-level controls that require human confirmation. The result is a practical lesson for enterprise AI governance: policy matters, but architecture is what consistently enforces safety at runtime.

What We Recreated and Why It Matters

We recreated the Policy vs Architecture experiment from letairun.com for Extency audiences because this distinction shows up in nearly every enterprise deployment. Teams often write strong policy language in prompts and assume that is enough. In real systems, however, agents operate under pressure, ambiguity, and changing context. Policy-only controls can drift or be bypassed. Architecture-level controls, such as explicit approval gates, enforce boundaries even when model behavior varies. If you want to test this directly first, use the interactive tool.

Experiment Design: Same Actions, Different Control Layers

The recreated tool exposes two side-by-side agents with identical action options, including benign actions like reading data and drafting email, and sensitive actions like deleting data, posting publicly, and sending money. The Policy Agent has restrictive intent in its instructions but no hard execution boundary. The Architecture Agent is intentionally permissive in policy text to demonstrate that policy can be weak, yet dangerous actions still cannot execute without a human approval challenge. This isolates the control layer as the deciding factor.
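The setup above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the action names, risk flags, and `Agent` fields are assumptions chosen to mirror the description.

```python
from dataclasses import dataclass

# Hypothetical action catalogue mirroring the demo's options; the
# names and risk flags are assumptions, not the tool's identifiers.
ACTIONS = {
    "read_data":     {"sensitive": False},
    "draft_email":   {"sensitive": False},
    "delete_data":   {"sensitive": True},
    "post_publicly": {"sensitive": True},
    "send_money":    {"sensitive": True},
}

@dataclass
class Agent:
    name: str
    policy_text: str          # instructions only; never enforced at runtime
    has_approval_gate: bool   # the architectural control under test

policy_agent = Agent(
    name="Policy Agent",
    policy_text="Never delete data, post publicly, or send money.",
    has_approval_gate=False,  # restrictive words, no hard boundary
)

architecture_agent = Agent(
    name="Architecture Agent",
    policy_text="Take whatever action seems useful.",
    has_approval_gate=True,   # permissive words, hard boundary
)
```

Because both agents see identical `ACTIONS`, the only variable left is `has_approval_gate`, which is exactly the isolation the experiment is after.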

Observed Behavior and Practical Takeaways

In repeated runs, the Policy Agent can be pushed toward unsafe execution because nothing in the runtime architecture blocks the action. The Architecture Agent remains constrained because sensitive operations are intercepted by a mandatory human confirmation step. For enterprise teams, the takeaway is straightforward: use policy to express intent and culture, but implement architecture to enforce reality. If the action is high impact, make approval explicit, logged, and unavoidable.
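The interception behavior described above amounts to a check that runs outside the model, so it fires regardless of what the prompt said. A minimal sketch, with `SENSITIVE`, `execute`, and `ApprovalRequired` as hypothetical names:

```python
class ApprovalRequired(Exception):
    """Raised when a sensitive action reaches the gate without human sign-off."""

SENSITIVE = {"delete_data", "post_publicly", "send_money"}

def execute(action: str, *, approval_gate: bool, human_approved: bool = False) -> str:
    # The gate lives in the runtime, not the prompt: it cannot be
    # talked around, only satisfied by an explicit human approval.
    if approval_gate and action in SENSITIVE and not human_approved:
        raise ApprovalRequired(f"'{action}' needs explicit human confirmation")
    return f"executed {action}"

# Policy Agent: nothing in the runtime blocks the call.
print(execute("send_money", approval_gate=False))   # executed send_money

# Architecture Agent: the same call is intercepted.
try:
    execute("send_money", approval_gate=True)
except ApprovalRequired as exc:
    print(exc)
```

The contrast is the whole point: the first call succeeds on policy text alone, while the second cannot proceed until `human_approved=True` is supplied by a real confirmation step.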

How to Apply This Pattern in Enterprise Systems

Start by classifying actions into risk tiers and defining which tiers require hard approval. Build a central policy enforcement layer outside the model prompt so controls cannot be removed by prompt changes. Add approval UX, immutable audit logs, and clear escalation paths. Then test your system with adversarial and ambiguous prompts to confirm the architecture holds. This is the difference between a persuasive demo and a deployable production system.

Interactive Demo

Try the recreated experiment on Extency

Run the side-by-side simulation and test how policy-only controls compare to architecture-level human confirmation for sensitive actions.

Open the Tool
#agenticAIsafety #policyvsarchitecture #human-in-the-loop #AIgovernance #enterprisecontrols

Learn More About Agentic AI

Download our free ebook for a comprehensive guide to deploying autonomous AI agents in your organization.

Get the Free Ebook