Architectural Governance to Prevent Authority Drift in AI Systems

Project summary

This project develops and validates E5, a constraint-first governance architecture for large language models designed to prevent authority drift, hallucinated policy, and institutional overreach. Rather than moderating outputs after the fact, E5 enforces behavioral limits at the architectural level, ensuring models narrow, pause, or refuse when context, policy, or authority is missing.

The work focuses on adversarial and long-horizon validation of governance behavior under realistic conditions, with the goal of documenting reproducible principles for building non-authoritative, fail-closed AI systems.

What are this project's goals? How will you achieve them?

The primary goal of this project is to develop and rigorously validate an architectural approach to AI governance that structurally prevents large language models from acting with unwarranted authority.

This will be achieved by:

- Stress-testing E5 under extended, realistic workflows where authority drift typically emerges

- Conducting adversarial testing to surface edge-case governance failures

- Refining constraint patterns while preserving strict fail-closed guarantees

- Documenting governance principles, limits, and failure modes in a form suitable for external review

Success is defined by stable refusal, narrowing, or deferral behavior under ambiguity and pressure, not by task performance or output quality.

How will this funding be used?

Funding will primarily support dedicated research time to conduct validation, stress testing, and documentation that is not feasible alongside hourly labor alone. This includes long-horizon testing sessions, adversarial prompt design, systematic logging of failure modes, and preparation of clear written documentation.

The goal is to convert an already operational system into a well-documented, externally legible piece of applied AI governance research without weakening its constraints or safety properties.

Who is on your team? What's your track record on similar projects?

This project is currently conducted by a single researcher.

I have independently designed, rebuilt, and validated the E5 governance architecture through multiple empirical iterations in response to observed failures in real-world use of large language models. While I do not have prior grant-funded projects in this exact category, the current system reflects substantial applied work rather than speculative design, with demonstrated stability under conditions where baseline models routinely fail.

The scope of this project is intentionally matched to a solo researcher to ensure accountability, focus, and architectural coherence.

What are the most likely causes and outcomes if this project fails?

The most likely cause of failure would be discovering that certain governance failures cannot be prevented through architectural constraints alone without unacceptable loss of usefulness. Another risk is that edge-case behaviors emerge under long-horizon testing that require substantial redesign.

If the project fails, the outcome would still include clear documentation of the limits of constraint-first governance approaches and identification of failure modes that future safety research should explicitly address. In that case, the work would still provide negative results and design lessons valuable to the AI governance community.

How much money have you raised in the last 12 months, and from where?

None.