You're pledging to donate if the project hits its minimum goal and gets approved. If not, your funds will be returned.
This project develops and validates E5, a constraint-first governance architecture for large language models designed to prevent authority drift, hallucinated policy, and institutional overreach. Rather than moderating outputs after the fact, E5 enforces behavioral limits at the architectural level, ensuring models narrow, pause, or refuse when context, policy, or authority is missing.
The work focuses on adversarial and long-horizon validation of governance behavior under realistic conditions, with the goal of documenting reproducible principles for building non-authoritative, fail-closed AI systems.
The primary goal of this project is to develop and rigorously validate an architectural approach to AI governance that structurally prevents large language models from acting with unwarranted authority.
This will be achieved by:
- Stress-testing E5 under extended, realistic workflows where authority drift typically emerges
- Conducting adversarial testing to surface edge-case governance failures
- Refining constraint patterns while preserving strict fail-closed guarantees
- Documenting governance principles, limits, and failure modes in a form suitable for external review
Success is defined by stable refusal, narrowing, or deferral behavior under ambiguity and pressure, not by task performance or output quality.
Funding will primarily support dedicated research time to conduct validation, stress testing, and documentation that is not feasible alongside hourly labor alone. This includes long-horizon testing sessions, adversarial prompt design, systematic logging of failure modes, and preparation of clear written documentation.
The goal is to convert an already operational system into a well-documented, externally legible piece of applied AI governance research without weakening its constraints or safety properties.
This project is currently conducted by a single researcher.
I have independently designed, rebuilt, and validated the E5 governance architecture through multiple empirical iterations in response to observed failures in real-world use of large language models. While I do not have prior grant-funded projects in this exact category, the current system reflects substantial applied work rather than speculative design, with demonstrated stability under conditions where baseline models routinely fail.
The scope of this project is intentionally matched to a solo researcher to ensure accountability, focus, and architectural coherence.
The most likely cause of failure would be discovering that certain governance failures cannot be prevented through architectural constraints alone without unacceptable loss of usefulness. Another risk is that edge-case behaviors emerge under long-horizon testing that require substantial redesign.
If the project fails, the outcome would still include clear documentation of the limits of constraint-first governance approaches and identification of failure modes that future safety research should explicitly address. In that case, the work would still provide negative results and design lessons valuable to the AI governance community.
None.
There are no bids on this project.