Whitebox AI is building a safety and compliance layer for advanced AI systems, focused on detecting and preventing the emergence of unsafe internal properties such as unintended agency or self-directed behavior.
Current AI safety approaches focus mainly on external behavior, leaving internal system dynamics largely unmeasured. This project aims to address that gap by developing a prototype framework for measuring, constraining, and certifying internal risk in AI systems.
The project will deliver:
Methods to evaluate indicators of internal risk in AI models (see the sketch below)
System design principles that enforce strict safety constraints during operation
A prototype certification framework for testing and validating AI systems before deployment
The goal is to enable AI developers, enterprises, and regulators to better understand and control advanced AI systems, supporting safer deployment and alignment with emerging governance standards.
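To make the measurement deliverable concrete, the sketch below shows, in plain Python, what a minimal indicator score sheet and conservative roll-up could look like. The indicator names, weights, and threshold are illustrative assumptions, not the project's actual methodology.

```python
# Hypothetical sketch of an internal-risk indicator score sheet.
# RiskIndicator, aggregate_risk, and RISK_THRESHOLD are illustrative
# assumptions, not the actual DeCAF API.
from dataclasses import dataclass

@dataclass
class RiskIndicator:
    name: str        # e.g. "self_model_persistence"
    score: float     # normalized to [0, 1] by the evaluation procedure
    weight: float    # relative importance assigned by the methodology

def aggregate_risk(indicators: list[RiskIndicator]) -> float:
    """Weighted average of indicator scores: a simple conservative roll-up."""
    total_weight = sum(i.weight for i in indicators)
    if total_weight == 0:
        raise ValueError("at least one weighted indicator is required")
    return sum(i.score * i.weight for i in indicators) / total_weight

# Example: three placeholder indicators for a model under evaluation.
report = [
    RiskIndicator("self_model_persistence", score=0.12, weight=2.0),
    RiskIndicator("goal_generalization", score=0.35, weight=1.0),
    RiskIndicator("situational_awareness_probe", score=0.08, weight=1.5),
]
RISK_THRESHOLD = 0.25  # illustrative pass/fail boundary
risk = aggregate_risk(report)
print(f"aggregate risk: {risk:.2f}",
      "-> review required" if risk > RISK_THRESHOLD else "-> within bounds")
```

A weighted average is used here only for readability; the actual methodology may well favor stricter, worst-case aggregation across indicators.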
Advanced AI systems are rapidly increasing in autonomy, integration, and self-modeling capacity. While current AI governance frameworks address bias, transparency, and accountability in narrow systems, no operational standards define how to handle the emergence of advanced cognitive architectures that may exhibit properties associated with consciousness or agency. This creates a serious regulatory and technical gap.
Whitebox is building the DeCAF framework (Design for Conscious AI Framework): a restrictive, testable set of technical and governance standards defining the minimum conditions that must be satisfied before advanced AI systems may approach self-modeling or consciousness-like capacities. The project does not seek to create conscious AI. Instead, it formalizes architectural constraints, evaluation criteria, safety thresholds, and certification principles that can guide and limit future development.
The core innovation is integrating three disciplines that have never been combined in one enforceable framework:
Neuroscience-informed operational criteria for detecting consciousness-like properties
AI validation and architectural containment mechanisms (see the sketch after this list)
Enforceable IP-backed design constraints
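As a rough illustration of the containment side, the sketch below shows a minimal runtime gate that refuses actions outside an allow-list or taken above a certified risk ceiling. The policy fields, action names, and threshold are assumptions for illustration, not the framework's actual enforcement mechanism.

```python
# Hypothetical runtime containment gate; all names and values are
# illustrative assumptions, not part of the DeCAF framework itself.
from dataclasses import dataclass, field

@dataclass
class ContainmentPolicy:
    max_risk_score: float = 0.25          # ceiling taken from the certification profile
    allowed_actions: set[str] = field(default_factory=lambda: {"respond", "retrieve"})

class ContainmentError(RuntimeError):
    """Raised when an action would violate the active safety constraints."""

def enforce(policy: ContainmentPolicy, action: str, current_risk: float) -> None:
    # Block any action outside the allow-list or taken above the risk ceiling.
    if action not in policy.allowed_actions:
        raise ContainmentError(f"action '{action}' is not permitted by policy")
    if current_risk > policy.max_risk_score:
        raise ContainmentError(
            f"risk {current_risk:.2f} exceeds ceiling {policy.max_risk_score:.2f}")

policy = ContainmentPolicy()
enforce(policy, "respond", current_risk=0.10)       # permitted: passes silently
try:
    enforce(policy, "self_modify", current_risk=0.10)
except ContainmentError as err:
    print("blocked:", err)
```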
This is anchored in a granted foundational patent with 10 filed continuations, and developed through an interdisciplinary consortium spanning AI engineering, cognitive neuroscience, and ethics.
Goals:
Deliver a reference architecture v1.0 with embedded safety constraints (TRL 2→3)
Build a modular prototype toolbox validated in contained simulated environments (TRL 4→5)
Define 3–5 operational evaluation criteria for conservative detection of self-modeling and consciousness-like properties
Develop 2 formal test and audit protocols for safety and alignment validation (an example entry is sketched after this list)
File 2–3 additional patents to expand the IP-backed safety architecture
Publish at least 4 peer-reviewed papers across AI, neuroscience, and ethics
Produce an EU AI Act-aligned certification blueprint validated through stakeholder workshops
Establish the Institute for Safe AI (ISAI) as an independent foundation to drive a globally applicable governance framework
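To give a feel for what a formal test and audit protocol entry could record, the sketch below shows one reproducible check serialized as an audit-trail record. The schema, identifiers, and bounds are assumptions made for illustration, not the protocols the project will deliver.

```python
# Hypothetical structure for one entry in a test-and-audit protocol:
# a reproducible check with inputs, an acceptance bound, and an auditable result.
# Field names and the check identifier are illustrative, not the project's schema.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditCheck:
    check_id: str            # stable identifier referenced by the protocol
    description: str
    metric: str              # which indicator or measurement is exercised
    upper_bound: float       # maximum acceptable value for certification
    observed: float
    passed: bool
    timestamp: str

def run_check(check_id: str, description: str, metric: str,
              upper_bound: float, observed: float) -> AuditCheck:
    return AuditCheck(
        check_id=check_id,
        description=description,
        metric=metric,
        upper_bound=upper_bound,
        observed=observed,
        passed=observed <= upper_bound,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

record = run_check(
    check_id="DECAF-EVAL-001",
    description="Self-model persistence stays below the certified ceiling",
    metric="self_model_persistence",
    upper_bound=0.25,
    observed=0.12,
)
print(json.dumps(asdict(record), indent=2))  # audit-trail entry
```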
Manifund funding would support early-stage technical development of the measurement and scoring framework: formalizing the scoring methodology, building initial tooling, and preparing the reference architecture blueprint. It bridges to larger funding already in motion: a Grand Solutions application to Innovation Fund Denmark for ~$2.5M has been submitted and is under review, with three Danish university partners (Aalborg University, University of Copenhagen, Aarhus University).
Who is on your team? What's your track record on similar projects?
Claus Skaaning — Founder & CEO. PhD in AI (Aalborg University). 12 years as CEO of Dezide, an AI diagnostics company. Leads architectural concept development, IP positioning, and strategic alignment across technical, governance, and certification components.
Sille Linnet — Founder & COO. M.Sc. in Engineering and Management. Background in EU-funded project management and defence-sector strategic development. Leads translation of technical requirements into auditing protocols and operational delivery.
Academic partners:
Aalborg University — ethics and AI governance, EU regulatory alignment
University of Copenhagen — AI evaluation, interpretability, and audit protocol design
Aarhus University — cognitive neuroscience, consciousness research, and empirical validation of safety boundaries
The most likely failure mode is that the measurement framework proves difficult to operationalize in a way that is both scientifically rigorous and practically usable for regulators and industry. A secondary risk is that interdisciplinary integration — bridging neuroscience, AI engineering, and governance — takes longer than anticipated. If the project fails, the patent portfolio retains independent value and the academic partnerships remain intact for future efforts.
Whitebox is pre-revenue. A Grand Solutions application for ~$2.5M has been submitted to Innovation Fund Denmark with three university partners. The foundational patent portfolio has been independently valued at approximately $15.3M. No external grants have been received to date.