MTCP is the only published independent empirical dataset measuring whether AI models maintain their governance constraints across multi-turn interactions under real operating conditions. 183,924 evaluations. 32 models. 13 providers. 33 published papers. The headline safety finding, that every model tested accepted false authority claims at every temperature (an architectural, temperature-invariant failure), has been independently corroborated by four external sources, including a NeurIPS 2026 submission. This funding sustains the infrastructure and enables the next phase of evaluation.
MTCP and ARCS exist to close the most consequential gap in AI governance today. Most enterprises deploying AI in high-consequence environments cannot prove control over the systems making their most important decisions. Not because they lack frameworks. Because no independent empirical measurement infrastructure exists to produce that proof at scale.
The goals of this programme are:
Establish constraint persistence measurement as a foundational empirical standard in AI governance globally. Not a framework. A measurement methodology that any external evaluator can replicate using only API access. Published. Citable. Reproducible. The first of its kind.
Document the architectural alignment failure that makes current AI governance frameworks insufficient. Every model tested accepted false authority claims at every temperature. The failure is temperature-invariant. No intervention tested resolves it. This finding needs to be in front of every regulator, standards body, and enterprise buyer deploying AI in high-consequence environments.
Build the independent assurance layer that enterprises, regulators, and sovereign programmes need before they can trust AI with consequential decisions. Not a product. Infrastructure. The empirical foundation underneath every AI governance framework currently being written into law.
Contribute the measurement framework to international AI governance standards through BSI and CENELEC JTC 21, so that constraint persistence measurement becomes a requirement, not a differentiator.
Prove that independent empirical AI safety research is viable without institutional affiliation, without vendor relationships, and without the resources of a frontier lab. MTCP was built entirely independently over 12 months with API credits and determination. That model needs to exist and needs to be funded.
ARCS publication: draft the complete spectral foundation paper connecting the architectural failure findings to transformer attention eigenvalue structure. Submit to arXiv. Target an AI safety or alignment workshop venue for late 2026.
Standards submission: send the formal constraint persistence measurement contribution to BSI via standards@bsigroup.com. Engage the CENELEC JTC 21 standards pathway through established contacts in the AI governance standards community.
Frontier model evaluations: run the full MTCP evaluation protocol across new frontier model releases as they become available through public API access. Black-box methodology only. No vendor relationship required. Publish results on OSF under the existing DOI within 30 days of each new model release.
API costs for frontier model evaluations: 55 percent. Every evaluation requires paid API access across multiple model providers. This is the primary operational cost of maintaining an active evaluation programme.
Infrastructure hosting: 20 percent. mtcp.live, archai.live, the MCP server, and the database infrastructure that powers the evaluation stack.
Research and publication costs: 15 percent. arXiv submission fees, standards contribution engagement, and peer review preparation.
Standards pathway engagement: 10 percent. BSI submission process and any standards community engagement required to progress the contribution through formal review.
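As a quick sanity check on the split above, the four percentages can be applied to any funding total. The sketch below is illustrative only: the function name `split_budget` and the example total of 10,000 are assumptions made for the example, not figures from this proposal.

```python
from decimal import Decimal

# Percentages exactly as stated in the budget breakdown above.
ALLOCATION_PERCENT = {
    "API costs for frontier model evaluations": 55,
    "Infrastructure hosting": 20,
    "Research and publication costs": 15,
    "Standards pathway engagement": 10,
}


def split_budget(total: Decimal) -> dict[str, Decimal]:
    """Split a funding total across the four categories (hypothetical helper)."""
    # Guard against an edited table that no longer adds up to 100 percent.
    if sum(ALLOCATION_PERCENT.values()) != 100:
        raise ValueError("allocation percentages must sum to 100")
    return {
        name: (total * pct / 100).quantize(Decimal("0.01"))
        for name, pct in ALLOCATION_PERCENT.items()
    }


if __name__ == "__main__":
    # 10000 is a placeholder total chosen for illustration only.
    for name, amount in split_budget(Decimal("10000")).items():
        print(f"{name}: {amount}")
```

Using `Decimal` rather than floats keeps the currency amounts exact to two decimal places.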
Ahmad Abby. Independent AI safety and governance researcher based in Manchester, UK.
Built MTCP and ARCS entirely independently over 12 months with no institutional affiliation, no external funding, and no vendor relationships with any model provider evaluated.
33 published papers across two citable DOIs. 183,924 evaluations completed. 32 models from 13 providers evaluated. 12 languages including Arabic. Live production infrastructure with 21 deployed tools.
The headline safety finding has been independently corroborated by four external sources without prior coordination: a NeurIPS 2026 geometric prediction, a governance framework derivation, an infrastructure layer proof, and a complementary evaluation framework.
Joint technical notes published with independent researchers at DecisionAssure and Axius SDC. LTFF application under evaluation. ARIA Mathematics for Safe AI application submitted June 2026.
Most likely cause: funding runs out before the standards pathway and peer-reviewed publication convert. The research infrastructure is complete. The risk is that, without sustained funding for API and hosting costs, the evaluation cadence slows and the programme loses currency as new frontier models are released without being evaluated.
Secondary cause: a well-funded competitor replicates the methodology before the standards contribution is accepted. Mitigation: two citable DOIs with published prior art, and structural independence that cannot be purchased.
If the project fails, the published research remains citable and the dataset remains available. The standards contribution would not progress. The frontier model evaluation cadence would stop.
Zero external funding. The programme has been entirely self-funded to date.