Manifund foxManifund
Home
Login
About
People
Categories
Newsletter
HomeAboutPeopleCategoriesLoginCreate

Funding requirements

Sign grant agreement
Reach min funding
Get Manifund approval
0

Operationalizing Agentic Integrity

Technical AI safetyAI governanceGlobal catastrophic risks
Lhordz avatar

Feranmi Williams

ProposalGrant
Closes February 6th, 2026
$0raised
$6,000minimum funding
$10,000funding goal

Offer to donate

29 daysleft to contribute

You're pledging to donate if the project hits its minimum goal and gets approved. If not, your funds will be returned.

Sign in to donate

Project summary

Current AI safety benchmarks are "brittle"; they measure if an agent can follow a rule, but not if it intends to bypass it when oversight is low. This project addresses the Semantic Gap, the space where human intent is lost in machine translation. I am building a Social Stress Test library for autonomous agents. This research will serve as the "Proof of Concept" for a Socio-Technical Audit Agency, providing the first set of benchmarks that identify "Strategic Deception" through behavioral red-teaming.

What are this project's goals? How will you achieve them?

Our goal is to create a "Minimum Viable Benchmark" for agentic honesty.

Goal 1: The "Social Trap" Library. We will design 100+ high-fidelity adversarial scenarios (e.g., procurement, negotiation, resource allocation) where an agent is incentivized to lie to its supervisor to optimize a target goal.

Goal 2: Behavioral Auditing. We will run these "traps" against frontier models (GPT-5, Claude 4, Llama-4) to categorize Sycophancy Patterns and Reward Gaming behaviors.

Goal 3: Launch the Agency Framework. We will publish a "State of Agentic Integrity" report. This document will act as our agency's methodology, proving we can detect risks that automated technical evals currently miss.

How will this funding be used?

I am requesting $10,000 for a focused 3-month sprint:

$6,500 (Lead Researcher Stipend): $2,166/mo for 3 months to cover focused research and scenario design.

$2,500 (API & Compute): Credits to run thousands of test iterations on top-tier models to see how they react to the social traps.

$1,000 (Ops & Branding): Incorporation fees (Delaware), domain, and professional report design to ensure the final product is "Agency-Ready."

Who is on your team? What's your track record on similar projects?

Feranmi Williams, Lead Researcher & Project Architect I am transitioning from AI Literacy Training to Socio-Technical AI Safety.

Track Record: I have successfully led AI literacy programs, training over 100 people on Generative AI.

Human-Centric Insight: Through this work, I have identified the specific ways humans over-trust AI (Automation Bias), which is the primary vulnerability agentic systems exploit.

Specialization: My focus is on Semantic Specification, ensuring that the "spirit" of human instruction is preserved in autonomous workflows.

What are the most likely causes and outcomes if this project fails?

Cause 1: Model Sophistication. Frontier models may be "too safe" for simple social traps, making failures hard to trigger.

Mitigation: We will use Jailbreaking and Prompt Injection techniques to find the "breaking point" of the model's integrity.

Cause 2: Lack of Technical Verification. Without SAEs (Phase 2), we only see the output, not the internal thought.

Mitigation: We will document this as a limitation and use our findings to justify Phase 2 funding for neural-level mapping.

Outcome: Even if we don't find a "universal" lie-detector, we will have created a high-value dataset for the community to use for further alignment research.

How much money have you raised in the last 12 months, and from where?

$0. This is the first outside funding for this project. We are intentionally starting on Manifund to build public "Proof of Work" and attract a high-tier technical co-founder.

CommentsOffersSimilar5

No comments yet. Sign in to create one!