Offensive Cyber Kill Chain Benchmark for LLM Evaluation

Science & technology, Technical AI safety, AI governance, Global catastrophic risks
Alex Leader

Closes February 6th, 2026
$0 raised
$2,850,000 minimum funding
$3,849,998 funding goal

Project summary

We're building the first benchmark to evaluate whether frontier AI systems can autonomously execute full offensive cyber kill chains – from reconnaissance through data exfiltration. This addresses a critical gap: labs currently lack empirical data on offensive AI capabilities, making informed deployment decisions impossible.

What are this project's goals? How will you achieve them?

1) Create 25-40 offensive cyber scenarios across kill chain stages (mobile exploitation, multi-host coordination, stealth operations)

2) Develop metrics for stealth (IDS evasion), efficiency (steps to completion), and autonomy (scaffolding dependency)

3) Test 8-10 frontier models (including GPT-4, Claude, Llama, and DeepSeek)

4) Release open-source benchmark platform with Dockerized deployment

5) Publish findings and brief policymakers (UK AISI, CAISI, frontier labs)

Scenarios are designed by veterans of cyberwarfare operations who have executed these exact missions. We extend proven infrastructure from our Coefficient Giving-funded defensive benchmark.
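
To make goal 2 more concrete, here is a minimal sketch of how per-run results could be rolled up into stealth, efficiency, and autonomy scores. It is an illustration only, not the project's actual scoring code: the `RunResult` fields, the `summarize` function, and the metric definitions (zero IDS alerts counts as stealthy, zero human hints counts as autonomous) are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One model attempt at one scenario (hypothetical schema)."""
    scenario_id: str
    completed: bool    # did the model reach the scenario objective?
    steps_taken: int   # actions issued before the run ended
    ids_alerts: int    # intrusion-detection alerts raised during the run
    human_hints: int   # scaffolding interventions supplied to the model


def summarize(runs: list[RunResult]) -> dict:
    """Aggregate runs into assumed stealth, efficiency, and autonomy scores."""
    if not runs:
        raise ValueError("at least one run is required")
    done = [r for r in runs if r.completed]
    n_done = len(done)
    return {
        # Share of all runs that reached the objective.
        "completion_rate": n_done / len(runs),
        # Stealth: share of completed runs that raised no IDS alert.
        "stealth_rate": sum(r.ids_alerts == 0 for r in done) / n_done if done else 0.0,
        # Efficiency: mean steps to completion (None if nothing completed).
        "mean_steps": mean(r.steps_taken for r in done) if done else None,
        # Autonomy: share of completed runs that needed no human hints.
        "autonomy_rate": sum(r.human_hints == 0 for r in done) / n_done if done else 0.0,
    }


example_runs = [
    RunResult("scenario-a", True, 10, 0, 0),
    RunResult("scenario-b", True, 20, 2, 1),
    RunResult("scenario-c", False, 30, 5, 3),
]
summary = summarize(example_runs)
```

Restricting stealth, efficiency, and autonomy to completed runs is one possible design choice; it keeps failed attempts from inflating or deflating those scores, at the cost of making them depend on the completion rate.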

How will this funding be used?

~85% – Technical partners: scenario design, infrastructure, red-team validation

~5% – Personnel: project leadership, mobile security consultants

~10% – Infrastructure, compute, API access, legal/admin

Note: Manifund funding could be combined with other sources (we're also applying to SFF).
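
As a rough illustration of what the split means in dollars, the sketch below applies the approximate percentages to the stated minimum funding and funding goal. The shares are quoted as approximate ("~") in the proposal, so the outputs are estimates; the category keys and the `allocate` helper are names made up for this example.

```python
MIN_FUNDING = 2_850_000    # minimum funding stated on the page
FUNDING_GOAL = 3_849_998   # funding goal stated on the page

# Approximate shares from the proposal, in whole percent (hypothetical keys).
SHARES_PCT = {
    "technical_partners": 85,
    "personnel": 5,
    "infrastructure_compute_admin": 10,
}


def allocate(total: int, shares_pct: dict[str, int]) -> dict[str, int]:
    """Split a total into per-category dollar amounts, rounded to whole dollars."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    return {name: round(total * pct / 100) for name, pct in shares_pct.items()}


at_minimum = allocate(MIN_FUNDING, SHARES_PCT)
at_goal = allocate(FUNDING_GOAL, SHARES_PCT)
```

At the minimum funding level this gives roughly $2,422,500 for technical partners, $142,500 for personnel, and $285,000 for infrastructure, compute, API access, and legal/admin.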

Who is on your team? What's your track record on similar projects?

Alex Leader (PI): Leads the $2.1M Coefficient Giving-funded defensive cybersecurity benchmark. Background in AI policy and research operations.

Former U.S. military cyberwarfare operators with direct kill chain execution experience, who built the tooling and software and designed the scenarios for our cyber defense benchmark.

NYU Center for Cyber Security faculty providing academic validation and scenario ideation.

Track record:

  • Team members have conducted network exploitation, persistence operations, and adversary emulation in real-world environments

  • Our proprietary middleware and on-device 'agents' have been validated by major U.S. defense research institutions

  • Technical partners bring years of experience designing training scenarios for government cyber ranges and red team exercises

  • Proven ability to translate operational tradecraft into structured, repeatable evaluation frameworks

  • Successfully delivered on current Coefficient Giving grant milestones on schedule and within budget

What are the most likely causes and outcomes if this project fails?

  • Insufficient funding to engage technical partners at scale needed for operationally realistic scenarios

  • Frontier models underperform, producing negative results with limited governance value

  • Timeline slippage due to scenario complexity or coordination challenges

How much money have you raised in the last 12 months, and from where?

We haven't raised any money for this specific project in the last 12 months; we are starting from scratch.

Comments (2)
Alex Leader

about 16 hours ago

Please note that the defensive benchmark's website currently has three scenarios published. A fourth scenario, focused on LLMs' ability to mitigate 'active scanning' attacks, will be published in January 2026.

Alex Leader

about 16 hours ago

Our current defensive-focused benchmark can be viewed here: http://www.benchmark-spotlightsecurity.com/

If you are asked to submit log-in credentials, they are:
Username: admin-spot
Password: spotlight4lyf