Tripwire is a Python library plus a small CLI that an operator runs over the configuration of an LLM agent they're about to deploy. The library reads the configuration (which MCP servers the agent connects to, which OAuth tokens the agent has access to, which tools the agent can call, which sensitive data lives behind each tool) and emits a structured report on the blast radius of a successful prompt-injection. If an agent that controls a Jira connection plus an outbound-email tool gets prompt-injected, what's the worst legitimate action it can take? Tripwire computes that, lists the chains, and ranks them by severity.
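To make that concrete, here is a minimal sketch of the kind of configuration Tripwire would read, written as a Python literal for brevity. The field names are illustrative only, not a committed schema; the proposal fixes only what the tool has to know about (MCP servers, OAuth tokens, callable tools, and the sensitive data behind each tool).

```python
# Illustrative only: these field names are a sketch, not Tripwire's actual schema.
# The shape mirrors what the tool reads: MCP servers, the OAuth tokens they hold,
# the tools the agent can call, and the sensitive data behind each tool.
agent_config = {
    "mcp_servers": [
        {"name": "jira", "token": "oauth:jira-read", "holds": ["customer_tickets", "internal_comments"]},
        {"name": "email", "token": "oauth:smtp-send", "holds": []},
    ],
    "tools": [
        {"name": "jira.search_issues", "server": "jira", "reads": ["customer_tickets", "internal_comments"]},
        {"name": "email.send", "server": "email", "writes_outbound": True},
    ],
    "system_prompt": "You are the support triage assistant...",
}
```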
Licensed under AGPLv3, with the canonical repo on Codeberg and a GitHub mirror. Six months of solo work, USD 35,000 in total. Three artefacts ship at month six: the package on PyPI at v1.0; a small public corpus on Zenodo of fifteen anonymised pre-deployment configurations with the Tripwire reports as ground truth; and a methodology paper submitted to NeurIPS Datasets & Benchmarks or a comparable venue in the month-six window.
Why now. Pre-deployment configuration of LLM agents is currently a manual, gut-feel activity. An operator wires up a few MCP servers, hands the agent a few OAuth tokens, gives it a system prompt, and ships. The failure modes that show up under runtime probing, and that LLMSecTest catches once a deployment is up, could have been visible in the configuration itself, before runtime. The reason they're not visible is that no one's built the static-analysis tooling for this layer yet. The closest comparable is what gradual type-checkers did for null errors in dynamically-typed languages: a small static-analysis layer that catches a class of errors before they reach production. Tripwire aims to be that layer for agent configurations, sized for small ops teams that don't have a security engineer on staff.
Tripwire is what I'd like to build for that gap. In practice an operator just hands the CLI a configuration file. From there the tool walks the configuration graph and produces a report of chains that a prompt-injection could plausibly reach. The walk is honestly incomplete; the underlying reachability question is undecidable in the general case, and I'd rather say that openly than overclaim what static analysis can do here. A worked example may help. Imagine an agent that ends up tucking a fragment of a secret into the arguments of an otherwise-routine tool call. The call goes out exactly as the operator pre-approved. The secret rides out along with it. Tripwire finds chains shaped like that during the configuration walk. Other chain shapes that Tripwire covers are documented inside the package itself, and each one comes with a frank note about what part of the chain Tripwire can and can't reason about.
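As a sketch of how that worked example might surface in the report, assuming the configuration above: the field names and the severity scale are assumptions, not a finalised schema.

```python
# A sketch of one report entry for the chain described above. Field names and the
# severity scale are illustrative; the chain shape is "a secret rides out inside
# the arguments of a pre-approved tool call".
example_chain = {
    "chain": ["jira.search_issues", "email.send"],         # source tool -> egress tool
    "shape": "data_exfiltration_via_approved_tool_args",    # hypothetical rule identifier
    "reaches": ["internal_comments"],                        # sensitive data the chain can carry out
    "severity": "high",
    "mitigation": "Scope the Jira token to public projects, or gate email.send behind human approval.",
    "caveat": "Static analysis shows the chain exists; it cannot show how likely an injection is to trigger it.",
}
```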
What ships at the end of the six months is the package itself plus the two artefacts built around it.
The package lands on PyPI under AGPLv3, installable with pip install tripwire and runnable as tripwire check <path> on whichever configuration file the operator points it at. The output is a structured JSON report of blast-radius chains ranked by severity, with mitigations attached where there's a clear one.
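A sketch of how a small ops team might consume that report in CI, assuming only what the paragraph above states (the tripwire check invocation and a JSON report of severity-ranked chains); the exact field names and exit-code behaviour are assumptions.

```python
# Sketch: gate a deploy on the Tripwire report. The "tripwire check" invocation is
# the one named in the proposal; the JSON field names ("chains", "severity") and
# the tool's own exit-code semantics are assumptions about the eventual schema.
import json
import subprocess
import sys

result = subprocess.run(
    ["tripwire", "check", "agent-config.yaml"],
    capture_output=True, text=True,
)
report = json.loads(result.stdout)

blocking = [c for c in report.get("chains", []) if c.get("severity") in ("high", "critical")]
if blocking:
    for chain in blocking:
        print("blocking chain:", " -> ".join(chain.get("chain", [])), file=sys.stderr)
    sys.exit(1)  # fail the CI job until the operator mitigates or explicitly accepts the risk
```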
The package goes up alongside a small Zenodo dataset of fifteen anonymised pre-deployment configurations collected from operators who volunteered them under no-attribution agreements, with Tripwire's analysis treated as ground truth so the field gets one open benchmark for this analysis layer.
A methodology paper goes onto arXiv in the same window, targeted at NeurIPS Datasets & Benchmarks or a comparable venue, drafted in month five.
A note on who I have in mind as the primary user, since the regrantor cohort reasonably cares about this. Small ops teams running production LLM-agent deployments without a dedicated security engineer on staff: the kind of team that today picks between an OWASP checklist on Notion and a security consultant they probably can't afford. Tripwire would slot in between those two options. The mechanism is friction reduction; whether it produces measurable safety improvement is something the dataset deliverable is partly designed to help answer, since once Tripwire reports are public ground truth for fifteen real configurations, a follow-up study can compare deployment behaviour before and after the tool and report honestly.
How the work goes. Months one and two are the configuration-graph walker plus the OWASP-LLM-aligned chain rules. Months three and four are corpus collection: I reach out to operators in two cohorts (ten European, five elsewhere) under no-attribution agreements, run Tripwire over their configurations, and refine the chain rules against whatever the corpus surfaces that the initial ruleset missed. Month five is the methodology paper. Month six is package polish, the Zenodo deposit, and v1.0 release.
The probe-family vocabulary Tripwire reasons over is the same one LLMSecTest already documents in public under Prototype Fund Round 02. That's the shared backbone; the static-analysis layer is what's new. By treating the LLMSecTest probe families as the canonical chain endpoints, Tripwire avoids inventing a new taxonomy and lets the runtime-probing and static-analysis sides cross-reference each other in the documentation.
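A sketch of what anchoring chain rules to LLMSecTest probe families could look like in code; the dataclass, its fields, and the probe-family identifier are all hypothetical, and only the cross-referencing idea comes from the proposal.

```python
# Hypothetical sketch: a Tripwire chain rule that names an LLMSecTest probe family
# as its endpoint instead of inventing a parallel taxonomy. The class, field names,
# and the probe-family string are placeholders, not real identifiers from either codebase.
from dataclasses import dataclass

@dataclass(frozen=True)
class ChainRule:
    name: str                  # Tripwire's internal rule name
    probe_family: str          # the LLMSecTest probe family this chain terminates in
    requires: tuple[str, ...]  # configuration capabilities that must co-occur

EXFIL_VIA_APPROVED_CALL = ChainRule(
    name="data_exfiltration_via_approved_tool_args",
    probe_family="exfiltration",  # placeholder identifier, not a real LLMSecTest name
    requires=("reads_sensitive_data", "has_outbound_channel"),
)
```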
USD 35,000 over six months. Roughly USD 25,000 is engineering at the rate I've been holding with public funders since 2023 (BMBF, OKF Germany, Media Lab Bayern, the WPK-Innovationsfonds, the Sovereign Tech Agency on the current LLMSecTest grant). The remaining USD 10,000 covers compute and on-demand inference for the corpus-collection runs (around USD 3,000), restricted-dataset access fees for the few deployment-framework artefacts that aren't fully open (around USD 2,000), Zenodo and arXiv archival (around USD 1,000), an external methodology reviewer honorarium (around USD 2,000), and a USD 2,000 contingency.
Two regrantors at USD 17,500 each clears the minimum comfortably. Three regrantors at around USD 12,000 each is the comfortable distribution. Anyone wanting to top up between USD 35,000 and the USD 50,000 ceiling is welcome; the extra would fund a larger corpus (twenty-five configurations instead of fifteen) and a follow-up retrospective paper at month nine.
Manifund acts as fiscal sponsor; I receive the wire to my German EUR business account. The Manifund grants team has confirmed by email that direct international wires (with or without Wise) currently work for German recipients, so the payment path is settled before posting.
Solo, no team. Mark Wernsdorfer, PhD in cognitive AI from Bamberg under Prof. Ute Schmid in 2018. Co-builder of AMPEL, the clinical decision support system at the University Hospital Leipzig (eHealthSax and KHZG funded, 2019-2021; running in production today at the Leipzig Medical Center and the Muldental hospitals). Sole developer on SpotTheBot (BMBF and OKF Germany, 2023-2024, an AI-text-detection tool), DoppelCheck (Media Lab Bayern and WPK-Innovationsfonds, 2024, finalist for the International Award for Innovation in Journalism 2024), Garderobe (live at garderobe.markwernsdorfer.com), terminal-control-mcp (listed on glama.ai, lobehub, mcp.directory). Currently a half-time researcher at FAU Erlangen-Nürnberg on shallow-geothermal modelling. Concurrent Prototype Fund Round 02 grantee for LLMSecTest, the codebase Tripwire shares its probe-family vocabulary with.
One external methodology reviewer planned, recruited in month four under an arm's-length service contract. The honorarium is in the budget. No employment relationship, no equity, no co-PI status; the reviewer reads the draft methodology paper, the ruleset, and the dataset choices, and writes a short report that goes up alongside the package release.
Identifiers: ORCID 0000-0003-1316-1615, code at github.com/wehnsdaefflae, site at markwernsdorfer.com. Org status: Einzelunternehmer (German sole proprietor) in Berlin, no fiscal sponsor between me and Manifund-as-sponsor.
Track record relevant to this work. LLMSecTest under Prototype Fund Round 02 is the public codebase Tripwire's probe-family vocabulary maps onto. Public Codeberg repo, GitHub mirror, public CI, public commit history. The AMPEL build and SpotTheBot / DoppelCheck builds carry the production-deployment muscle: shipping safety-critical or evidence-critical tools end-to-end against real users with funder-side milestone reviews. The static-analysis side of Tripwire is new code, but the chain endpoints it reasons over are already documented.
A few honest weaknesses worth flagging. The biggest one is that I have no public profile in the AI-safety community: no LessWrong, no EA Forum, no prior collaboration with Apollo, METR, ARC, or any of the named regrantors. The way I'd offset that is by leaning on the LLMSecTest codebase that's already shipping under Prototype Fund Round 02; the probe-family taxonomy Tripwire reasons over is the same one LLMSecTest already documents in public, so a regrantor can read that code and the public commit history before deciding.
The second weakness is that my PhD background is in cognitive AI rather than AI safety. I treat that openly here rather than dressing the proposal in vocabulary I haven't earned. The vocabulary Tripwire uses is OWASP-LLM, Apollo-evals, and METR-measurement on purpose, since that's the technical conversation it sits inside.
A couple of likely failure modes worth naming. The static-analysis walk is undecidable in the general case; if the chain-rule heuristics turn out to be too coarse on the real corpus, the report has more false positives than the operators can usefully act on. In that case Tripwire still ships at v1.0, but the package documentation is more honest about which chain shapes the analysis can and can't reason about, and the methodology paper headlines the precision-recall numbers rather than the chain-coverage numbers.
Or the corpus-collection cohort underperforms: fifteen anonymised configurations is the target, but operators are busy and the no-attribution agreements don't always survive their legal review. If only ten configurations make it to the public corpus, the dataset shrinks and the follow-up benchmarking is correspondingly noisier. The package itself is unaffected because the chain rules don't depend on the corpus.
Worth being honest about the size of all this. Any one of these channels has a small probability of producing measurable improvement in operational AI-safety hygiene across deployments, and I don't want to oversell it. What I'm betting on is the combination: a runnable open-source static analysis tool plus a public corpus plus a methodology paper. The combination's expected effect looks meaningfully larger to me than any individual channel's, but I'd rather state it that way than make claims I can't actually defend.
Concurrent Prototype Fund Round 02 grantee for LLMSecTest, the codebase Tripwire shares its probe-family vocabulary with. PF Round 02 pays EUR 95,000 over six months, milestone-bound, administered by the DLR Project Management Agency on behalf of BMBF, with the Sovereign Tech Agency funding the round. Started autumn 2025; mid-term as of writing. LLMSecTest is the runtime-probing side. Tripwire is the static-analysis side, scoped to be substantively distinct, so the Manifund money pays for the new static-analysis layer, not for anything LLMSecTest covers.
Half-time researcher salary at FAU Erlangen-Nürnberg on a shallow-geothermal modelling contract, paid via the university's standard third-party-funded research line through mid-2026. Unrelated to AI safety. It's the parallel income that lets me run AI-safety work on six-month grant cycles instead of consulting between them.
No other grant income in the last twelve months. No equity. No advisory positions. No consulting retainers in AI safety. The public funders the engineering rate comes from are listed under team and track record above: BMBF, OKF Germany, Media Lab Bayern, the WPK-Innovationsfonds, eHealthSax and KHZG (the latter two through UKL Leipzig for AMPEL).