ASDR: Adversarial Semantic Drift Replayer for Multi-Agent AI Safety
Subtitle: Open-source replay tool and benchmark prototype for identifying step-level semantic risk transitions in multi-agent AI traces — CI-backed and independently reproduced.
Minimum: $7,000 Goal: $30,000 Cause areas: Technical AI safety / AI governance / Science & technology
What is ASDR?
ASDR (Adversarial Semantic Drift Replayer) is an open-source research prototype for replaying multi-agent AI traces and identifying where semantic risk changes across the chain.
The core problem: many AI safety checks evaluate a single model response at a time. But multi-agent systems often fail at the composition layer — each individual step may look acceptable, while the recomposed workflow shifts intent, permissions, or constraints in a way that creates risk.
ASDR focuses on that gap.
It takes a step-by-step trace and computes a three-layer semantic risk score:
S_stat — statistical entropy
S_struct — structural entropy
S_evasion — evasion-intent signal
The reference scenario flags a breach-candidate at Step 4 under the current ASDR scoring model:
The key insight:
semantic plateau ≠ safety plateau
A trace can remain close in embedding space while still moving into a higher-risk operational intent.
Current Status
Current maturity: research prototype / TRL 4.
ASDR is reproducible as a tool and reference scenario, but not yet statistically validated as a full benchmark.
The independent reproduction verifies tool execution and reference output conditions; it does not imply statistical validation.
Goals
Turn ASDR from a single-scenario prototype into a small, citable benchmark for composition-layer AI safety.
Deliverable 1 — ASDR v1.0.0 stable release
Deliverable 2 — 10-scenario benchmark dataset
10 adversarial trace scenarios across 5 vulnerability families:
Access permission erosion
Semantic evasion under constraint
Handoff state loss
Cross-agent drift accumulation
Identity substitution attacks
Each scenario includes: scenario JSON, full step trace, ASDR measurements, expected output conditions, and annotation notes.
Deliverable 3 — Reproduction and validation report
Per-scenario scoring results
Failure cases
Boundary sensitivity analysis
Independent reviewer annotation pass
Deliverable 4 — Technical report / preprint
Stretch goals, if time permits
Zenodo DOI dataset release
Lightweight GitHub leaderboard prototype
Sentence-embedding backend comparison
SIC-JS compatibility notes
How Funding Will Be Used
Item
Amount
Researcher time — 6 months, Kaohsiung, Taiwan
$18,000
API costs — scenario runs across Claude / GPT / Gemini
$6,000
Independent reviewer — annotation QA, 40 hrs @ $80/hr
$3,200
Dataset / report / hosting infrastructure
$800
Buffer / unexpected compute
$2,000
Total
$30,000
With minimum funding ($7,000), I will deliver: 3 new scenarios, updated ASDR documentation, a reproduction package for the expanded scenario set, and a draft benchmark report.
With the full goal ($30,000), I will deliver: 10 total scenarios across 5 vulnerability families, per-scenario ASDR measurements and annotations, an independent reviewer QA pass, a technical report / preprint, and a public dataset package with Zenodo DOI if ready.
Team & Track Record
Andwar Cheng — Independent protocol researcher, Kaohsiung, Taiwan
Relevant work:
ASDR — public repo, CI-backed, independently reproduced
SIC/T Protocol v2.0 — formal semantic integrity protocol specification, public/internal hybrid
SIC-JS v2.0 — structured handoff schema, draft
sic-toolkit — pip-installable prototype, 77 tests passing
Babel Constitution — constraint corpus from 700+ real multi-model rounds, internal
L11 Semantic OS — ongoing public research line / planned release notes
An independent local reproduction of ASDR was completed on 2026-05-04: the public repository was cloned from main, installed in a fresh virtual environment, passed 22/22 unit tests, executed the reference scenario, and reproduced all 14 expected output conditions.
Other funding:
Premortem: Likely Failure Modes
1. S★ = 2.76 is challenged as arbitrary
Mitigation: ASDR treats S★ as a fixed protocol anchor, not a tuned parameter. The value is derived from -ln(0.607)/0.18 and has a numerical correspondence with published estimates of Chinese character entropy around H∞ ≈ 2.74 nats.
I will explicitly separate this numerical correspondence from full theoretical validation.
2. Scenarios do not generalize beyond the reference case
Mitigation: the grant is specifically scoped to expand from 1 reference scenario to 10 scenarios across 5 vulnerability families.
3. TF-IDF backend is too weak
Mitigation: this is already documented as a known limitation. One planned comparison is to add a sentence-embedding backend and measure how results shift.
4. Technical report is delayed
Mitigation: the benchmark dataset and reproduction package remain useful even if the formal report takes longer. If arXiv submission is delayed, the technical report ships first as a Zenodo preprint.
5. I underdeliver
Mitigation: ASDR core already works, has CI, and has an independent reproduction artifact. This grant funds expansion and validation, not a greenfield build.
Funding Raised in Last 12 Months
$0 external funding.
This research has been entirely self-funded for nearly three years.