To start: thanks, Joel! I sincerely appreciate the candid and clearly worded feedback. I spent a good chunk of time reflecting on this today.
With that in mind, I've drafted two budget tiers with the information I have today. The budget would be split across two workstreams.
Our initial bridging fund covered 2 months, with lower API credit and contractor asks, while we scoped future work. These tiers would keep us operational for 6 months and substantially increase the API credit and contractor budgets.
Tier 1: $300K
Workstream 1 — Cyber & Control:
- Salaries: $100K
- Model API credits: $50K
- Red team contractors / human experts: $50K
Workstream 2 — Pragmatic Interpretability:
- Salary + compute: $60K
Backpay + Overheads + Buffer: $40K
Tier 2: $560K
Workstream 1 — Cyber & Control:
- Salaries: $160K (supports an additional senior cyber researcher hire)
- Model API credits: $150K
- Red team contractors / human experts: $150K
Workstream 2 — Pragmatic Interpretability:
- Salary + compute: $60K
Backpay + Overheads + Buffer: $40K
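For transparency, here's a quick arithmetic check that each tier's line items sum to its headline figure (amounts in $K, exactly as itemised above):

```python
# Sanity check: each tier's line items should sum to its headline figure.
tiers_k = {
    # WS1 salaries, API credits, red team contractors, WS2 salary+compute, backpay/overheads/buffer
    "Tier 1 ($300K)": [100, 50, 50, 60, 40],
    "Tier 2 ($560K)": [160, 150, 150, 60, 40],
}
for name, items in tiers_k.items():
    print(f"{name}: line items sum to ${sum(items)}K")
```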
What we're thinking
Workstream 2 is nascent and exploratory. I am particularly excited about upcoming work exploring self-steering via activation probes, with potential applications in model welfare, model preferences, and (speculatively) alignment stability. This workstream is a first step toward a core mission of Lyptus: creating and growing institutional capacity, support, and direction for untapped talent in Australia.
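For readers less familiar with the technique, the basic building block is: fit a linear probe on labelled activations, then use the probe's direction to nudge activations at inference. A minimal sketch of that loop, assuming NumPy/scikit-learn, placeholder data, and a 4096-dim residual stream (none of this is our actual implementation):

```python
# Minimal sketch of probe-then-steer on model activations.
# Shapes and data are placeholders, purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hidden activations at one layer, labelled by some property of interest.
acts = np.random.randn(1000, 4096)            # [n_samples, d_model] (placeholder data)
labels = np.random.randint(0, 2, size=1000)   # 1 = property present

probe = LogisticRegression(max_iter=1000).fit(acts, labels)
direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

def steer(activation: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Nudge an activation along the probe direction at inference time."""
    return activation + alpha * direction
```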
Workstream 1 is our larger investment. Through the course of our own work (and even more so with the recent Mythos model card), it's plainly clear to us that cyber evaluations are materially failing to keep up with capability growth. Honestly, the accelerating progress did take me by surprise, and our dataset would certainly be saturated at higher token budgets. That said, while these steepening trendlines show no signs of stopping, we believe current absolute capabilities can be overstated: full red teaming engagements against even moderate blue teams are a very different proposition from isolated benchmark tasks.
To ground this with an illustrative example of where our thinking is: we're currently scoping the tractability of scaling up the ideas presented in the Artemis paper, evaluating AI agents via real red teaming engagements and comparing their performance to that of professional human red teamers. This captures the full end-to-end scope of an attacker operating against real production organisations.
This is clearly operationally difficult, though early conversations with senior offensive security leadership suggest it is tractable. We believe that, with the right people involved, we can design incentive structures that work for all parties. A typical engagement with two professionals runs around $30K USD per week; we would ramp up, starting with smaller targets.
This is speculative. We are having early conversations, and we are genuinely in a phase of figuring things out. It is our current leading candidate project, but not our only option. What it represents is our broader interest in human-grounded studies and science communication: results that are quickly interpretable beyond the AI security ecosystem. We think this suits our strengths better than building cyber benchmark tasks, which requires deeper offensive security expertise than we have in-house.
What the tiers mean
At tier 1 we would more likely prioritise budget-friendly work, like human-grounded studies of covert capability on cyber tasks or human-grounded studies on cyber stealth benchmarks. If we had strong multi-organisation traction on the red teaming idea we would still pursue it, but budget constraints have a way of subtly pressuring priorities away from more ambitious work.
At tier 2 the more ambitious work becomes realistic: a senior cyber researcher hire, 2-3 red teaming engagements at small-to-medium scale with client co-financing, and enough API credits to test models properly at adequate token budgets.
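As a rough check that the tier-2 red team contractor line supports that, here's the illustrative arithmetic. Engagement length, the co-financing split, and treating all figures as USD are assumptions for illustration, not negotiated terms:

```python
# Illustrative check of the tier-2 red team contractor line ($150K).
weekly_rate_k = 30       # ~$30K USD/week for two professionals (figure from above)
weeks = 4                # assumption: small-to-medium engagement length
cofinance_share = 0.5    # assumption: client co-finances half
budget_k = 150           # tier-2 red team contractor line (treated as USD here)

our_cost_k = weekly_rate_k * weeks * (1 - cofinance_share)  # our share per engagement
print(f"Our share per engagement: ${our_cost_k:.0f}K")
print(f"Engagements covered: {budget_k / our_cost_k:.1f}")  # ~2.5, consistent with 2-3
```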
Other ideas in this space would require internal budget rescoping, but the broad structure holds.