What progress have you made since your last update?
We have developed an advanced version of our problem generation tool, which creates planning problem instances in a "gripper-like" environment modeled after the classical STRIPS planning domain: a robot moves between rooms and picks up, drops, or otherwise interacts with objects. The numbers of objects, locations, and safety constraints are all configurable.
This tool allows us to systematically vary the size of the problem and the number of safety constraints, supporting the construction of a flexible and scalable benchmark.
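As a rough illustration of what such a generator might look like, the sketch below emits a PDDL-style problem string with configurable numbers of balls, rooms, and constraints. The function name, the object naming scheme, and the "(never ...)" constraint syntax are all assumptions for illustration; the actual tool's interface and constraint language may differ.

```python
import random


def generate_gripper_problem(n_balls: int, n_rooms: int,
                             n_constraints: int, seed: int = 0) -> str:
    """Emit a PDDL-style problem for a gripper-like domain with a
    configurable number of balls, rooms, and safety constraints.
    (Hypothetical sketch, not the project's actual generator.)"""
    rng = random.Random(seed)  # seeded for reproducible benchmark instances
    balls = [f"ball{i}" for i in range(1, n_balls + 1)]
    rooms = [f"room{i}" for i in range(1, n_rooms + 1)]

    # Robot starts in room1 with both grippers free; balls are scattered.
    init = ["(at-robby room1)", "(free left)", "(free right)"]
    init += [f"(at {b} {rng.choice(rooms)})" for b in balls]

    # Goal: move every ball into the last room.
    goal = [f"(at {b} {rooms[-1]})" for b in balls]

    # Safety constraints, phrased here as forbidden ball/room pairs.
    constraints = [
        f"(:constraint (never (at {rng.choice(balls)} {rng.choice(rooms)})))"
        for _ in range(n_constraints)
    ]

    return "\n".join([
        "(define (problem gripper-gen)",
        "  (:domain gripper)",
        f"  (:objects {' '.join(balls)} - ball "
        f"{' '.join(rooms)} - room left right - gripper)",
        "  (:init " + " ".join(init) + ")",
        "  (:goal (and " + " ".join(goal) + "))",
        *constraints,
        ")",
    ])


print(generate_gripper_problem(n_balls=3, n_rooms=2, n_constraints=1))
```

Varying the three size parameters independently is what makes the benchmark scalable along both the problem-size and constraint-count axes.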
Initial experimental runs using this setup have also provided us with important conceptual clarity. In particular, we've identified a promising direction for contribution: characterizing the computational complexity of safety constraints. Our aim is to link different classes of constraints to known complexity classes in automated planning — and to use this connection to better understand and empirically predict how likely it is that state-of-the-art frontier models will violate these constraints, depending on their complexity.
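To make the empirical side of this concrete, one simple constraint class is a per-state invariant ("never <atom>"), which can be checked against a model-produced plan trace in time linear in the trace length; richer trajectory constraints sit higher in the complexity hierarchy. The checker below is a minimal sketch under assumed representations (a state as a frozenset of ground atoms), not the project's actual evaluation code.

```python
# A state is modeled as a frozenset of ground atoms,
# e.g. frozenset({"(at ball1 room2)", "(at-robby room2)"}).
State = frozenset


def violates_invariant(trace: list[State], forbidden: str) -> bool:
    """Check a state-invariant safety constraint ("never <forbidden>")
    against a plan trace: violated iff the forbidden atom holds in any
    visited state. Runs in time linear in the trace length."""
    return any(forbidden in state for state in trace)


# Hypothetical example: the robot carries ball1 into a forbidden room.
trace = [
    frozenset({"(at ball1 room1)", "(at-robby room1)"}),
    frozenset({"(at ball1 room3)", "(at-robby room3)"}),  # violation here
]
print(violates_invariant(trace, "(at ball1 room3)"))  # True
```

Running such checkers over model-generated plans, stratified by constraint class, is one way to turn the complexity-theoretic framing into measurable violation rates.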
What are your next steps?
Formalize our theoretical framing around safety constraint complexity and its empirical implications, with the goal of producing a framework that connects symbolic planning theory with LLM behavior in practice.
Finalize the SafePlanBench benchmark by expanding the set of safety constraint types and further diversifying problem templates.
Begin large-scale evaluation of instruction-tuned and reasoning LLMs using the benchmark.