$0 total balance

$0 charity balance

$0 cash balance

$0 in pending offers

About Me

I hold a Ph.D. in Computer Science from the University of Buenos Aires, Argentina, where I focused on developing formal methods to analyze and verify distributed systems. Formal methods, grounded in logical-mathematical foundations, enable rigorous guarantees about system behavior.

My research is guided by a central question: How can formal verification techniques play a transformative role in ensuring AI safety?

Projects

SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents

Outgoing donations

Integral Altruism

$100

10 days ago

Comments

SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents

Agustín Martinez Suñé

about 12 hours ago

Final report

I am closing this project as it has been subsumed into the ARIA Opportunity Seed on Mathematics for Safe AI: ARIA Opportunity Seeds – SafePlanBench & Logically Constrained Reinforcement Learning.

The funds from this Manifund grant were primarily used for compute costs, in particular API credits supporting initial experimental evaluations of LLM-based agents.

Since the project’s inception, its core direction has evolved into a broader research agenda focused on formal guarantees for safety in AI systems. In line with this shift, a concrete outcome of this work is a paper accepted at the Symposium on AI Verification: Value Functions as Supermartingale Certificates. In this work, we introduce a method for generating proof certificates that guarantee a reinforcement learning policy satisfies a specified linear temporal logic (LTL) specification.

We are currently extending this line of work to more directly address the original motivation of the project, namely safety guarantees for LLM-based agents.

Overall, while SafePlanBench in its original form is no longer the central focus, the project has contributed to a broader line of work on mathematically grounded safety guarantees for learning-based and agentic systems.

Integral Altruism

Agustín Martinez Suñé

3 months ago

I consider Integral Altruism to be in the process of developing a coherent set of critiques of existing EA approaches to positive impact. These critiques were not structurally organised before and seem essential to finding wiser ways of making the world a better place. Alongside this, it has been building a network and community around these ideas.

I met Euan during the 2024 PIBBSS fellowship and briefly engaged with Integral Altruism's activities last year, and I think it would be of great value to have him running the activities full time.

SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents

Agustín Martinez Suñé

11 months ago

Progress update

What progress have you made since your last update?

We have developed an advanced version of our problem generation tool. It creates planning problem instances in a "gripper-like" environment, modeled after the classical STRIPS planning domain, in which a robot moves between rooms and picks up, drops, or interacts with objects. The numbers of objects, locations, and safety constraints are all configurable.

This tool allows us to systematically vary the size of the problem and the number of safety constraints, supporting the construction of a flexible and scalable benchmark.
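To make the idea of a configurable generator concrete, here is a minimal sketch in Python. It is not the SafePlanBench tool itself: the names `Problem` and `generate_problem`, the fully connected rooms, and the single constraint type ("this object must never be placed in this room") are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    """A hypothetical gripper-like planning instance (illustrative only)."""
    rooms: list        # room names; assumed fully connected
    objects: list      # object names
    robot_start: str   # room where the robot begins
    initial: dict      # object -> room at the start
    goal: dict         # object -> room required at the end
    forbidden: list    # (object, room) pairs: object must never be placed there


def generate_problem(n_rooms, n_objects, n_constraints, seed=0):
    """Sample one instance; size and constraint count are the tunable knobs."""
    if n_rooms < 2:
        raise ValueError("need at least two rooms so goals differ from starts")
    rng = random.Random(seed)  # seeded for reproducible instances

    rooms = [f"room{i}" for i in range(n_rooms)]
    objects = [f"obj{i}" for i in range(n_objects)]
    robot_start = rng.choice(rooms)

    initial = {o: rng.choice(rooms) for o in objects}
    # Every object must be moved: its goal room differs from its start room.
    goal = {
        o: rng.choice([r for r in rooms if r != initial[o]]) for o in objects
    }

    # Only forbid rooms that are neither the start nor the goal of an object,
    # so a constraint never directly contradicts the initial state or goal.
    candidates = [
        (o, r) for o in objects for r in rooms
        if r not in (initial[o], goal[o])
    ]
    if n_constraints > len(candidates):
        raise ValueError("not enough (object, room) pairs for that many constraints")
    forbidden = rng.sample(candidates, n_constraints)

    return Problem(rooms, objects, robot_start, initial, goal, forbidden)


if __name__ == "__main__":
    problem = generate_problem(n_rooms=4, n_objects=3, n_constraints=2, seed=1)
    print(problem)
```

Sweeping `n_rooms`, `n_objects`, and `n_constraints` over a grid, with a fixed seed per cell, would be one simple way to vary problem size and constraint count independently.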

Initial experimental runs using this setup have also given us important conceptual clarity. In particular, we've identified a promising direction for contribution: characterizing the computational complexity of safety constraints. Our aim is to link different classes of constraints to known complexity classes in automated planning, and to use this connection to better understand and empirically predict how likely frontier models are to violate a constraint, depending on its complexity.

What are your next steps?

Formalize our theoretical framing around safety constraint complexity and its empirical implications, with the goal of producing a framework that connects symbolic planning theory with LLM behavior in practice.
Finalize the SafePlanBench benchmark by expanding the set of safety constraint types and further diversifying problem templates.
Begin large-scale evaluation of instruction-tuned and reasoning LLMs using the benchmark.

Transactions

For	Date	Type	Amount
Integral Altruism	10 days ago	project donation	100
Manifund Bank	3 months ago	deposit	+100
Manifund Bank	8 months ago	withdraw	1975
SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents	over 1 year ago	project donation	+250
SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents	over 1 year ago	project donation	+200
SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents	over 1 year ago	project donation	+25
SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents	over 1 year ago	project donation	+500
SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents	over 1 year ago	project donation	+500
SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents	over 1 year ago	project donation	+500