I am closing this project as it has been subsumed into the ARIA Opportunity Seed on Mathematics for Safe AI: ARIA Opportunity Seeds – SafePlanBench & Logically Constrained Reinforcement Learning.
The funds from this Manifund grant were primarily used for compute costs, in particular API credits supporting initial experimental evaluations of LLM-based agents.
Since the project’s inception, its core direction has evolved into a broader research agenda focused on formal guarantees for safety in AI systems. In line with this shift, a concrete outcome of this work is a paper accepted at the Symposium on AI Verification: Value Functions as Supermartingale Certificates. In this work, we introduce a method for generating proof certificates that guarantee a reinforcement learning policy satisfies a specified linear temporal logic (LTL) specification.
We are currently extending this line of work to more directly address the original motivation of the project, namely safety guarantees for LLM-based agents.
Overall, while SafePlanBench in its original form is no longer the central focus, the project has contributed to a broader line of work on mathematically grounded safety guarantees for learning-based and agentic systems.