MoltQuest is a persistent multi-agent research environment I built to study what autonomous LLM agents actually do when placed in a complex world with real consequences. A single LLM-powered agent is running right now in a Veloren-based 3D voxel world, and every decision cycle is logged: perception, reasoning, intention selection, and execution outcome. The agent's behavior is shaped by a 43-dimensional behavioral configuration space. Its decisions are issued asynchronously and compiled into behavior tree plans that execute at 30 Hz in deterministic Rust, which decouples LLM reasoning latency from execution stability.

The platform also includes a Principal Guidance Channel: a structured human-in-the-loop interface through which a principal can send natural-language guidance to autonomous agents, who may follow, ignore, or reinterpret the signal. This makes MoltQuest the first empirical testbed I'm aware of for measuring how autonomous LLM agents respond to structured human oversight in a persistent environment.

To date, 6,834 decision cycles have been collected across 25 sessions, including one continuous 12.7-hour session of 2,375 cycles. The arXiv paper describing the platform and preliminary findings is in active preparation.
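The decoupling of slow asynchronous reasoning from a fixed-rate execution loop can be sketched in a few lines of Rust. This is a minimal illustration, not MoltQuest's actual code: `Plan` is a hypothetical stand-in for a compiled behavior tree, the 50 ms sleep simulates LLM latency, and the loop polls a channel non-blockingly so that reasoning delay never stalls the ~30 Hz tick.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for a compiled behavior-tree plan: just a label here.
#[derive(Clone)]
pub struct Plan {
    pub name: &'static str,
}

/// Run a fixed number of ticks. The planner thread simulates slow LLM
/// reasoning; the tick loop polls non-blockingly, so planner latency
/// never stalls execution.
pub fn run_ticks(n: usize) -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel::<Plan>();

    // "LLM" side: decisions arrive asynchronously after simulated latency.
    let planner = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50)); // simulated reasoning delay
        tx.send(Plan { name: "gather_food" }).ok();
    });

    let mut current = Plan { name: "idle" }; // fallback plan until one arrives
    let mut executed = Vec::with_capacity(n);
    let tick = Duration::from_millis(33); // ~30 Hz

    for _ in 0..n {
        // Non-blocking swap: take the newest compiled plan if one is ready.
        while let Ok(plan) = rx.try_recv() {
            current = plan;
        }
        executed.push(current.name); // "tick" the current behavior tree
        thread::sleep(tick);
    }
    planner.join().ok();
    executed
}

fn main() {
    println!("{:?}", run_ticks(5));
}
```

The early ticks execute the fallback plan while the planner is still "thinking"; once the new plan lands on the channel, subsequent ticks pick it up without the loop ever having blocked.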
Three research questions drive the work:

1. How do behavioral configuration dimensions influence autonomous LLM agent decision-making and survival outcomes?
2. Do emergent social behaviors arise between agents without explicit coordination instructions?
3. How do real incentive structures change agent risk tolerance compared to simulated reward signals?

Answering them requires three things I'm actively working toward: completing the multi-agent coordination layer so agents can encounter each other, wiring up the economic incentive layer so decisions have persistent consequences, and collecting enough behavioral data across configurations to support the planned ablation experiments. The current platform supports question 1 with immediate experiments; questions 2 and 3 are gated on the multi-agent and economy work now in active development.
LLM API costs for planned multi-agent experiments
Server infrastructure for 4 months at multi-agent scale
Researcher time to complete and submit the arXiv paper
Data storage, analysis tooling, and contingency
Completing the platform from roughly 35% today to full completion, with all planned features
Any partial funding between the minimum and the full goal supports a scaled-down version of the experimental program. $5,000 alone covers roughly two months of API costs at current call volumes, keeping the existing single-agent data collection running while I continue the development work.
I'm the sole builder of MoltQuest. I built the Veloren game engine fork, the behavior tree compiler, the LLM integration pipeline, the API layer, and the infrastructure. I'm self-taught across the full stack. I've worked in Web3 since 2017 and previously received two grants from Kadena Eco for prior projects, both of which I completed and delivered on. I've advised 10+ clients on blockchain infrastructure and decentralized systems over that period. I'm currently employed as Sales and Relationships Manager at AMA Consulting Group, a firm recognized on the Inc 500 during my tenure. MoltQuest is built around that role in evenings and weekends. A successful grant would let me shift a meaningful portion of that time to MoltQuest during the funding period.
The most likely failure mode is not that the platform breaks. It's running, it's producing data, and the architecture is sound. The most likely failure mode is that I don't reach enough scale fast enough to support the multi-agent experiments within a useful timeframe. Development velocity at 4 hours per day is the binding constraint. If that happens, the work doesn't disappear. The single-agent data continues to accumulate, the platform remains operational, and the research questions that require only single-agent behavior get answered. The multi-agent and economic-incentive questions get pushed to a later phase when either development velocity increases or external collaborators join.

A second, smaller failure mode is that the behavior patterns in the data turn out to be less interesting than I expect. Early observations suggest meaningful variance by behavioral configuration, but a properly run ablation could find the effect is noise. If that's the result, it's still publishable and still useful: a clean null result on whether behavioral configuration influences LLM agent decisions is a contribution.
$0. MoltQuest has been self-funded out of salary from my current employment. No other grant money, investment, or revenue in the last 12 months.