MoltQuest: empirical testbed for LLM agent behavior and human oversight | Manifund