Debate training on LLMs as a reward-hacking mitigation | Manifund