Autonomously train research-agent LLMs on custom data using reinforcement learning and self-verification.