Arena-Hard-Auto: An automatic LLM benchmark.