Running Tournaments

This guide covers everything you need to know about running CodeClash tournaments: CLI options, configuration files, environment variables, and output structure.

CLI Reference

uv run python main.py <config_path> [options]

Arguments

Argument      Description
config_path   Path to the tournament YAML config file (required)

Options

Flag                Short     Description
--cleanup           -c        Clean up game environment after running
--push              -p        Push each agent's final codebase to a new GitHub repository
--output-dir PATH   -o PATH   Custom output directory (default: logs/<username>/)
--suffix TEXT       -s TEXT   Suffix to append to the output folder name
--keep-containers   -k        Keep Docker containers after games/agents finish (useful for debugging)

Examples

# Basic run
uv run python main.py configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml

# Keep containers for debugging
uv run python main.py configs/test/battlesnake.yaml -k

# Custom output directory with suffix
uv run python main.py configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml \
    -o ./my_experiments \
    -s experiment1

# Push final codebases to GitHub
uv run python main.py configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml -p

Configuration Anatomy

Tournament configs are YAML files with four main sections:

# 1. Tournament settings
tournament:
  rounds: 5                    # Number of edit+compete rounds
  transparent: false           # If true, agents can see opponent's code

# 2. Game/Arena settings
game:
  name: BattleSnake            # Arena name (must match registered arena)
  sims_per_round: 1000         # Number of game simulations per round
  args:                        # Arena-specific arguments
    width: 11
    height: 11
    browser: false

# 3. Player/Agent definitions
players:
- agent: mini                  # Agent type: "mini" or "dummy"
  name: claude-sonnet-4-5      # Display name (used in logs)
  config:
    agent: !include mini/default.yaml
    model:
      model_name: '@anthropic/claude-sonnet-4-5-20250929'
      model_kwargs:
        temperature: 0.2
        max_tokens: 4096

- agent: mini
  name: o3
  config:
    agent: !include mini/default.yaml
    model:
      model_name: '@openai/o3'

# 4. Prompts for agents
prompts:
  game_description: |
    You are a software developer competing in BattleSnake...

The !include Directive

CodeClash supports !include for reusing config fragments:

# In your tournament config
config:
  agent: !include mini/default.yaml    # Includes configs/mini/default.yaml
  model:
    model_name: '@anthropic/claude-sonnet-4-5-20250929'

This is especially useful for:

  • Sharing agent configurations across tournaments
  • Keeping model-specific settings in one place
  • Reducing config duplication

Include paths are relative to the configs/ directory.
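For example, a model block that several tournaments share could be factored into its own fragment. The layout below is a hypothetical sketch: the fragment path and the use of !include for the model key are illustrative assumptions, not shipped defaults.

# configs/models/sonnet.yaml (hypothetical shared fragment)
model_name: '@anthropic/claude-sonnet-4-5-20250929'
model_kwargs:
  temperature: 0.2
  max_tokens: 4096

# In any tournament config
config:
  agent: !include mini/default.yaml
  model: !include models/sonnet.yaml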

Tournament Section

Field         Type   Description
rounds        int    Number of tournament rounds (edit + compete cycles)
transparent   bool   If true, agents can see opponent's code changes

Game Section

Field            Type     Description
name             string   Arena name (BattleSnake, CoreWar, Halite, etc.)
sims_per_round   int      Number of game simulations per round
args             dict     Arena-specific arguments

Arena-Specific Args

BattleSnake:

args:
  width: 11          # Board width
  height: 11         # Board height
  browser: false     # Open browser for visualization

CoreWar:

args:
  core_size: 8000    # Memory size
  max_cycles: 80000  # Maximum execution cycles

Players Section

Each player entry defines an AI agent:

Field          Type     Description
agent          string   Agent type: mini (MiniSWEAgent) or dummy
name           string   Display name for logs and results
config         dict     Agent-specific configuration
config.model   dict     LLM model settings
config.agent   dict     Agent behavior settings

Model Configuration

model:
  model_name: '@anthropic/claude-sonnet-4-5-20250929'
  model_kwargs:
    temperature: 0.2
    max_tokens: 4096

Model names use the @provider/model format from LiteLLM.

Prompts Section

The prompts section defines what agents see:

Field              Description
game_description   Main prompt describing the game and task
system             (Optional) System prompt for the LLM

Prompts support template variables:

Variable          Description
{{player_id}}     Agent's identifier
{{round}}         Current round number
{{rounds}}        Total number of rounds
{{working_dir}}   Path to agent's codebase
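Putting these together, a prompts section might look like the following sketch (the wording is illustrative, not taken from a shipped config):

prompts:
  game_description: |
    You are player {{player_id}} in a BattleSnake tournament.
    This is round {{round}} of {{rounds}}.
    Your codebase is at {{working_dir}}. Improve it before your
    snake is run against your opponent.
  system: |
    You are an expert competitive programmer.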

Environment Variables

Required

Variable       Description
GITHUB_TOKEN   GitHub token for cloning game starter repos

LLM Providers

Set API keys for the providers you're using:

Variable            Provider
OPENAI_API_KEY      OpenAI (GPT-4, o3, etc.)
ANTHROPIC_API_KEY   Anthropic (Claude)
GOOGLE_API_KEY      Google (Gemini)
GROQ_API_KEY        Groq

Optional

Variable                Description
PORTKEY_API_KEY         Portkey for LLM request management
AWS_ACCESS_KEY_ID       AWS credentials (for AWS Batch)
AWS_SECRET_ACCESS_KEY   AWS credentials
AWS_DEFAULT_REGION      AWS region (default: us-east-1)
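A typical setup exports the keys you need before a run (the values below are placeholders):

# Required
export GITHUB_TOKEN="ghp_..."

# Only for the providers your config actually uses
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."

uv run python main.py configs/test/battlesnake.yaml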

Output Structure

Tournament logs are saved to logs/<username>/<tournament_folder>/:

logs/
└── <username>/
    └── PvpTournament.BattleSnake.r5.s1000.p2.claude-sonnet-4-5.o3.241210143022/
        ├── config.yaml              # Copy of tournament config
        ├── tournament_metadata.json # Tournament summary
        ├── round_1/
        │   ├── game_results.json    # Game outcomes for this round
        │   ├── claude-sonnet-4-5/
        │   │   ├── changes.json     # Code changes made by agent
        │   │   ├── trajectory.json  # Agent's action history
        │   │   └── codebase/        # Snapshot of agent's code
        │   └── o3/
        │       ├── changes.json
        │       ├── trajectory.json
        │       └── codebase/
        ├── round_2/
        │   └── ...
        └── round_5/
            └── ...

Folder Naming Convention

PvpTournament.<Game>.r<rounds>.s<sims>.p<players>.<player_names>.<timestamp>

Example: PvpTournament.BattleSnake.r5.s1000.p2.claude-sonnet-4-5.o3.241210143022

Key Output Files

File                              Contents
config.yaml                       Complete tournament configuration
tournament_metadata.json         Overall results, win counts, final scores
round_N/game_results.json         Per-round game outcomes
round_N/<agent>/changes.json      Code diffs made by agent
round_N/<agent>/trajectory.json   LLM conversation/action log
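These are plain JSON files, so you can inspect them without the viewer; the commands below only pretty-print and assume nothing about the JSON schema beyond the file names above:

# Pretty-print the tournament summary
python -m json.tool logs/<username>/<tournament_folder>/tournament_metadata.json

# Inspect a round's outcomes (jq, if installed)
jq . logs/<username>/<tournament_folder>/round_1/game_results.json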

Quick Recipes

Reproduce a paper result

# BattleSnake: Claude Sonnet 4.5 vs o3 (15 rounds)
uv run python main.py configs/main/BattleSnake__claude-sonnet-4-5-20250929__o3__r15__s1000.yaml

Run all arenas for a matchup

for arena in BattleSnake CoreWar Halite RoboCode RobotRumble; do
    uv run python main.py "configs/main/${arena}__claude-sonnet-4-5-20250929__o3__r15__s1000.yaml"
done

Debug a failing agent

# Keep containers for inspection
uv run python main.py configs/test/battlesnake.yaml -k

# Then inspect the container
docker ps -a  # Find container ID
docker logs <container_id>
docker exec -it <container_id> /bin/bash
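
Containers kept with -k are not removed automatically, so clean them up once you're done inspecting:

# Remove a specific container
docker rm -f <container_id>

# Or remove all stopped containers at once
docker container prune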

Quick A/B test with custom suffix

# Run variant A
uv run python main.py configs/my_config.yaml -s variantA

# Run variant B (modified config)
uv run python main.py configs/my_config_b.yaml -s variantB

Batch run with different models

#!/bin/bash
models=("claude-sonnet-4-5-20250929" "gpt-5" "gemini-2.5-pro")
for model in "${models[@]}"; do
    config="configs/main/BattleSnake__${model}__o3__r15__s1000.yaml"
    if [ -f "$config" ]; then
        uv run python main.py "$config"
    fi
done

Viewing Results

After tournaments complete:

# Start the viewer
uv run python scripts/run_viewer.py

# Or specify a log directory
uv run python scripts/run_viewer.py -d logs/<username>/PvpTournament.BattleSnake...

See 2000+ tournament results from the paper.

Next Steps