# Running Tournaments
This guide covers everything you need to know about running CodeClash tournaments: CLI options, configuration files, environment variables, and output structure.
## CLI Reference

```bash
uv run python main.py <config_path> [options]
```
### Arguments

| Argument | Description |
|---|---|
| `config_path` | Path to the tournament YAML config file (required) |
### Options

| Flag | Short | Description |
|---|---|---|
| `--cleanup` | `-c` | Clean up game environment after running |
| `--push` | `-p` | Push each agent's final codebase to a new GitHub repository |
| `--output-dir PATH` | `-o PATH` | Custom output directory (default: `logs/<username>/`) |
| `--suffix TEXT` | `-s TEXT` | Suffix to append to the output folder name |
| `--keep-containers` | `-k` | Keep Docker containers after games/agents finish (useful for debugging) |
### Examples

```bash
# Basic run
uv run python main.py configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml

# Keep containers for debugging
uv run python main.py configs/test/battlesnake.yaml -k

# Custom output directory with suffix
uv run python main.py configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml \
    -o ./my_experiments \
    -s experiment1

# Push final codebases to GitHub
uv run python main.py configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml -p
```
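The flags compose, so a single run can clean up environments, push the final codebases, and write results to a custom location:

```bash
# Combine cleanup, push, and a custom output location in one invocation
uv run python main.py configs/test/battlesnake.yaml -c -p -o ./my_experiments -s full-run
```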
## Configuration Anatomy
Tournament configs are YAML files with four main sections:
```yaml
# 1. Tournament settings
tournament:
  rounds: 5              # Number of edit+compete rounds
  transparent: false     # If true, agents can see opponent's code

# 2. Game/Arena settings
game:
  name: BattleSnake      # Arena name (must match registered arena)
  sims_per_round: 1000   # Number of game simulations per round
  args:                  # Arena-specific arguments
    width: 11
    height: 11
    browser: false

# 3. Player/Agent definitions
players:
  - agent: mini                  # Agent type: "mini" or "dummy"
    name: claude-sonnet-4-5      # Display name (used in logs)
    config:
      agent: !include mini/default.yaml
      model:
        model_name: '@anthropic/claude-sonnet-4-5-20250929'
        model_kwargs:
          temperature: 0.2
          max_tokens: 4096
  - agent: mini
    name: o3
    config:
      agent: !include mini/default.yaml
      model:
        model_name: '@openai/o3'

# 4. Prompts for agents
prompts:
  game_description: |
    You are a software developer competing in BattleSnake...
```
### The `!include` Directive

CodeClash supports `!include` for reusing config fragments:

```yaml
# In your tournament config
config:
  agent: !include mini/default.yaml   # Includes configs/mini/default.yaml
  model:
    model_name: '@anthropic/claude-sonnet-4-5-20250929'
```
This is especially useful for:
- Sharing agent configurations across tournaments
- Keeping model-specific settings in one place
- Reducing config duplication
Include paths are relative to the `configs/` directory.
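If you want to prototype the same mechanism outside CodeClash, a minimal sketch of an `!include` constructor built on PyYAML might look like the following. The `CONFIGS_DIR` constant, `ConfigLoader` class, and `configs/my_tournament.yaml` path are illustrative assumptions, not CodeClash's actual implementation:

```python
from pathlib import Path

import yaml

# Assumed root for include resolution; mirrors "relative to configs/" above.
CONFIGS_DIR = Path("configs")


class ConfigLoader(yaml.SafeLoader):
    """SafeLoader subclass so the !include constructor is scoped to this loader."""


def _include(loader: ConfigLoader, node: yaml.Node) -> object:
    # Resolve the included file relative to the configs/ directory and parse it
    # with the same loader, so nested !include directives also work.
    relative_path = loader.construct_scalar(node)
    with open(CONFIGS_DIR / relative_path) as f:
        return yaml.load(f, Loader=ConfigLoader)


ConfigLoader.add_constructor("!include", _include)

if __name__ == "__main__":
    # Hypothetical config path for demonstration.
    with open("configs/my_tournament.yaml") as f:
        config = yaml.load(f, Loader=ConfigLoader)
    print(config["players"][0]["config"]["agent"])
```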
### Tournament Section

| Field | Type | Description |
|---|---|---|
| `rounds` | int | Number of tournament rounds (edit + compete cycles) |
| `transparent` | bool | If true, agents can see opponent's code changes |
### Game Section

| Field | Type | Description |
|---|---|---|
| `name` | string | Arena name (BattleSnake, CoreWar, Halite, etc.) |
| `sims_per_round` | int | Number of game simulations per round |
| `args` | dict | Arena-specific arguments |
#### Arena-Specific Args

BattleSnake:

```yaml
args:
  width: 11        # Board width
  height: 11       # Board height
  browser: false   # Open browser for visualization
```

CoreWar:

```yaml
args:
  core_size: 8000     # Memory size
  max_cycles: 80000   # Maximum execution cycles
```
### Players Section

Each player entry defines an AI agent:

| Field | Type | Description |
|---|---|---|
| `agent` | string | Agent type: `mini` (MiniSWEAgent) or `dummy` |
| `name` | string | Display name for logs and results |
| `config` | dict | Agent-specific configuration |
| `config.model` | dict | LLM model settings |
| `config.agent` | dict | Agent behavior settings |
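For instance, a two-player setup might pit a `mini` agent against a `dummy` baseline. The `mini` entry mirrors the anatomy example above; the exact fields a `dummy` agent accepts are not documented here, so its entry below is illustrative:

```yaml
players:
  - agent: mini
    name: claude-sonnet-4-5
    config:
      agent: !include mini/default.yaml
      model:
        model_name: '@anthropic/claude-sonnet-4-5-20250929'
  - agent: dummy             # Hypothetical baseline entry; a dummy agent may need little or no config
    name: starter-baseline
```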
#### Model Configuration

```yaml
model:
  model_name: '@anthropic/claude-sonnet-4-5-20250929'
  model_kwargs:
    temperature: 0.2
    max_tokens: 4096
```

Model names use the `@provider/model` format from LiteLLM.
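If you want to sanity-check a model name and API key outside a tournament, one approach is to call LiteLLM directly. This is a sketch that assumes the `@provider/model` string maps onto LiteLLM's `provider/model` naming once the leading `@` is stripped:

```python
import litellm

# Model name as it appears in a CodeClash config (mapping to LiteLLM naming is assumed).
model_name = "@anthropic/claude-sonnet-4-5-20250929"
model_kwargs = {"temperature": 0.2, "max_tokens": 4096}

# Strip the leading '@' to get LiteLLM's provider/model form and send a trivial request.
response = litellm.completion(
    model=model_name.lstrip("@"),
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
    **model_kwargs,
)
print(response.choices[0].message.content)
```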
### Prompts Section

The `prompts` section defines what agents see:

| Field | Description |
|---|---|
| `game_description` | Main prompt describing the game and task |
| `system` | (Optional) System prompt for the LLM |

Prompts support template variables:

| Variable | Description |
|---|---|
| `{{player_id}}` | Agent's identifier |
| `{{round}}` | Current round number |
| `{{rounds}}` | Total number of rounds |
| `{{working_dir}}` | Path to agent's codebase |
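Putting these together, a `prompts` block might look like the following; the wording is illustrative, and only the field names and template variables above come from the reference:

```yaml
prompts:
  system: |
    You are an expert software developer. Work autonomously and keep your changes focused.
  game_description: |
    You are {{player_id}}, competing in BattleSnake.
    This is round {{round}} of {{rounds}}.
    Your codebase lives at {{working_dir}}; improve it so your snake wins more simulations.
```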
## Environment Variables

### Required

| Variable | Description |
|---|---|
| `GITHUB_TOKEN` | GitHub token for cloning game starter repos |
### LLM Providers

Set API keys for the providers you're using:

| Variable | Provider |
|---|---|
| `OPENAI_API_KEY` | OpenAI (GPT-4, o3, etc.) |
| `ANTHROPIC_API_KEY` | Anthropic (Claude) |
| `GOOGLE_API_KEY` | Google (Gemini) |
| `GROQ_API_KEY` | Groq |
### Optional

| Variable | Description |
|---|---|
| `PORTKEY_API_KEY` | Portkey for LLM request management |
| `AWS_ACCESS_KEY_ID` | AWS credentials (for AWS Batch) |
| `AWS_SECRET_ACCESS_KEY` | AWS credentials |
| `AWS_DEFAULT_REGION` | AWS region (default: us-east-1) |
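A typical setup exports the required token plus keys for whichever providers your config actually references; for example (values are placeholders):

```bash
# Required for cloning game starter repos
export GITHUB_TOKEN="ghp_..."

# Only the providers used by your config need keys
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
```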
## Output Structure

Tournament logs are saved to `logs/<username>/<tournament_folder>/`:

```
logs/
└── <username>/
    └── PvpTournament.BattleSnake.r5.s1000.p2.claude-sonnet-4-5.o3.241210143022/
        ├── config.yaml                  # Copy of tournament config
        ├── tournament_metadata.json     # Tournament summary
        ├── round_1/
        │   ├── game_results.json        # Game outcomes for this round
        │   ├── claude-sonnet-4-5/
        │   │   ├── changes.json         # Code changes made by agent
        │   │   ├── trajectory.json      # Agent's action history
        │   │   └── codebase/            # Snapshot of agent's code
        │   └── o3/
        │       ├── changes.json
        │       ├── trajectory.json
        │       └── codebase/
        ├── round_2/
        │   └── ...
        └── round_5/
            └── ...
```
### Folder Naming Convention

```
PvpTournament.<Game>.r<rounds>.s<sims>.p<players>.<player_names>.<timestamp>
```

Example: `PvpTournament.BattleSnake.r5.s1000.p2.claude-sonnet-4-5.o3.241210143022`
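If you script over many log folders, the name can be split on dots. A small sketch, assuming (as in the example above) that player names themselves never contain dots:

```python
def parse_tournament_folder(name: str) -> dict:
    # PvpTournament.<Game>.r<rounds>.s<sims>.p<players>.<player_names>.<timestamp>
    parts = name.split(".")
    tournament, game, rounds, sims, players = parts[:5]
    timestamp = parts[-1]
    player_names = parts[5:-1]  # One dot-separated segment per player
    return {
        "tournament": tournament,
        "game": game,
        "rounds": int(rounds.lstrip("r")),
        "sims_per_round": int(sims.lstrip("s")),
        "num_players": int(players.lstrip("p")),
        "players": player_names,
        "timestamp": timestamp,
    }


print(parse_tournament_folder(
    "PvpTournament.BattleSnake.r5.s1000.p2.claude-sonnet-4-5.o3.241210143022"
))
```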
### Key Output Files

| File | Contents |
|---|---|
| `config.yaml` | Complete tournament configuration |
| `tournament_metadata.json` | Overall results, win counts, final scores |
| `round_N/game_results.json` | Per-round game outcomes |
| `round_N/<agent>/changes.json` | Code diffs made by agent |
| `round_N/<agent>/trajectory.json` | LLM conversation/action log |
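To poke at these JSON files without committing to a schema, a short sketch that loads the tournament metadata and each round's results and prints their top-level keys (the field names inside these files are not documented here, so the code deliberately avoids assuming any; the username in the path is a placeholder):

```python
import json
from pathlib import Path

# Path to one tournament folder under logs/<username>/ (adjust to your own run).
tournament_dir = Path(
    "logs/alice/PvpTournament.BattleSnake.r5.s1000.p2.claude-sonnet-4-5.o3.241210143022"
)

metadata = json.loads((tournament_dir / "tournament_metadata.json").read_text())
print("metadata keys:", sorted(metadata))

for round_dir in sorted(tournament_dir.glob("round_*")):
    results = json.loads((round_dir / "game_results.json").read_text())
    keys = sorted(results) if isinstance(results, dict) else type(results).__name__
    print(round_dir.name, "->", keys)
```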
## Quick Recipes

### Reproduce a paper result

```bash
# BattleSnake: Claude Sonnet 4.5 vs o3 (15 rounds)
uv run python main.py configs/main/BattleSnake__claude-sonnet-4-5-20250929__o3__r15__s1000.yaml
```
### Run all arenas for a matchup

```bash
for arena in BattleSnake CoreWar Halite RoboCode RobotRumble; do
    uv run python main.py "configs/main/${arena}__claude-sonnet-4-5-20250929__o3__r15__s1000.yaml"
done
```
### Debug a failing agent

```bash
# Keep containers for inspection
uv run python main.py configs/test/battlesnake.yaml -k

# Then inspect the container
docker ps -a                       # Find container ID
docker logs <container_id>
docker exec -it <container_id> /bin/bash
```
### Quick A/B test with custom suffix

```bash
# Run variant A
uv run python main.py configs/my_config.yaml -s variantA

# Run variant B (modified config)
uv run python main.py configs/my_config_b.yaml -s variantB
```
### Batch run with different models

```bash
#!/bin/bash
models=("claude-sonnet-4-5-20250929" "gpt-5" "gemini-2.5-pro")

for model in "${models[@]}"; do
    config="configs/main/BattleSnake__${model}__o3__r15__s1000.yaml"
    if [ -f "$config" ]; then
        uv run python main.py "$config"
    fi
done
```
## Viewing Results

After tournaments complete:

```bash
# Start the viewer
uv run python scripts/run_viewer.py

# Or specify a log directory
uv run python scripts/run_viewer.py -d logs/<username>/PvpTournament.BattleSnake...
```
See 2000+ tournament results from the paper.
## Next Steps
- Codebase Tour - Understand the architecture
- API Reference - Detailed class documentation
- Quick Start - Back to basics