Codebase Tour
This guide helps contributors understand the CodeClash architecture. Whether you're adding a new arena, implementing an agent, or fixing a bug, this tour will orient you to the key modules and extension points.
High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│                             main.py                             │
│                        (CLI entry point)                        │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       tournaments/pvp.py                        │
│                   (Tournament orchestration)                    │
└─────────────────────────────────────────────────────────────────┘
                                 │
                 ┌───────────────┼───────────────┐
                 ▼               ▼               ▼
    ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
    │     arenas/     │ │     agents/     │ │     utils/      │
    │  (Game logic)   │ │  (LLM agents)   │ │  (Docker, Git)  │
    └─────────────────┘ └─────────────────┘ └─────────────────┘
Directory Structure
codeclash/
├── __init__.py # Package init, defines PACKAGE_DIR, CONFIG_DIR
├── constants.py # Global constants (LOG_DIR, etc.)
│
├── agents/ # AI agent implementations
│ ├── __init__.py # Agent registry and get_agent()
│ ├── player.py # Abstract Player base class
│ ├── minisweagent.py # MiniSWEAgent implementation
│ └── dummy_agent.py # Dummy agent for testing
│
├── arenas/ # Game arena implementations
│ ├── __init__.py # Arena registry and get_game()
│ ├── arena.py # Abstract CodeArena base class
│ ├── battlesnake/ # BattleSnake arena
│ ├── battlecode/ # BattleCode arena
│ ├── corewar/ # CoreWar arena
│ ├── halite/ # Halite arena
│ ├── robocode/ # RoboCode arena
│ ├── robotrumble/ # RobotRumble arena
│ ├── huskybench/ # HuskyBench arena
│ └── dummy/ # Dummy arena for testing
│
├── tournaments/ # Tournament orchestration
│ ├── __init__.py
│ ├── tournament.py # Abstract tournament base class
│ ├── pvp.py # PvP tournament implementation
│ └── single_player.py # Single-player tournament
│
├── analysis/ # Post-tournament analysis
│ ├── metrics/ # ELO, TrueSkill calculations
│ ├── viz/ # Visualization generation
│ ├── llm_as_judge/ # LLM-based code analysis
│ └── bootstrap/ # Statistical analysis
│
├── viewer/ # Web-based result viewer
│ ├── app.py # Flask application
│ └── templates/ # Jinja2 templates
│
└── utils/ # Shared utilities
    ├── environment.py # Docker container management
    ├── git_utils.py # Git operations
    ├── yaml_utils.py # Config parsing with !include
    ├── aws.py # AWS Batch integration
    └── log.py # Custom logging
Key Modules
Entry Point: main.py
The CLI entry point. Parses arguments, loads config, and starts a tournament:
# main.py (simplified)
def main(config_path, cleanup=False, push=False, ...):
    config = yaml.safe_load(config_path.read_text())
    tournament = PvpTournament(config, output_dir=..., cleanup=cleanup)
    tournament.run()
When to modify: Adding new CLI flags or changing tournament initialization.
Tournaments: codeclash/tournaments/
Orchestrates the multi-round edit+compete loop.
# tournaments/pvp.py (simplified)
class PvpTournament(AbstractTournament):
    def run(self):
        for round_num in range(self.rounds):
            # Edit phase: agents modify their code
            for agent in self.agents:
                agent.run()
            # Compete phase: run game simulations
            results = self.arena.run_round(self.agents, round_num)
            self.record_results(results)
Key classes:
- AbstractTournament: Base class with common logic
- PvpTournament: Multi-player competitive tournaments
- SinglePlayerTournament: Single-agent improvement tracking
When to modify: Adding new tournament types or changing the round loop.
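The edit+compete loop can be tried in isolation with stand-in objects. The following is a minimal sketch, not the real implementation: `StubAgent`, `StubArena`, and `MiniTournament` are hypothetical names, and the "most edits wins" rule exists only so the loop has something to record.

```python
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Hypothetical stand-in for a Player; it only counts its edit phases."""

    name: str
    edits: int = 0

    def run(self) -> None:
        # A real agent would modify its codebase here.
        self.edits += 1


class StubArena:
    """Hypothetical stand-in for a CodeArena with a toy scoring rule."""

    def run_round(self, agents, round_num: int) -> dict:
        # Toy rule: the agent with the most edits wins (ties go to the first).
        winner = max(agents, key=lambda agent: agent.edits)
        return {"round": round_num, "winner": winner.name}


@dataclass
class MiniTournament:
    agents: list
    arena: StubArena
    rounds: int
    history: list = field(default_factory=list)

    def run(self) -> None:
        for round_num in range(self.rounds):
            # Edit phase: every agent gets one turn to change its code.
            for agent in self.agents:
                agent.run()
            # Compete phase: the arena scores the current codebases.
            self.history.append(self.arena.run_round(self.agents, round_num))


tournament = MiniTournament([StubAgent("a"), StubAgent("b")], StubArena(), rounds=3)
tournament.run()
print(tournament.history)
```

A new tournament type would mostly change what happens inside `run()`, for example which agents take part in the edit phase of each round.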
Arenas: codeclash/arenas/
Game implementations that run competitions between agent codebases.
# arenas/arena.py (simplified)
class CodeArena(ABC):
    name: str  # e.g., "BattleSnake"

    def run_round(self, agents, round_num: int) -> RoundStats:
        """Execute one round of the game (concrete method)."""
        # Calls execute_round() and get_results() internally
        pass

    @abstractmethod
    def execute_round(self, agents) -> None:
        """Game-specific execution logic."""
        pass

    @abstractmethod
    def validate_code(self, agent) -> tuple[bool, str | None]:
        """Check if agent's code compiles/runs."""
        pass

    @abstractmethod
    def get_results(self, agents, round_num: int, stats: RoundStats):
        """Determine winner based on game output."""
        pass
Arena registry (arenas/__init__.py):
ARENAS = [BattleCodeArena, BattleSnakeArena, CoreWarArena, HaliteArena,
          HuskyBenchArena, RoboCodeArena, RobotRumbleArena, ...]

def get_game(config: dict, **kwargs) -> CodeArena:
    game = {x.name: x for x in ARENAS}.get(config["game"]["name"])
    return game(config, **kwargs)
When to modify: Adding new games or changing game mechanics.
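In the simplified lookup above, an unrecognized game name makes `.get()` return `None`, so the following call fails with a `TypeError` that does not mention the name. The sketch below shows the same name-based registry pattern with an explicit error. It is an illustration only: `DummyArena`, `OtherArena`, and `get_game_checked` are hypothetical stand-ins, not names confirmed to exist in the package.

```python
class DummyArena:
    """Hypothetical arena used only to populate the registry."""

    name = "Dummy"

    def __init__(self, config: dict, **kwargs):
        self.config = config


class OtherArena(DummyArena):
    """Second hypothetical arena, distinguished only by its name."""

    name = "Other"


ARENAS = [DummyArena, OtherArena]


def get_game_checked(config: dict, **kwargs):
    """Look up an arena class by config["game"]["name"] and instantiate it."""
    registry = {arena.name: arena for arena in ARENAS}
    name = config["game"]["name"]
    if name not in registry:
        # Fail early with a message that lists the valid choices.
        raise ValueError(f"Unknown game {name!r}; known games: {sorted(registry)}")
    return registry[name](config, **kwargs)


arena = get_game_checked({"game": {"name": "Other"}})
print(type(arena).__name__)
```

Because the lookup key is the class attribute `name`, the value in the config must match it exactly, including capitalization.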
Agents: codeclash/agents/
AI agents that modify code during the edit phase.
# agents/player.py (simplified)
class Player(ABC):
    @abstractmethod
    def run(self) -> None:
        """Execute agent's code improvement strategy."""
        pass

    def pre_run_hook(self):
        """Setup before round execution."""
        pass

    def post_run_hook(self):
        """Cleanup and metadata after round."""
        pass
Agent registry (agents/__init__.py):
def get_agent(config, game_context, environment, push=False) -> Player:
    agents = {"dummy": Dummy, "mini": MiniSWEAgent}
    return agents[config["agent"]](config, environment, game_context, push)
When to modify: Adding new agent types or changing agent behavior.
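The simplified `Player` class does not show where the hooks are called. The sketch below assumes a driver that calls `pre_run_hook()`, then `run()`, then `post_run_hook()`; that ordering, the `events` list, `EchoAgent`, and `run_with_hooks` are all assumptions made for illustration, and the `Player` here is a local stand-in rather than the real base class.

```python
from abc import ABC, abstractmethod


class Player(ABC):
    """Local stand-in for the Player base class; `events` is illustrative."""

    def __init__(self) -> None:
        self.events: list[str] = []

    @abstractmethod
    def run(self) -> None:
        """Execute the agent's code improvement strategy."""

    def pre_run_hook(self) -> None:
        self.events.append("pre")

    def post_run_hook(self) -> None:
        self.events.append("post")


class EchoAgent(Player):
    """Hypothetical agent that only records that it ran."""

    def run(self) -> None:
        self.events.append("run")


def run_with_hooks(player: Player) -> None:
    """Assumed call order: setup, edit, then cleanup even if run() raises."""
    player.pre_run_hook()
    try:
        player.run()
    finally:
        player.post_run_hook()


agent = EchoAgent()
run_with_hooks(agent)
print(agent.events)
```

Running the cleanup hook in a `finally` block is a design choice of this sketch, so metadata is still recorded when an agent fails partway through its edit phase.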
Utilities: codeclash/utils/
Shared helper functions:
| Module | Purpose |
|---|---|
| environment.py | Docker container lifecycle management |
| git_utils.py | Git clone, commit, push operations |
| yaml_utils.py | Config parsing with !include directive |
| aws.py | AWS Batch and ECR integration |
| log.py | Custom logging with emoji prefixes |
Extension Points
Adding a New Arena
1. Create arena directory:

    codeclash/arenas/myarena/
    ├── __init__.py
    ├── myarena.py
    └── Dockerfile # (if needed)

2. Implement the arena class:

    # codeclash/arenas/myarena/myarena.py
    from codeclash.arenas.arena import CodeArena

    class MyArena(CodeArena):
        name = "MyArena"

        def execute_round(self, agents) -> None:
            # Game-specific execution logic
            pass

        def validate_code(self, agent) -> tuple[bool, str | None]:
            # Check if code compiles/runs
            pass

        def get_results(self, agents, round_num, stats):
            # Determine winner based on game output
            pass

3. Register the arena:

    # codeclash/arenas/__init__.py
    from codeclash.arenas.myarena.myarena import MyArena

    ARENAS = [..., MyArena]

4. Add documentation:
    - Create docs/reference/arenas/myarena.md
    - Add to mkdocs.yml navigation

5. Create example configs:
    - configs/test/myarena.yaml (dummy agents)
    - configs/examples/MyArena__model1__model2__r5__s100.yaml
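To see how the three abstract methods fit together before touching Docker, the arena steps can be sketched with local stand-ins. Everything here is an assumption made for illustration: the `RoundStats` fields, the way `run_round()` chains `execute_round()` and `get_results()`, and the toy `HighCardArena` in which agents are plain dicts. The real base class in `codeclash/arenas/arena.py` may differ.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class RoundStats:
    """Stand-in for the real RoundStats; these fields are assumptions."""

    round_num: int
    winner: str | None = None
    scores: dict = field(default_factory=dict)


class CodeArena(ABC):
    """Stand-in base class mirroring the simplified interface."""

    name: str

    def run_round(self, agents, round_num: int) -> RoundStats:
        # Concrete template method: execute the game, then collect results.
        stats = RoundStats(round_num)
        self.execute_round(agents)
        self.get_results(agents, round_num, stats)
        return stats

    @abstractmethod
    def execute_round(self, agents) -> None: ...

    @abstractmethod
    def validate_code(self, agent) -> tuple[bool, str | None]: ...

    @abstractmethod
    def get_results(self, agents, round_num: int, stats: RoundStats) -> None: ...


class HighCardArena(CodeArena):
    """Toy arena: each agent is a dict with a name and an integer card."""

    name = "HighCard"

    def execute_round(self, agents) -> None:
        # A real arena would run the game simulation here.
        self._played = {agent["name"]: agent["card"] for agent in agents}

    def validate_code(self, agent) -> tuple[bool, str | None]:
        if isinstance(agent.get("card"), int):
            return True, None
        return False, "card must be an integer"

    def get_results(self, agents, round_num: int, stats: RoundStats) -> None:
        stats.scores = dict(self._played)
        stats.winner = max(self._played, key=self._played.get)


players = [{"name": "a", "card": 3}, {"name": "b", "card": 9}]
stats = HighCardArena().run_round(players, round_num=0)
print(stats)
```

The subclass never overrides `run_round()`; it only fills in the game-specific pieces, which is the same split the arena steps describe.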
Adding a New Agent Type
1. Create agent file:

    # codeclash/agents/myagent.py
    from codeclash.agents.player import Player

    class MyAgent(Player):
        def run(self) -> None:
            # Your code improvement logic
            pass

2. Register the agent:

    # codeclash/agents/__init__.py
    from codeclash.agents.myagent import MyAgent

    def get_agent(...):
        agents = {"dummy": Dummy, "mini": MiniSWEAgent, "my": MyAgent}
        ...

3. Add documentation:
    - Create docs/reference/player/myagent.md
Adding Analysis Tools
Analysis modules live in codeclash/analysis/. Common patterns:
# codeclash/analysis/mymetric/compute.py
import json
from pathlib import Path

def compute_metric(tournament_dir: Path) -> dict:
    """Compute custom metric from tournament results."""
    results = json.loads((tournament_dir / "tournament_metadata.json").read_text())
    value = ...  # Your analysis logic, computed from results
    return {"metric": value}
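As a concrete instance of this pattern, the sketch below counts wins per player from a metadata file written to a temporary directory. The JSON layout (a top-level `rounds` list whose entries have a `winner` key) is a hypothetical schema chosen for the demo; check a real `tournament_metadata.json` before relying on any key name. `compute_win_counts` is likewise an illustrative name.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def compute_win_counts(tournament_dir: Path) -> dict:
    """Count round wins per player (assumes a hypothetical metadata layout)."""
    metadata = json.loads((tournament_dir / "tournament_metadata.json").read_text())
    rounds = metadata.get("rounds", [])  # hypothetical key
    wins = Counter(entry["winner"] for entry in rounds if entry.get("winner"))
    return {"win_counts": dict(wins)}


# Demo with made-up data in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    tournament_dir = Path(tmp)
    fake_metadata = {
        "rounds": [
            {"winner": "alice"},
            {"winner": "bob"},
            {"winner": "alice"},
            {"winner": None},  # e.g. a draw; skipped by the counter
        ]
    }
    (tournament_dir / "tournament_metadata.json").write_text(json.dumps(fake_metadata))
    demo_result = compute_win_counts(tournament_dir)

print(demo_result)
```

Keeping the function pure (directory in, dict out) makes it easy to test against a small fixture directory instead of a full tournament log.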
First Steps for Contributors
1. Run the tests
uv run pytest
2. Try a test tournament
uv run python main.py configs/test/battlesnake.yaml
3. Explore an existing arena
Read through codeclash/arenas/battlesnake/battlesnake.py to understand:
- How Docker images are built
- How games are executed
- How results are parsed
4. Trace a tournament run
Add some print statements or use a debugger:
uv run python -m pdb main.py configs/test/battlesnake.yaml
5. Check the viewer
uv run python scripts/run_viewer.py -d logs/*/PvpTournament.*
Code Style
- Formatting: Ruff (Black-compatible)
- Linting: Ruff
- Type hints: Encouraged but not required everywhere
- Docstrings: Google style
# Format and lint
uv run ruff format .
uv run ruff check . --fix
Common Tasks
| Task | Location |
|---|---|
| Add CLI flag | main.py:main_cli() |
| Change tournament flow | tournaments/pvp.py |
| Add new game | arenas/<game>/<game>.py |
| Add new agent | agents/<agent>.py |
| Add new metric | analysis/metrics/ |
| Change Docker behavior | utils/environment.py |
| Modify config parsing | utils/yaml_utils.py |
Getting Help
- Issues: GitHub Issues
- Discussions: Open an issue to discuss ideas
- Contact: John Yang (johnby@stanford.edu), Kilian Lieret (kl5675@princeton.edu)