Case study
2025
AI Chess Engine
An AlphaZero-inspired chess engine that pairs a ResNet evaluator with Monte Carlo Tree Search, exported to ONNX and served through a REST API.
Role
Sole engineer — training, export, serving
Status
Released
Services
ML model development & deployment
The shape of the project
AlphaZero is the reference point for modern game-playing AI: a deep network that evaluates board states and predicts move probabilities, paired with a Monte Carlo Tree Search (MCTS) that uses the network’s outputs to decide which lines to explore. It works beautifully, but it’s punishingly expensive to train from scratch.
This project took the AlphaZero recipe and made it tractable on a single workstation. The evaluator network is a ResNet of modest depth. The pre-training pass uses a corpus of Lichess games to give the network a strong prior before any self-play happens. The self-play refinement that follows is shorter than a from-scratch run but produces a real improvement on top of the pre-trained base.
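A minimal sketch of how one Lichess game could be turned into pre-training targets, assuming the usual AlphaZero-style format: the move actually played becomes the policy target (stored here as the move's index, equivalent to a one-hot vector) and the game result becomes the value target, signed for the side to move. The integer move encoding, the `Example` record and `game_to_examples` are hypothetical; the case study doesn’t describe its encoding, and the board-position input each example would also need is left out.

```python
from dataclasses import dataclass

# PGN result strings mapped to the outcome from White's point of view.
RESULT_FOR_WHITE = {"1-0": 1.0, "0-1": -1.0, "1/2-1/2": 0.0}


@dataclass
class Example:
    ply: int             # half-move number within the game (0 = White's first move)
    policy_target: int   # index of the move actually played (hypothetical encoding)
    value_target: float  # final result from the perspective of the side to move


def game_to_examples(move_indices: list[int], result: str) -> list[Example]:
    """Turn one recorded game into (policy, value) training targets.

    Assumes move_indices[i] is the already-encoded move played at ply i,
    so White is to move on even plies and Black on odd plies.
    """
    if result not in RESULT_FOR_WHITE:
        raise ValueError(f"unfinished or unknown result: {result!r}")
    white_outcome = RESULT_FOR_WHITE[result]
    examples = []
    for ply, move in enumerate(move_indices):
        # Flip the sign on Black's turns so the value is always "for the side to move".
        side_sign = 1.0 if ply % 2 == 0 else -1.0
        examples.append(Example(ply, move, side_sign * white_outcome))
    return examples
```

For a Black win, White-to-move positions get a value target of -1 and Black-to-move positions get +1, so the network learns one consistent "how good is this for me" signal.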
The final artefact is exported to ONNX, served by a thin FastAPI process, and runs on CPU. The whole thing fits on a $5 VPS.
The decisions
Pre-training before self-play. A pure self-play loop starting from a randomly initialised network is what the AlphaZero papers describe. It also assumes compute far beyond a single machine: DeepMind’s chess run used thousands of TPUs, and on anything smaller the same loop runs for weeks or longer. Seeding the network with a Lichess-trained prior cut the self-play time to something a single machine could finish, and the resulting engine is meaningfully stronger than the pre-trained network alone.
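Seeding and refining can share one objective, with only the targets changing. A minimal sketch of the per-position loss from the AlphaZero paper (squared value error plus policy cross-entropy) in plain Python: during pre-training the policy target would be one-hot on the human move, and during self-play it would be the normalised MCTS visit counts. The paper’s L2 weight-decay term is omitted, and the case study doesn’t say whether this project used exactly this loss or weighting.

```python
import math


def alphazero_loss(policy_logits, value, policy_target, value_target):
    """Value MSE plus policy cross-entropy for a single position.

    policy_logits: raw network scores over the legal moves
    value:         network value estimate in [-1, 1]
    policy_target: probabilities over the same moves (one-hot or visit-count based)
    value_target:  game outcome for the side to move (-1, 0 or +1)
    """
    # Log-softmax, subtracting the max logit for numerical stability.
    m = max(policy_logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in policy_logits))
    log_probs = [x - log_z for x in policy_logits]

    value_loss = (value_target - value) ** 2
    policy_loss = -sum(t * lp for t, lp in zip(policy_target, log_probs))
    return value_loss + policy_loss


# Pre-training target: the human played the second of three legal moves.
pretrain = alphazero_loss([0.2, 1.5, -0.3], 0.1, [0.0, 1.0, 0.0], 1.0)

# Self-play target: MCTS visit counts normalised into a distribution.
visits = [10, 70, 20]
selfplay = alphazero_loss([0.2, 1.5, -0.3], 0.1, [n / sum(visits) for n in visits], 1.0)
```

Because both phases optimise the same function, switching from pre-training to self-play only means swapping the data source, not the training code.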
ONNX export for the inference path. Training stayed in PyTorch, where iteration is fast. Serving moved to ONNX Runtime because its inference footprint is dramatically smaller than a full PyTorch install and it runs on every major platform. The export step is part of the build script, so the model the API serves is always the model the training run produced, with no PyTorch dependency at runtime.
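The case study doesn’t say how the build script ties the served model to its training run. One hypothetical way to make that guarantee checkable is to record a SHA-256 hash of the exported `.onnx` file at build time and have the server verify it before loading. The manifest file name, the `run_id` field and the function names below are all assumptions, and the actual ONNX export and loading calls are left out.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_path_for(model_path: Path) -> Path:
    # e.g. model.onnx -> model.manifest.json (hypothetical naming scheme)
    return model_path.with_name(model_path.stem + ".manifest.json")


def write_manifest(model_path: Path, run_id: str) -> Path:
    """Build step: record which training run produced this exported model."""
    manifest_path = manifest_path_for(model_path)
    manifest = {"run_id": run_id, "sha256": sha256_of(model_path)}
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path


def verify_model(model_path: Path) -> str:
    """Server startup: refuse to serve a model that doesn't match its manifest."""
    manifest = json.loads(manifest_path_for(model_path).read_text())
    if sha256_of(model_path) != manifest["sha256"]:
        raise RuntimeError(f"{model_path} does not match its build manifest")
    return manifest["run_id"]
```

Failing at startup rather than at the first request means a mismatched model never reaches traffic.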
CPU serving, not GPU. Network evaluations on CPU are slower, so the search completes fewer MCTS simulations in a given thinking time, but CPU serving lets the whole stack (model, API, and MCTS) fit on commodity hardware. For a chess engine that doesn’t need to play at tournament speed, this is the right trade.
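Slower evaluations translate directly into fewer simulations per second of thinking time. One common way to handle that is to bound each move by both a simulation count and a wall-clock limit, whichever runs out first. A minimal sketch, assuming that approach: `run_search`, its parameters and the injectable `clock` are hypothetical names, `simulate` stands in for one MCTS pass, and the case study doesn’t say how the engine actually budgets its search.

```python
import time
from typing import Callable


def run_search(simulate: Callable[[], None], max_simulations: int,
               time_limit_s: float,
               clock: Callable[[], float] = time.monotonic) -> int:
    """Run simulations until either the count budget or the time budget is used up.

    Returns how many simulations completed. With a slow (CPU) evaluator the time
    limit usually binds first; with a fast one the simulation count does.
    """
    deadline = clock() + time_limit_s
    completed = 0
    while completed < max_simulations and clock() < deadline:
        simulate()  # one select / expand-and-evaluate / backpropagate pass
        completed += 1
    return completed
```

Bounding both means a slower host still answers on time, just with a shallower search, while the simulation cap stops a fast host from spending effort it doesn’t need.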
What MCTS does in the loop
The neural network gives a policy distribution (likely moves) and a value estimate (likely outcome) for any board state. MCTS uses those signals to focus its search: on each simulation it descends to the branch that best balances the network’s prior and the values found so far against an exploration bonus, evaluates the new position with the network, and backpropagates the result up the tree. When the simulation budget is spent, it returns a move along with a search-derived confidence. The interesting tuning isn’t in the network but in the MCTS parameters: the exploration constant, the temperature schedule, and the simulation budget per move.
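A compact sketch of that loop, assuming the standard AlphaZero structure: PUCT selection, a network evaluation in place of random playouts, and sign-flipping backpropagation for a two-player zero-sum game. `evaluate(state)` stands in for the ONNX network call and must return `(priors, value)` with the value from the side to move’s perspective. `Node`, `apply_move`, `search`, `choose_move`, the visit-share "confidence" and the default `c_puct`/simulation values are hypothetical, not the engine’s actual code or settings. The three tuning knobs above map onto `c_puct`, `temperature` and `num_simulations`.

```python
import math
import random


class Node:
    """Search statistics for one position reached by a particular move."""

    def __init__(self, prior: float):
        self.prior = prior      # P(s, a): policy-head probability of the move leading here
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # backed-up values, from the view of the player who moved here
        self.children = {}      # move -> Node, filled in when this position is expanded

    def q(self) -> float:
        # Mean backed-up value; unvisited children count as 0 (a neutral estimate).
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float):
    """PUCT selection: maximise Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    sqrt_parent = math.sqrt(node.visits)

    def score(item):
        _, child = item
        return child.q() + c_puct * child.prior * sqrt_parent / (1 + child.visits)

    return max(node.children.items(), key=score)


def simulate(root, root_state, evaluate, apply_move, c_puct):
    """One MCTS simulation: select, expand and evaluate, backpropagate."""
    node, state, path = root, root_state, [root]
    # Select: walk down through already-expanded positions.
    while node.children:
        move, node = select_child(node, c_puct)
        state = apply_move(state, move)
        path.append(node)
    # Expand and evaluate; `value` is for the side to move in `state`.
    priors, value = evaluate(state)
    for move, p in priors.items():
        node.children[move] = Node(p)
    # Backpropagate, flipping the sign at each ply because each node stores
    # value from the view of the player who made the move into it.
    for n in reversed(path):
        value = -value
        n.visits += 1
        n.value_sum += value


def search(root_state, evaluate, apply_move, num_simulations=200, c_puct=1.5):
    """Run a fixed simulation budget from `root_state` (defaults are illustrative)."""
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        simulate(root, root_state, evaluate, apply_move, c_puct)
    return root


def choose_move(root, temperature, rng=random):
    """Pick a move from root visit counts and report each move's visit share."""
    moves = list(root.children)
    visits = [root.children[m].visits for m in moves]
    total = sum(visits)
    if total == 0:
        raise ValueError("no root move has been visited; run more simulations")
    confidence = {m: v / total for m, v in zip(moves, visits)}
    if temperature == 0:
        # Deterministic play: the most-visited move.
        return max(moves, key=lambda m: root.children[m].visits), confidence
    # Sample proportionally to N^(1/T); dividing by the max first avoids overflow.
    top = max(visits)
    weights = [(v / top) ** (1.0 / temperature) for v in visits]
    return rng.choices(moves, weights=weights)[0], confidence
```

With `temperature=0` the engine plays its most-visited move, which suits serving. A common self-play schedule instead samples with a temperature near 1 for the opening moves, to keep training games varied, and then switches to the most-visited move.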
Why this lives on the Dynamic Reasoning site
This is the working example of the model-development-to-deployment work I do. The decisions that mattered were almost all about practicality: choosing a network small enough to train and serve, finding a training shortcut that didn’t compromise the result, picking an inference path that deploys anywhere. If you have a model you’ve prototyped in a notebook and you need it to be a service that runs in production, this is roughly the shape of that engagement.
Next
Working on something with this shape?
Book a call