Processing 1M Chess Games in 15 Seconds with Rust

Source: DEV Community
I train self-supervised models on chess game data. My Python pipeline using python-chess took 25 minutes to parse and tokenize 1M games from Lichess PGN dumps. I rewrote it in Rust. It now takes 15 seconds. This post covers the architecture, why Rust was the right choice, and what I learned.

The problem

Training a chess move predictor requires converting PGN (Portable Game Notation) files into tokenized sequences: arrays of integer IDs that a neural network can consume. A typical Lichess monthly dump has 5M+ games in a zstd-compressed PGN file.

My Python pipeline had three bottlenecks:

- PGN parsing: python-chess parses SAN notation, validates every move on a board, and handles edge cases. Correct, but slow; ~15 minutes for 1M games.
- Tokenization: converting validated UCI moves to token IDs while tracking piece types and turns. ~10 minutes.
- Memory: all games loaded into a Python list of dicts. 1M games = ~4GB RAM.

The Rust rewrite

The tool is called ailed-soulsteal (named after a Castlevania ability).
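To make the tokenization step concrete, here is a minimal Rust sketch of mapping UCI move strings to integer token IDs. The `MoveTokenizer` type and its assign-an-ID-on-first-sight vocabulary are illustrative assumptions, not the tool's actual design:

```rust
use std::collections::HashMap;

/// Hypothetical tokenizer: maps UCI move strings (e.g. "e2e4") to
/// integer token IDs, building the vocabulary on the fly.
struct MoveTokenizer {
    vocab: HashMap<String, u32>,
}

impl MoveTokenizer {
    fn new() -> Self {
        MoveTokenizer { vocab: HashMap::new() }
    }

    /// Return the token ID for a UCI move, assigning a fresh ID if unseen.
    fn token_id(&mut self, uci: &str) -> u32 {
        let next = self.vocab.len() as u32;
        *self.vocab.entry(uci.to_string()).or_insert(next)
    }

    /// Tokenize a whole game given as a slice of UCI moves.
    fn encode(&mut self, moves: &[&str]) -> Vec<u32> {
        moves.iter().map(|m| self.token_id(m)).collect()
    }
}

fn main() {
    let mut tok = MoveTokenizer::new();
    // First three plies of a game in UCI notation.
    let ids = tok.encode(&["e2e4", "e7e5", "g1f3"]);
    assert_eq!(ids, vec![0, 1, 2]);
    // A repeated move reuses its existing ID.
    assert_eq!(tok.token_id("e2e4"), 0);
    println!("{:?}", ids);
}
```

In practice a real pipeline would likely use a fixed vocabulary (there are only a few thousand legal UCI moves) so that IDs are stable across runs, but the hash-map version above keeps the sketch short.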