Building a Chess Engine from Scratch
tl;dr: how do you represent a chess game programmatically?
Chess Engine Architecture Overview
The first decision is how to represent the chess board. We have several options:
Option 1: 8x8 Array (Simple but Inefficient)
class SimpleBoard:
    def __init__(self):
        # 8x8 board with piece codes (lowercase = black, uppercase = white)
        self.board = [
            ['r', 'n', 'b', 'q', 'k', 'b', 'n', 'r'],
            ['p', 'p', 'p', 'p', 'p', 'p', 'p', 'p'],
            ['.', '.', '.', '.', '.', '.', '.', '.'],
            ['.', '.', '.', '.', '.', '.', '.', '.'],
            ['.', '.', '.', '.', '.', '.', '.', '.'],
            ['.', '.', '.', '.', '.', '.', '.', '.'],
            ['P', 'P', 'P', 'P', 'P', 'P', 'P', 'P'],
            ['R', 'N', 'B', 'Q', 'K', 'B', 'N', 'R']
        ]

    def get_piece(self, row, col):
        return self.board[row][col]

    def set_piece(self, row, col, piece):
        self.board[row][col] = piece

Option 2: Bitboards (More Computationally Efficient, let's go with this)
class BitBoard:
    def __init__(self):
        # Each piece type gets its own 64-bit integer.
        # We are basically just representing EVERYTHING in chess in 0s and 1s.
        self.white_pawns   = 0x000000000000FF00
        self.white_knights = 0x0000000000000042
        self.white_bishops = 0x0000000000000024
        self.white_rooks   = 0x0000000000000081
        self.white_queens  = 0x0000000000000008
        self.white_king    = 0x0000000000000010
        self.black_pawns   = 0x00FF000000000000
        self.black_knights = 0x4200000000000000
        # ... and so on
But how do we represent moves? Positions? Attacks?
Bitmasks. Lots of them. Let's go into more depth on bitboards.
Understanding Bitboard Mapping
A bitboard uses a 64-bit integer where each bit represents one square on the chess board.
Chess Board Layout:
a b c d e f g h
8 ♜ ♞ ♝ ♛ ♚ ♝ ♞ ♜ 8
7 ♟ ♟ ♟ ♟ ♟ ♟ ♟ ♟ 7
6 . . . . . . . . 6
5 . . . . . . . . 5
4 . . . . . . . . 4
3 . . . . . . . . 3
2 ♙ ♙ ♙ ♙ ♙ ♙ ♙ ♙ 2
1 ♖ ♘ ♗ ♕ ♔ ♗ ♘ ♖ 1
a b c d e f g h
Bitboard Square Numbers (0-63):
56 57 58 59 60 61 62 63  <- a8-h8 (rank 8)
48 49 50 51 52 53 54 55  <- a7-h7 (rank 7)
40 41 42 43 44 45 46 47  <- a6-h6 (rank 6)
32 33 34 35 36 37 38 39  <- a5-h5 (rank 5)
24 25 26 27 28 29 30 31  <- a4-h4 (rank 4)
16 17 18 19 20 21 22 23  <- a3-h3 (rank 3)
 8  9 10 11 12 13 14 15  <- a2-h2 (rank 2)
 0  1  2  3  4  5  6  7  <- a1-h1 (rank 1)
(This is the little-endian rank-file mapping: bit 0 = a1 and bit 63 = h8, which is what the hex constants above encode.)
Note: each bit position represents 2^position. For example:
1000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
represents 2^63 + 2^56 (bits 63 and 56 are set to 1, i.e. the h8 and a8 squares).
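To make the mapping concrete, here is a minimal sketch (helper names are my own) for converting between squares and bits:

def square_index(file, rank):
    """file: 0-7 for a-h, rank: 0-7 for ranks 1-8."""
    return rank * 8 + file

def set_bit(bb, square):
    return bb | (1 << square)

def test_bit(bb, square):
    return (bb >> square) & 1 == 1

# a1 -> 0, h1 -> 7, a2 -> 8, ..., h8 -> 63
assert square_index(0, 0) == 0 and square_index(7, 7) == 63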
Binary Representation Example
Let's look at the initial white pawns position:
# White pawns start on rank 2 (squares 8-15)
white_pawns = 0x000000000000FF00
# In binary (64 bits, most significant bit first):
# 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 0000 0000 <-- see where the 1's are
# Bit position: 63 ... 16 | 15 14 13 12 11 10  9  8 |  7  6  5  4  3  2  1  0
# Binary:        0 ...  0 |  1  1  1  1  1  1  1  1 |  0  0  0  0  0  0  0  0
# Square:       h8 ... a3 | h2 g2 f2 e2 d2 c2 b2 a2 | h1 g1 f1 e1 d1 c1 b1 a1
# Visualized on 8x8 board (1 = pawn present, 0 = empty):
# a b c d e f g h
# 8 0 0 0 0 0 0 0 0
# 7 0 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0 0
# 3 0 0 0 0 0 0 0 0
# 2 1 1 1 1 1 1 1 1 <- White pawns here
# 1 0 0 0 0 0 0 0 0
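# A quick sanity check (a sketch; the helper name is my own) that renders
# any bitboard the same way as the diagram above:
def print_bitboard(bb):
    for rank in range(7, -1, -1):  # print rank 8 first, down to rank 1
        row = ['1' if (bb >> (rank * 8 + file)) & 1 else '0' for file in range(8)]
        print(f"{rank + 1} " + ' '.join(row))
    print("  a b c d e f g h")

print_bitboard(0x000000000000FF00)  # reproduces the white-pawn diagram above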
# So for BLACK bishops, it would look something like this:
black_bishops = 0x2400000000000000
# = 0010 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 <-- notice the 1's at bits 61 (f8) and 58 (c8)
# a b c d e f g h
# 8 0 0 1 0 0 1 0 0 <- Black bishops on c8 and f8
# 7 0 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0 0
# 3 0 0 0 0 0 0 0 0
# 2 0 0 0 0 0 0 0 0
# 1 0 0 0 0 0 0 0

Step 2: Move Generation
Next, we need to generate all legal moves for each piece. That way, when a user moves a piece, it just has to fall into the set of legal moves.
For this part, the following is helpful to know:
We use bitwise operations such as BITWISE OR/AND to get what we need. Ex: We can get all white positions by BITWISE ORing all the white pieces together.
all_white_positions = white_pawns | white_knights | white_bishops | white_rooks | white_queens | white_king

So if we have white_pawns = 0x000000000000FF00 and white_knights = 0x0000000000000042, ORing just those two gives 0x000000000000FF42, and we keep ORing in the rest of the white pieces the same way.
We could also do all_positions = white_positions | black_positions to get all positions on the board. You can kind of see how this is super helpful now.
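A two-line check of that example, plus one Python-specific gotcha: Python ints are arbitrary-precision, so complements need to be masked back to 64 bits.

white_pawns   = 0x000000000000FF00
white_knights = 0x0000000000000042
assert white_pawns | white_knights == 0x000000000000FF42  # matches the example above

# ~ flips infinitely many bits on a Python int, so mask down to 64:
all_white = white_pawns | white_knights  # | bishops | rooks | queens | king ...
empty_or_enemy = ~all_white & 0xFFFFFFFFFFFFFFFF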
For each file and rank, we can generate masks that help us filter moves. Ex: the a-file and h-file can be represented as file_A = 0x0101010101010101 and file_H = 0x8080808080808080, and the same goes for other files, ranks, and diagonals.
For special cases such as en passant and castling, we can keep a bitmask like castling_rights = 0b1111 representing the 4 possible castling rights (white/black kingside and queenside) and clear the relevant bits once a king or rook moves, as sketched below.
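Here's a sketch of that bookkeeping; the bit assignment is my own choice, not a standard:

WHITE_KINGSIDE  = 0b0001
WHITE_QUEENSIDE = 0b0010
BLACK_KINGSIDE  = 0b0100
BLACK_QUEENSIDE = 0b1000

castling_rights = 0b1111  # all four rights available at the start

# The white king moved (or castled): white loses both rights
castling_rights &= ~(WHITE_KINGSIDE | WHITE_QUEENSIDE)

# The h1 rook moved or was captured: only white kingside is lost
castling_rights &= ~WHITE_KINGSIDE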
# For reference, this is kind of what the file and rank masks look like
A_FILE = 0x0101010101010101 # a1, a2, a3, ..., a8
# A_FILE = 0000 0001 0000 0001 0000 0001 0000 0001 0000 0001 0000 0001 0000 0001 0000 0001
H_FILE = 0x8080808080808080 # h1, h2, h3, ..., h8
# H_FILE = 1000 0000 1000 0000 1000 0000 1000 0000 1000 0000 1000 0000 1000 0000 1000 0000
# Rank masks (horizontal rows)
RANK_1 = 0x00000000000000FF # a1, b1, c1, ..., h1
RANK_8 = 0xFF00000000000000 # a8, b8, c8, ..., h8
# Diagonal masks
MAIN_DIAGONAL = 0x8040201008040201 # a1-h8 diagonal
ANTI_DIAGONAL = 0x0102040810204080 # h1-a8 diagonal

Now for the actual move generation.
We can do different things for different pieces.
For pawns, we can generate all possible pawn moves at once with shifts; for knights, we can look up each knight's precomputed moves; and so on for each piece type.
This would look something like the following:
class MoveGenerator:
    def __init__(self):
        # Pre-computed attack masks for different pieces
        self.knight_attacks = self.generate_knight_attack_table()
        self.king_attacks = self.generate_king_attack_table()
        # File and rank masks
        self.file_masks = self.generate_file_masks()
        self.rank_masks = self.generate_rank_masks()

    def generate_pawn_moves(self, board, color):
        """Generate all pawn moves using bitboard operations"""
        if color == 'white':
            pawns = board.white_pawns
            enemies = board.get_black_pieces()
            all_pieces = board.get_all_pieces()
            # Single pawn pushes
            # pawns << 8 means "shift all pawn bits 8 positions to the left"
            # This moves each pawn forward by one rank (8 squares = 1 rank)
            single_pushes = (pawns << 8) & ~all_pieces
            # Double pawn pushes (only from rank 2, i.e. single pushes that landed on rank 3)
            double_pushes = ((single_pushes & 0x0000000000FF0000) << 8) & ~all_pieces
            # Pawn captures (diagonal attacks)
            # << 7 captures toward the a-file, so a-file pawns must be masked off;
            # << 9 captures toward the h-file, so h-file pawns must be masked off
            left_attacks = ((pawns & ~0x0101010101010101) << 7) & enemies   # not on a-file
            right_attacks = ((pawns & ~0x8080808080808080) << 9) & enemies  # not on h-file
            return single_pushes | double_pushes | left_attacks | right_attacks
        else:  # black pawns
            pawns = board.black_pawns
            enemies = board.get_white_pieces()
            all_pieces = board.get_all_pieces()
            # Single pawn pushes (black moves "down" the board)
            single_pushes = (pawns >> 8) & ~all_pieces
            # Double pawn pushes (only from rank 7, i.e. single pushes that landed on rank 6)
            double_pushes = ((single_pushes & 0x0000FF0000000000) >> 8) & ~all_pieces
            # Pawn captures (mirrored: >> 7 goes toward the h-file, >> 9 toward the a-file)
            left_attacks = ((pawns & ~0x8080808080808080) >> 7) & enemies   # not on h-file
            right_attacks = ((pawns & ~0x0101010101010101) >> 9) & enemies  # not on a-file
            return single_pushes | double_pushes | left_attacks | right_attacks

What about sliding pieces? Rooks? Bishops? Queens? These are harder to represent, since pieces in the way of a sliding attack block the rest of the ray. Much harder to compute.
This is where we bring in the “magic” bitboards. Josh Watzman’s article explains this in detail.
Basically, we precompute a giant lookup table of attack sets for every (square, relevant-blockers) combination, indexed by a multiply-and-shift hash of the blocker bits. Pretty magical.
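For intuition about what those tables precompute, here's a slow-but-simple sketch (my own, not the magic version) that walks each rook ray square by square and stops at the first blocker:

def rook_attacks_slow(square, occupied):
    attacks = 0
    rank, file = divmod(square, 8)
    for d_rank, d_file in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, f = rank + d_rank, file + d_file
        while 0 <= r < 8 and 0 <= f < 8:
            attacks |= 1 << (r * 8 + f)
            if occupied & (1 << (r * 8 + f)):
                break  # first blocker ends the ray (its square is still attackable)
            r += d_rank
            f += d_file
    return attacks

Magic bitboards get the same answer with a single multiply, shift, and table lookup.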
For Kings and Knights, their moves are fixed. Much easier to compute. No need for magic bitboards with them.
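For example, the knight table referenced in the MoveGenerator above can be precomputed like this (a sketch):

def generate_knight_attack_table():
    table = [0] * 64
    jumps = ((2, 1), (2, -1), (-2, 1), (-2, -1),
             (1, 2), (1, -2), (-1, 2), (-1, -2))
    for square in range(64):
        rank, file = divmod(square, 8)
        for d_rank, d_file in jumps:
            r, f = rank + d_rank, file + d_file
            if 0 <= r < 8 and 0 <= f < 8:
                table[square] |= 1 << (r * 8 + f)
    return table

# Knight moves are then a single lookup minus friendly squares:
# moves = knight_attacks[square] & ~own_pieces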
I won't go over the details of implementing checkmate, stalemate, and the rest of the rules (sources are linked below if you wish to do so), but after you finish building your board representation you can run a perft test: given a starting position, you generate all legal moves down to depth X, and your node counts should match the published results from the source.
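A perft sketch; generate_legal_moves, make_move, and unmake_move are hypothetical methods your board class would need to expose:

def perft(board, depth):
    if depth == 0:
        return 1
    nodes = 0
    for move in board.generate_legal_moves():
        board.make_move(move)
        nodes += perft(board, depth - 1)
        board.unmake_move(move)
    return nodes

# From the standard starting position: perft(1) = 20, perft(2) = 400,
# perft(3) = 8902, perft(4) = 197281, ...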
Step 3: RL Application in Chess
Okay, I am skipping over a bunch of the bitboard implementation, but the part above should be sufficient to give an idea of what to do.
How do we build out the chess engine to think now?
Our first option looks like the following (source: stack_overflow):
def minimax(pos, depth, alpha, beta, maximizingPlayer):
    # maximizingPlayer = True if White is to play, False if Black is to play
    if depth == 0 or game_is_finished(pos) or can_t_move(pos):
        return evaluate(pos)  # <- base case that grounds the recursion
    if maximizingPlayer:
        maxEval = float('-inf')  # or any number smaller than your smallest evaluation
        for child in child_of_position(pos):
            evalu = minimax(child, depth - 1, alpha, beta, False)  # recursion
            maxEval = max(maxEval, evalu)
            alpha = max(alpha, evalu)
            if beta <= alpha:
                break  # beta cutoff: the opponent will never allow this line
        return maxEval
    else:
        minEval = float('inf')  # or any number larger than your largest evaluation
        for child in child_of_position(pos):
            evalu = minimax(child, depth - 1, alpha, beta, True)  # recursion
            minEval = min(minEval, evalu)
            beta = min(beta, evalu)  # beta shrinks as the minimizer finds better options
            if beta <= alpha:
                break  # alpha cutoff
        return minEval

Basically, it's just recursive DFS where we look into the future to see whether a state leads to a win or a loss. Of course, this becomes problematic when we try to search anything close to the ~10^120 possible games. That is what the alpha-beta pruning above is for: it skips branches that provably cannot change the result, e.g. once we find one refutation of a move that hangs the queen, we stop searching that line.
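The minimax above needs an evaluate(); here is a minimal material-only sketch (values in centipawns; attribute names follow the BitBoard class from earlier, assuming the remaining black pieces are defined the same way):

def popcount(bb):
    return bin(bb).count('1')

def evaluate(board):
    score  = 100 * (popcount(board.white_pawns)   - popcount(board.black_pawns))
    score += 320 * (popcount(board.white_knights) - popcount(board.black_knights))
    score += 330 * (popcount(board.white_bishops) - popcount(board.black_bishops))
    score += 500 * (popcount(board.white_rooks)   - popcount(board.black_rooks))
    score += 900 * (popcount(board.white_queens)  - popcount(board.black_queens))
    return score  # positive favors White, matching maximizingPlayer above

A real engine would add piece-square tables, mobility, king safety, and so on.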
Moving on to other implementations, what about AlphaZero? How do we implement that?
As a reference, this guide is absolutely golden for understanding everything starting from minimax. It then goes on to explain the prerequisites to understanding AlphaZero, starting from multi-armed bandits + Upper Confidence Bound (UCB) -> policy and value functions -> Monte Carlo Tree Search (MCTS) + Upper Confidence Bound for Trees (UCT).
Building off a lot of what was said in the above guide, here is basically what we need to build it:
- MCTS
- The policy/value approximation network
- 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks
We aren't trying to make the next Magnus Carlsen, so let's just do our best with what we have.
A general overview of MCTS has the following steps:
- Selection: descend from the root, choosing child nodes according to the tree policy (the exploration/exploitation trade-off).
- Expansion: add one or more child nodes for actions not yet tried from the selected node.
- Simulation: play out from the new node until termination, or until the discounted future reward becomes negligibly small (e.g. from reward decay).
- Backup: propagate the reward back up along the path taken.
How does this work for us? Let’s denote a node as a game state: so something like r3k2r/Pppp1ppp/1b3nbN/nP6/BBP1P3/q4N2/Pp1P2PP/R2Q1RK1 w kq - 0 1 (FEN representation)
In traditional MCTS, we use the UCB formula denoted as: argmax(Q(s,a) + C * sqrt(ln(N(s)) / N(s,a))), where Q(s,a) = average reward/value, C = exploration constant, N(s) = total number of times parent node s has been visited, and N(s,a) = number of times action a has been taken from state s.
This balances exploitation, Q(s,a), which favors actions that have historically performed well, and exploration, C * sqrt(ln(N(s)) / N(s,a)), which favors actions that have been tried less frequently.
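As code, UCB selection looks something like this (a sketch; the node/child attribute names are hypothetical):

import math

def select_ucb(node, c=1.41):
    def ucb(child):
        if child.visits == 0:
            return float('inf')  # always try unvisited actions first
        return child.q + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=ucb)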
In the variation that AlphaZero introduces, we instead have PUCT (Predictor + UCT): argmax over a of Q(s,a) + Cpuct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)).
Now where the hell did P(s,a) come from?
This is where part 2, the policy and value approximation network, comes into play.
We need a network that takes a game state as input and outputs a probability distribution over the possible next actions plus an estimated value of the current state. Your implementation may look like the following:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePolicyValueNetwork(nn.Module):
    """
    A very simple policy-value network for demonstration purposes.
    This network takes a game state as input and outputs:
    1. Policy: probability distribution over possible actions
    2. Value: estimated value of the current state
    """
    def __init__(self, input_size, action_size, hidden_size=128):
        super(SimplePolicyValueNetwork, self).__init__()
        self.shared_fc1 = nn.Linear(input_size, hidden_size)
        self.shared_fc2 = nn.Linear(hidden_size, hidden_size)
        self.policy_head = nn.Linear(hidden_size, action_size)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        x = F.relu(self.shared_fc1(x))
        shared_features = F.relu(self.shared_fc2(x))
        # Policy output (probabilities over actions)
        policy_logits = self.policy_head(shared_features)
        policy = F.softmax(policy_logits, dim=-1)
        # Value output (single scalar between -1 and 1)
        value = torch.tanh(self.value_head(shared_features))
        return policy, value

And so we get an output of dimension [1, action_size] for the policy and [1, 1] for the value, representing how good or bad the state is.
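A quick usage example (the sizes here are placeholders; a 773-float board encoding and AlphaZero's 4672-way move encoding are common choices, but anything consistent works):

net = SimplePolicyValueNetwork(input_size=773, action_size=4672)
state = torch.randn(1, 773)       # stand-in for an encoded position
policy, value = net(state)
print(policy.shape, value.shape)  # torch.Size([1, 4672]) torch.Size([1, 1])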
When MCTS visits a new state, we run the approximation network. We now have a probability distribution over the possible actions plus an estimate of how good the state is. We plug those into the PUCT formula above, and the search is now biased toward the moves the network thinks look good.
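Putting it together, PUCT selection is the UCB snippet from before with the prior plugged in (again, attribute names are mine; child.prior is P(s,a) from the policy head):

import math

def select_puct(node, c_puct=1.0):
    def puct(child):
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q + u
    return max(node.children, key=puct)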
And that's basically it. As for results, you can look at my project here and my wandb runs here. For whatever reason the runs keep crashing around the 10-hour mark, but I guess that is my divine punishment for trying to run it on a regular M3 w/ 8 GB of RAM. Will pick this back up when I gain a bazillion GPU credits to check the performance and fix the bugs.
Sources and References
Chess Engine Implementation
- Magic Bitboards Explained - Josh Watzman’s detailed article
- Perft Test Results - Move generation verification
- Minimax Implementation - Stack Overflow discussion
AlphaZero and Reinforcement Learning
- AlphaGoZero Guide - Depth First Learning comprehensive tutorial
- PUCT Variation - Computer-Go Archive discussion
Project Implementation
- My Chess Engine Project - GitHub repository
- Training Runs and Results - Weights & Biases dashboard
Happy coding! Feel free to reach out to me to discuss further!