Building a Chess Engine from Scratch

chess
ai
algorithms
python
Author

Daniel Ung

Published

September 3, 2025

tl;dr: how do we represent a chess game programmatically?

Chess Engine Architecture Overview

The first decision is how to represent the chess board. We have several options:

Option 1: 8x8 Array (Simple but Inefficient)

class SimpleBoard:
    def __init__(self):
        # 8x8 board with piece codes
        self.board = [
            ['r', 'n', 'b', 'q', 'k', 'b', 'n', 'r'],
            ['p', 'p', 'p', 'p', 'p', 'p', 'p', 'p'],
            ['.', '.', '.', '.', '.', '.', '.', '.'],
            ['.', '.', '.', '.', '.', '.', '.', '.'],
            ['.', '.', '.', '.', '.', '.', '.', '.'],
            ['.', '.', '.', '.', '.', '.', '.', '.'],
            ['P', 'P', 'P', 'P', 'P', 'P', 'P', 'P'],
            ['R', 'N', 'B', 'Q', 'K', 'B', 'N', 'R']
        ]
    
    def get_piece(self, row, col):
        return self.board[row][col]
    
    def set_piece(self, row, col, piece):
        self.board[row][col] = piece

Option 2: Bitboards (More Computationally Efficient, let’s go with this)

class BitBoard:
    def __init__(self):
        # Each piece type gets its own 64-bit integer
        # We are basically just representing EVERYTHING in chess in 0s and 1s
        self.white_pawns = 0x000000000000FF00
        self.white_knights = 0x0000000000000042
        self.white_bishops = 0x0000000000000024
        self.white_rooks = 0x0000000000000081
        self.white_queens = 0x0000000000000008
        self.white_king = 0x0000000000000010
        
        self.black_pawns = 0x00FF000000000000
        self.black_knights = 0x4200000000000000
        # ... and so on
    

But how do we represent moves? Positions? Attacks?

Bitmasks. Lots of them. Let's go more into depth about bitboards.

Understanding Bitboard Mapping

A bitboard uses a 64-bit integer where each bit represents one square on the chess board.

Chess Board Layout:
  a b c d e f g h
8 ♜ ♞ ♝ ♛ ♚ ♝ ♞ ♜  8
7 ♟ ♟ ♟ ♟ ♟ ♟ ♟ ♟  7
6 . . . . . . . .  6
5 . . . . . . . .  5
4 . . . . . . . .  4
3 . . . . . . . .  3
2 ♙ ♙ ♙ ♙ ♙ ♙ ♙ ♙  2
1 ♖ ♘ ♗ ♕ ♔ ♗ ♘ ♖  1
  a b c d e f g h

Bitboard Square Numbers (0-63):
  56 57 58 59 60 61 62 63   <- a8-h8 (rank 8)
  48 49 50 51 52 53 54 55   <- a7-h7 (rank 7)
  40 41 42 43 44 45 46 47   <- a6-h6 (rank 6)
  32 33 34 35 36 37 38 39   <- a5-h5 (rank 5)
  24 25 26 27 28 29 30 31   <- a4-h4 (rank 4)
  16 17 18 19 20 21 22 23   <- a3-h3 (rank 3)
   8  9 10 11 12 13 14 15   <- a2-h2 (rank 2)
   0  1  2  3  4  5  6  7   <- a1-h1 (rank 1)
  
 Note: Each bit position represents 2^position. For example:
 1000 0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
 represents 2^63 + 2^56 (bits 63 and 56 are set to 1, i.e. squares h8 and a8 in the mapping above)

Binary Representation Example

Let’s look at the initial white pawns position:

# White pawns start on rank 2 (squares 8-15)
white_pawns = 0x000000000000FF00
# = 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 0000 0000 <-- see where the 1's are


# In binary (64 bits):
# Bit position: 63 62 ... 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
# Binary:        0  0 ...  0  0  1  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0
# Square:       h8 g8 ... b3 a3 h2 g2 f2 e2 d2 c2 b2 a2 h1 g1 f1 e1 d1 c1 b1 a1

# Visualized on 8x8 board (1 = pawn present, 0 = empty):
#   a b c d e f g h
# 8 0 0 0 0 0 0 0 0
# 7 0 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0 0
# 3 0 0 0 0 0 0 0 0
# 2 1 1 1 1 1 1 1 1  <- White pawns here
# 1 0 0 0 0 0 0 0 0

# So for BLACK bishops, it would look something like this:

black_bishops = 0x2400000000000000
# = 0010 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 <-- bits 61 (f8) and 58 (c8) are set

#   a b c d e f g h
# 8 0 0 1 0 0 1 0 0  <- Black bishops on c8 and f8
# 7 0 0 0 0 0 0 0 0
# 6 0 0 0 0 0 0 0 0
# 5 0 0 0 0 0 0 0 0
# 4 0 0 0 0 0 0 0 0
# 3 0 0 0 0 0 0 0 0
# 2 0 0 0 0 0 0 0 0
# 1 0 0 0 0 0 0 0 0
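Since we will be staring at hex constants a lot, a tiny debugging helper that prints any bitboard as an 8x8 grid goes a long way. This is a minimal sketch, assuming the bit 0 = a1, bit 63 = h8 mapping above:

def square_index(file, rank):
    """File and rank are 0-7 (file a = 0, rank 1 = 0); returns the bit index 0-63."""
    return rank * 8 + file

def print_bitboard(bb):
    """Print a 64-bit bitboard as an 8x8 grid, rank 8 at the top."""
    print("  a b c d e f g h")
    for rank in range(7, -1, -1):  # rank 8 down to rank 1
        row = [str((bb >> square_index(file, rank)) & 1) for file in range(8)]
        print(f"{rank + 1} " + " ".join(row))

print_bitboard(0x2400000000000000)  # prints 1s on c8 and f8, matching the diagram above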

Step 2: Move Generation

Next, we need to generate all legal moves for each piece. That way, when a user tries to move a piece, we only have to check that the move falls within the set of legal moves.

For this part, the following is helpful to know:

We use bitwise operations such as OR and AND to get what we need. For example, we can get all white positions by ORing all the white piece bitboards together:

all_white_positions = white_pawns | white_knights | white_bishops | white_rooks | white_queens | white_king. So if white_pawns = 0x000000000000FF00 and white_knights = 0x0000000000000042, ORing just those two gives 0x000000000000FF42, and we keep ORing in the rest of the white pieces the same way.

We could also do all_positions = white_positions | black_positions to get all positions on the board. You can kind of see how this is super helpful now.
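The move generator later in this post calls helpers like board.get_white_pieces() and board.get_all_pieces(). Here is a minimal sketch of how they could live on the BitBoard class from Option 2, assuming the remaining black piece bitboards (black_bishops, black_rooks, black_queens, black_king) are filled in the same way as the white ones:

class BitBoard:
    # ... __init__ with the piece bitboards from Option 2 above ...

    def get_white_pieces(self):
        """All squares occupied by a white piece."""
        return (self.white_pawns | self.white_knights | self.white_bishops |
                self.white_rooks | self.white_queens | self.white_king)

    def get_black_pieces(self):
        """All squares occupied by a black piece."""
        return (self.black_pawns | self.black_knights | self.black_bishops |
                self.black_rooks | self.black_queens | self.black_king)

    def get_all_pieces(self):
        """Every occupied square on the board."""
        return self.get_white_pieces() | self.get_black_pieces()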

For each file, rank, and diagonal, we can generate masks that help us filter moves. For example, the a-file and h-file can be represented as file_A = 0x0101010101010101 and file_H = 0x8080808080808080; the same idea works for ranks and diagonals.

For special cases such as en passant and castling, we can keep a small amount of extra state. For example, castling_rights = 0b1111 can represent the four castling rights (white and black, kingside and queenside), and we clear the relevant bits when a castle happens or when a king or rook moves; a small sketch follows the mask reference below.

# For reference this is kind of what they look like 
A_FILE = 0x0101010101010101  # a1, a2, a3, ..., a8
# A_FILE = 0000 0001 0000 0001 0000 0001 0000 0001 0000 0001 0000 0001 0000 0001 0000 0001
H_FILE = 0x8080808080808080  # h1, h2, h3, ..., h8
# H_FILE = 1000 0000 1000 0000 1000 0000 1000 0000 1000 0000 1000 0000 1000 0000 1000 0000

# Rank masks (horizontal rows)  
RANK_1 = 0x00000000000000FF  # a1, b1, c1, ..., h1
RANK_8 = 0xFF00000000000000  # a8, b8, c8, ..., h8

# Diagonal masks
MAIN_DIAGONAL = 0x8040201008040201  # a1-h8 diagonal
ANTI_DIAGONAL = 0x0102040810204080  # h1-a8 diagonal
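And for the castling rights mentioned above, one possible encoding. The bit assignment here is just an assumption; any consistent layout works:

# Castling-rights flags packed into a 4-bit mask
WHITE_KINGSIDE  = 0b0001
WHITE_QUEENSIDE = 0b0010
BLACK_KINGSIDE  = 0b0100
BLACK_QUEENSIDE = 0b1000

castling_rights = 0b1111  # all four rights available at the start of the game

# When the white king moves (or castles), white loses both rights
castling_rights &= ~(WHITE_KINGSIDE | WHITE_QUEENSIDE)

# When the rook on h8 moves or is captured, black loses only the kingside right
castling_rights &= ~BLACK_KINGSIDE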

Now for the actual move generation.

We can do different things for different pieces.

For each piece type, we generate all of its possible moves in bulk: all the pawn moves at once, all the knight moves at once, and so on.

This would look something like the following:

class MoveGenerator:
    def __init__(self):
        # Pre-computed attack masks for different pieces
        self.knight_attacks = self.generate_knight_attack_table()
        self.king_attacks = self.generate_king_attack_table()
        
        # File and rank masks
        self.file_masks = self.generate_file_masks()
        self.rank_masks = self.generate_rank_masks()
        
    def generate_pawn_moves(self, board, color):
        """Generate all pawn moves using bitboard operations"""
        if color == 'white':
            pawns = board.white_pawns
            enemies = board.get_black_pieces()
            all_pieces = board.get_all_pieces()
            
            # Single pawn pushes
            # pawns << 8 means "shift all pawn bits 8 positions to the left"
            # This moves each pawn forward by one rank (8 squares = 1 rank)
            single_pushes = (pawns << 8) & ~all_pieces
            
            # Double pawn pushes (only from rank 2)
            double_pushes = ((single_pushes & 0x0000000000FF0000) << 8) & ~all_pieces
            
            # Pawn captures (diagonal attacks)
            # Mask off the a-file before shifting << 7 (capture toward the a-file)
            # and the h-file before shifting << 9 (capture toward the h-file),
            # so captures never wrap around the board edge
            left_attacks = ((pawns & ~0x0101010101010101) << 7) & enemies
            right_attacks = ((pawns & ~0x8080808080808080) << 9) & enemies
            
            return single_pushes | double_pushes | left_attacks | right_attacks
            
        else:  # black pawns
            pawns = board.black_pawns
            enemies = board.get_white_pieces()
            all_pieces = board.get_all_pieces()
            
            # Single pawn pushes (black moves "down" the board)
            single_pushes = (pawns >> 8) & ~all_pieces
            
            # Double pawn pushes (only from rank 7)
            # Same idea as for white, but shifted down the board: the mask keeps only pushes that landed on rank 6
            double_pushes = ((single_pushes & 0x0000FF0000000000) >> 8) & ~all_pieces
            
            # Pawn captures (diagonal attacks)
            # Mask off the h-file before shifting >> 7 (capture toward the h-file)
            # and the a-file before shifting >> 9 (capture toward the a-file)
            left_attacks = ((pawns & ~0x8080808080808080) >> 7) & enemies
            right_attacks = ((pawns & ~0x0101010101010101) >> 9) & enemies
            
            return single_pushes | double_pushes | left_attacks | right_attacks

What about sliding pieces: rooks, bishops, queens? These are harder to handle because other pieces sitting in their path block the rest of the ray, so their attack sets depend on the current occupancy of the board. Much harder to compute.

This is where we bring in the “magic” bitboards. Josh Watzman’s article explains this in detail.

Basically, we precompute a giant lookup table of attack sets, and a carefully chosen "magic" multiplier hashes each relevant blocker configuration straight to its entry. Pretty magical.
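I won't build the tables here, but to show why it behaves like a hash table: once the per-square masks, magic multipliers, shifts, and attack tables exist (all assumed precomputed below), the lookup itself is only a few lines. A sketch:

def rook_attacks(square, occupancy):
    """Attack bitboard for a rook on `square`, given a bitboard of all occupied squares."""
    blockers = occupancy & ROOK_MASKS[square]                       # keep only squares that can block this rook
    index = (blockers * ROOK_MAGICS[square]) & 0xFFFFFFFFFFFFFFFF   # 64-bit multiply (masking needed in Python)
    index >>= ROOK_SHIFTS[square]                                   # perfect-hash index for this square
    return ROOK_ATTACKS[square][index]                              # precomputed attack set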

For kings and knights, the move pattern from each square is fixed, so a simple precomputed table per square is enough. No need for magic bitboards with them.
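For example, the generate_knight_attack_table referenced in the MoveGenerator above could look something like this (a sketch, written here as a free function):

def generate_knight_attack_table():
    """Precompute the knight attack bitboard for each of the 64 squares."""
    offsets = [(1, 2), (2, 1), (2, -1), (1, -2),
               (-1, -2), (-2, -1), (-2, 1), (-1, 2)]   # (file delta, rank delta)
    attacks = [0] * 64
    for square in range(64):
        rank, file = divmod(square, 8)
        bb = 0
        for df, dr in offsets:
            f, r = file + df, rank + dr
            if 0 <= f < 8 and 0 <= r < 8:              # stay on the board
                bb |= 1 << (r * 8 + f)
        attacks[square] = bb
    return attacks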

I won't go over the details of implementing checkmate, stalemate, and the rest of it (sources are listed below if you wish to go further), but after you finish building your board representation you can run a perft test: from a given starting position, generate all legal moves down to depth X and check that your node counts match the published results.
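The perft function itself is tiny; the sketch below assumes a board API with generate_legal_moves, make_move, and undo_move (those names are placeholders and will depend on your own implementation):

def perft(board, depth):
    """Count the leaf nodes of the legal-move tree down to the given depth."""
    if depth == 0:
        return 1
    nodes = 0
    for move in board.generate_legal_moves():
        board.make_move(move)
        nodes += perft(board, depth - 1)
        board.undo_move(move)
    return nodes

From the standard starting position, perft should return 20 at depth 1, 400 at depth 2, and 8,902 at depth 3; if your counts drift from the reference values, some piece's move generation is off.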

Step 3: RL Application in Chess

Okay, I am skipping over a bunch of the bitboard implementation, but the part above should be enough to give you an idea of what to do.

How do we make the chess engine actually think now?

Our first option is minimax with alpha-beta pruning, which looks like the following. Source: stack_overflow


import math

def minimax(pos, depth, alpha, beta, maximizingPlayer):
    # maximizingPlayer = True if it is White to play, False if it is Black to play

    if depth == 0 or game_is_finished(pos) or can_t_move(pos):
        return evaluate(pos)                                       # <- base case for the recursion

    if maximizingPlayer:
        maxEval = -math.inf                                        # or any number below your lowest evaluation
        for child in child_of_position(pos):
            evalu = minimax(child, depth - 1, alpha, beta, False)  # recurse
            maxEval = max(maxEval, evalu)
            alpha = max(alpha, evalu)
            if beta <= alpha:
                break                                              # prune: the opponent will avoid this line
        return maxEval

    else:
        minEval = math.inf                                         # or any number above your highest evaluation
        for child in child_of_position(pos):
            evalu = minimax(child, depth - 1, alpha, beta, True)   # recurse
            minEval = min(minEval, evalu)
            beta = min(beta, evalu)
            if beta <= alpha:
                break                                              # prune: the opponent will avoid this line
        return minEval

Basically, it's recursive DFS where we look into the future to see whether a line of play leads toward a win or a loss. Of course, this becomes problematic if we try to search exhaustively: there are on the order of 10^120 possible games (the Shannon number). Alpha-beta pruning lets us skip branches that cannot affect the final result, such as lines where we would simply hang the queen.
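The pseudocode above leans on an evaluate() function that it never defines. A minimal material-count version over the bitboards from earlier might look like this (piece values follow the usual centipawn conventions, the black piece bitboards are assumed to be named like the white ones, and bin(bb).count("1") is just a popcount):

PIECE_VALUES = {"pawns": 100, "knights": 320, "bishops": 330, "rooks": 500, "queens": 900}

def evaluate(board):
    """Material balance in centipawns: positive favors White, negative favors Black."""
    score = 0
    for piece, value in PIECE_VALUES.items():
        score += value * bin(getattr(board, f"white_{piece}")).count("1")
        score -= value * bin(getattr(board, f"black_{piece}")).count("1")
    return score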

Moving onto other implementations, what about AlphaZero? How do we implement that?

As a reference, this guide is absolutely golden for understanding everything starting from minimax. It then goes on to explain the prerequisites for AlphaZero: multi-armed bandits and the Upper Confidence Bound (UCB) -> policy and value functions -> Monte Carlo Tree Search (MCTS) and the Upper Confidence Bound for Trees (UCT).

Building off a lot of what was said in the above guide, here is basically what we need to build it:

- MCTS
- The policy/value approximation network
- 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks

We aren't trying to make the next Magnus Carlsen, so let's just try our best with what we have:

A general overview of MCTS has the following steps:

- Selection: select a child node from the current node based on the tree policy.
- Expansion: expand the child node based on the exploration/exploitation trade-off.
- Simulation: simulate from the child node until termination or until the future reward becomes suitably small (e.g. from reward decay).
- Backup: back up the reward along the path taken according to the tree policy.

How does this work for us? Let’s denote a node as a game state: so something like r3k2r/Pppp1ppp/1b3nbN/nP6/BBP1P3/q4N2/Pp1P2PP/R2Q1RK1 w kq - 0 1 (FEN representation)

In traditional MCTS, we use the UCB formula denoted as: argmax(Q(s,a) + C * sqrt(ln(N(s)) / N(s,a))), where Q(s,a) = average reward/value, C = exploration constant, N(s) = total number of times parent node s has been visited, and N(s,a) = number of times action a has been taken from state s.

This balances exploitation (Q(s,a), which favors actions that have historically performed well) against exploration (C * sqrt(ln(N(s)) / N(s,a)), which favors actions that have been tried less frequently).
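As a sketch, the UCB rule above translates almost directly into code; the bookkeeping here (a dict mapping each action to its total value and visit count) is an assumption:

import math

def select_uct(children, parent_visits, c=1.41):
    """children: {action: (total_value, visit_count)}. Returns the action with the highest UCB score."""
    def ucb(action):
        total_value, visits = children[action]
        if visits == 0:
            return float("inf")                              # always try unvisited actions first
        q = total_value / visits                             # Q(s, a): average reward
        u = c * math.sqrt(math.log(parent_visits) / visits)  # exploration bonus
        return q + u
    return max(children, key=ucb)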

In the variation that AlphaZero introduces, we instead have PUCT (Predictor + UCT): argmax(Q(s,a) + Cpuct * P(s,a) * sqrt(N(s)) / (1 + N(s,a)))

Now where the hell did P(s,a) come from?

This is where part 2, the policy and value approximation network, comes into play.

We need a network that takes a game state as input and outputs a probability distribution over the next possible actions plus an estimated value of the current state. Your implementation may look like the following:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimplePolicyValueNetwork(nn.Module):
    """
    A very simple policy-value network for demonstration purposes.
    This network takes a game state as input and outputs:
    1. Policy: probability distribution over possible actions
    2. Value: estimated value of the current state
    """
    def __init__(self, input_size, action_size, hidden_size=128):
        super(SimplePolicyValueNetwork, self).__init__()
        self.shared_fc1 = nn.Linear(input_size, hidden_size)
        self.shared_fc2 = nn.Linear(hidden_size, hidden_size)
        self.policy_head = nn.Linear(hidden_size, action_size)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        x = F.relu(self.shared_fc1(x))
        shared_features = F.relu(self.shared_fc2(x))
        
        # Policy output (probabilities over actions)
        policy_logits = self.policy_head(shared_features)
        policy = F.softmax(policy_logits, dim=-1)
        
        # Value output (single scalar between -1 and 1)
        value = torch.tanh(self.value_head(shared_features))
        
        return policy, value

And so we have an output of dimensions [1, action_size] for the policy and [1, 1] for the value, a single scalar representing how good or bad the position is.
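A tiny usage example (the sizes here are placeholders: 12 piece planes x 64 squares flattened for the input, and an arbitrary move-encoding size for the output):

import torch

net = SimplePolicyValueNetwork(input_size=768, action_size=4672)
dummy_state = torch.zeros(1, 768)   # one encoded board position
policy, value = net(dummy_state)
print(policy.shape, value.shape)    # torch.Size([1, 4672]) torch.Size([1, 1])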

When MCTS visits a new state, we run the approximation network. We now have a probability for each possible action plus an estimate of how good the state is. We plug those priors into the PUCT formula above, which biases the search toward the moves the network thinks look promising.
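Here is a rough sketch of how that wiring looks: expansion runs the network once, stores P(s,a) as a prior on each child, and selection then uses the PUCT formula. The encode() and legal_actions() helpers are assumptions (they depend on how you map moves to indices in the policy head):

import math

class Node:
    def __init__(self, prior):
        self.prior = prior            # P(s, a) from the policy head
        self.visit_count = 0          # N(s, a)
        self.value_sum = 0.0          # running total of backed-up values
        self.children = {}            # action index -> Node

    def q(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def expand(node, state, network, encode, legal_actions):
    """Run the policy/value network once, create one child per legal action, and return the value to back up."""
    policy, value = network(encode(state))      # encode(state) -> tensor of shape [1, input_size]
    policy = policy.squeeze(0)
    for action in legal_actions(state):         # each legal move mapped to an index into the policy vector
        node.children[action] = Node(prior=policy[action].item())
    return value.item()

def select_puct(node, c_puct=1.5):
    """Pick the child maximizing Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    total_visits = sum(child.visit_count for child in node.children.values())
    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        return child.q() + u
    return max(node.children.items(), key=score)[0]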

And that's basically it. As for the results, you can look at my project here and my wandb runs here. For whatever reason the runs keep crashing around the 10-hour mark, but I guess that is my divine punishment for trying to run it on a regular M3 with 8 GB of RAM. I will pick this back up when I gain a bazillion GPU credits, to check the performance and fix the bugs.

Sources and References

Chess Engine Implementation

AlphaZero and Reinforcement Learning

Project Implementation


Happy coding! Feel free to reach out to me to discuss further!