Here's a Fun Chess Experiment: ChatGPT vs. Stockfish (vs. Claude)

What happens when Large Language Models like GPT and Claude play against each other and take on Stockfish, the ultimate chess engine? Spoiler alert: it’s as entertaining as it is enlightening.

So, I made this video a few weeks ago for two reasons:

  1. To see if LLMs can play chess against each other.
  2. To see if they stand a chance against the strongest chess engine out there: Stockfish.
🚨
Spoiler alert: One of them got totally destroyed in a few moves.

If you’ve been following the crazy advancements in LLMs recently, you know how capable these models have become. From drafting essays to writing code, they seem unstoppable. But can they play chess? I decided to run this experiment to answer that question. So, I put GPT and Claude head-to-head against each other and then against Stockfish, the strongest chess engine around.

So I built the following interface to run this test:

Screenshot of the UI used for this experiment
  1. A chessboard.
  2. A reasoning log showing the LLMs' "thoughts."
  3. A move log capturing every move.
  4. Player selection dropdowns.

Here’s the twist: Unlike Stockfish, which is built specifically to dominate chess with optimized algorithms and positional evaluation, GPT and Claude are Large Language Models (LLMs). They process and generate text, not chess strategies. But with some creative prompting, they can make moves—and the results are interesting.

Requirements

To bring this to life, we'll need the following:

  1. Python 3 and pip.
  2. An OpenAI API key and an Anthropic API key.
  3. The Stockfish binary installed locally (covered later in this post).
  4. A handful of Python packages: fastapi, uvicorn, python-dotenv, openai, anthropic, and chess (python-chess).

Next, we'll prepare our environment. For this experiment, I built a simple FastAPI backend and a custom web interface using HTML, CSS, and vanilla JavaScript (the frontend won’t be covered in this post).

Set Up Environment Variables

  1. Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
  2. Install python-dotenv and create a .env file in your project directory:
pip install python-dotenv
touch .env
  3. Add your API keys to the .env file:
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
  4. Load the environment variables in main.py using dotenv:
from dotenv import load_dotenv

...

load_dotenv()

Design the API with FastAPI

Create a main.py file and set up the FastAPI app:

...

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

...

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

So we're just setting up the basics here, including some permissive CORS config (fine for a local experiment, but tighten allow_origins before deploying this anywhere). Now, let's define the game state models using Pydantic:

from pydantic import BaseModel
from typing import List, Optional

...

class PieceUnderAttack(BaseModel):
    piece: str
    square: str

class AttackedPieces(BaseModel):
    white: List[PieceUnderAttack] = []
    black: List[PieceUnderAttack] = []

class GameState(BaseModel):
    fen: str
    pgn: str
    player_color: str
    legal_moves: List[str]
    last_player_move: Optional[str] = None
    last_opponent_move: Optional[str] = None
    attacked_pieces: Optional[AttackedPieces] = AttackedPieces()
    model: str

...

Let me explain the purpose of each class above:

  1. PieceUnderAttack: Describes a single piece currently under attack by the opponent (its type and square).
  2. AttackedPieces: Groups those pieces by color, with one list for White and one for Black.
  3. GameState: Holds the essential game data sent with each move request (we'll cover its fields later in this article).
💡
Without this information, LLMs will play imaginary chess! I tried it without extra details, and both GPT and Claude suggested illegal moves. This setup helps guide the models.

You can obviously tune this by modifying, adding, or removing information to improve gameplay, but that's the basic setup we're going to work with.
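In the article's setup the frontend computes these fields before each request. Purely as an illustration (this helper isn't part of the original code), here's one way you could derive attacked_pieces server-side with python-chess:

```python
import chess

def build_attacked_pieces(board: chess.Board) -> dict:
    # Collect every piece whose square an enemy piece currently attacks,
    # grouped by color to match the AttackedPieces model.
    attacked = {"white": [], "black": []}
    for square, piece in board.piece_map().items():
        if board.is_attacked_by(not piece.color, square):
            side = "white" if piece.color == chess.WHITE else "black"
            attacked[side].append(
                {"piece": piece.symbol(), "square": chess.square_name(square)}
            )
    return attacked

board = chess.Board()
board.push_san("e4")
board.push_san("d5")  # the e4 and d5 pawns now attack each other
print(build_attacked_pieces(board))
```

Feeding this into the prompt is exactly the kind of "extra detail" that keeps the models from playing imaginary chess.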

Next, we'll set up the OpenAI and Anthropic clients to communicate with GPT and Claude. First, install the packages using pip:

pip install openai
pip install anthropic

Then, let's add the following lines of code:

import os
from openai import OpenAI
from anthropic import Anthropic

...

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
anthropic_client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
💡
Don't forget to add your OPENAI_API_KEY and ANTHROPIC_API_KEY to your .env file.

Getting Moves from LLMs

Great! We're pretty much done with setting up the basics. Now let's create our first endpoint, which will fetch suggested moves from the OpenAI or Anthropic models. We'll call it get_llm_move:

@app.post("/get_move")
async def get_llm_move(game_state: GameState):

    last_moves_text = (
        f"The last move you made was: {game_state.last_player_move}. "
        f"The last move your opponent made was: {game_state.last_opponent_move}."
    )

    attacked_pieces_description = (
        f"White's pieces under attack: {', '.join([f'{piece.piece} on {piece.square}' for piece in game_state.attacked_pieces.white])}.\n"
        f"Black's pieces under attack: {', '.join([f'{piece.piece} on {piece.square}' for piece in game_state.attacked_pieces.black])}.\n"
    )
    
    # Chain of Thought System Message
    system_message = (
        "You are the greatest chessmaster in the world, with deep strategic knowledge and unparalleled skill in chess. "
        "For each move, first evaluate the current board state, identifying threats and opportunities. "
        "Then, consider the best possible moves from the available legal moves, aiming to win by checkmating your opponent. "
        "After reasoning through these options, choose the optimal move that: "
        "1. Maximizes your advantage or mitigates any risks. "
        "2. Develops your pieces and strengthens your position. "
        "3. None of your options should include a square that may result in your piece being captured (for no reason). "
        "4. Controls the center of the board, gaining space and limiting your opponent's mobility. "
        "5. Coordinates your pieces to create potential threats or defend vulnerable points. "
        "Provide a step-by-step reasoning process, followed by the final move in Standard Algebraic Notation (SAN). "
        "Output your answer in the following format exactly, on a new line: 'Move: <move>'. For example, respond with 'Move: Nf6'. "
        "Do not include any other text on the line with the move. Only provide the move in that specific format to ensure accuracy."
    )

    
    prompt = (
        f"Here's the current chess board state in FEN format: {game_state.fen}. "
        f"The game so far in PGN notation is: {game_state.pgn}. "
        f"You are playing as {game_state.player_color}. "
        f"{last_moves_text} "
        f"Your available legal moves are: {', '.join(game_state.legal_moves)}. "
        f"{attacked_pieces_description} "
        "NEVER move into a square that is unprotected or threatened and can be captured by the opponent. "
        "NEVER lose a piece for nothing and NEVER lose a piece to a lesser-valued piece. "
        "Evaluate the board carefully, considering immediate threats and opportunities, then provide the best move."
    )

    try:
        max_attempts = 4
        for attempt in range(max_attempts):

            if game_state.model.startswith("claude"):

                message = anthropic_client.messages.create(
                    max_tokens=1024,
                    system=system_message,
                    messages=[
                        {"role": "user", "content": prompt}
                    ],
                    model=game_state.model 
                )

                response_text = message.content[0].text.strip()
                print(f"{game_state.model}: {response_text}")
            else:
                chat_completion = openai_client.chat.completions.create(
                    model=game_state.model,
                    messages=[
                        {"role": "system", "content": system_message},
                        {"role": "user", "content": prompt}
                    ]
                )

                response_text = chat_completion.choices[0].message.content.strip()
                print(f"{game_state.model}: {response_text}")

            suggested_move = extract_move_from_response(response_text)

            if suggested_move in game_state.legal_moves:
                return {
                    "move": suggested_move,
                    "reasoning": response_text
                }

            print(f"Invalid move suggested by AI: {suggested_move}")

            prompt = (
                f"The move '{suggested_move}' is not valid. "
                f"Please choose only from these legal moves: {', '.join(game_state.legal_moves)}. "
                "Only provide a valid move from this list."
            )


        raise HTTPException(status_code=400, detail="AI failed to provide a valid move after multiple attempts.")

    except HTTPException:
        # Re-raise our own 400 above instead of wrapping it in a 500.
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

So, what's happening here? Hint: It's actually way simpler than you think.

Basically, the endpoint above asks the LLM for its best move given the data we provide. The flow is straightforward: the frontend sends the GameState object that we saw earlier, and we forward a request to the LLM for a best-move prediction (either Claude or GPT, depending on whose turn it is).

The LLM uses the given data and comes back with both a move suggestion and its reasoning for why that's a good move. Before we actually play the suggested move though, we do a quick check to make sure it's actually suggesting a legal move - you know, because LLMs can hallucinate sometimes. If the suggested move is invalid, we prompt the LLM again while providing it with a list of legal moves to choose from.

Finally, if anything goes wrong or the model suggests something weird, we'll catch that and let the player know there was an issue.
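To make the request shape concrete, here's an illustrative payload a frontend might POST to /get_move. The position, moves, and model name are just examples; the field names follow the GameState model above:

```python
import json

# Hypothetical request body for POST /get_move after White opens with 1. e4.
payload = {
    "fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",
    "pgn": "1. e4",
    "player_color": "black",
    "legal_moves": ["e5", "c5", "Nf6", "d5", "e6"],
    "last_player_move": None,          # Black hasn't moved yet
    "last_opponent_move": "e4",
    "attacked_pieces": {"white": [], "black": []},
    "model": "gpt-4o",                 # a name starting with "claude" would route to Anthropic
}
print(json.dumps(payload, indent=2))
```

Note how the model field doubles as the router: the endpoint checks model.startswith("claude") to decide which client to call.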

Extracting Moves

To keep the code tidy, I'm using extract_move_from_response(string) to extract and return the LLM's suggested move in SAN (Standard Algebraic Notation). Here's what the helper function looks like:

import re

...

def extract_move_from_response(response_text):
    # Accept plain moves, captures, promotions, checks/mates, and castling.
    match = re.search(
        r"Move:\s*([KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?|O-O(?:-O)?[+#]?)",
        response_text,
    )
    if match:
        return match.group(1).strip()
    return None
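As a quick sanity check, here's the kind of output the extractor has to handle. The pattern below is a sketch that also covers captures, promotions, and check/mate suffixes, which plain SAN frequently contains:

```python
import re

# Pattern accepting plain moves (Nf6), captures (exd5), promotions (e8=Q+),
# check/mate suffixes (+, #), and castling (O-O, O-O-O).
MOVE_RE = re.compile(
    r"Move:\s*([KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?|O-O(?:-O)?[+#]?)"
)

for text, expected in [
    ("Reasoning... Move: Nf6", "Nf6"),
    ("I will capture the pawn. Move: exd5", "exd5"),
    ("Castling is safest here. Move: O-O", "O-O"),
    ("Promoting wins on the spot. Move: e8=Q+", "e8=Q+"),
]:
    match = MOVE_RE.search(text)
    assert match and match.group(1) == expected
    print(match.group(1))
```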

Interacting with Stockfish

Alright, now let's write the code to interact with the Stockfish Engine using the python-chess library:

pip install chess

For more information about this package, go to the official docs.

import chess
import chess.engine

...

We'll also need a small request model carrying the position:

class StockfishRequest(BaseModel):
    fen: str

...

@app.post("/get_stockfish_move")
async def get_stockfish_move(request: StockfishRequest):
    move = get_stockfish_move_local(request.fen)
    if not move:
        raise HTTPException(status_code=500, detail="Stockfish failed to provide a move.")
    return {"move": move}

...

def get_stockfish_move_local(fen: str) -> str:
    try:
        engine_path = "path/to/stockfish"  # Change to your Stockfish binary path
        board = chess.Board(fen)

        with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
            result = engine.play(board, chess.engine.Limit(time=1.0))
            return board.san(result.move)
    except Exception as e:
        print(f"Error with Stockfish: {e}")
        return None

Let's break this down. We create a new chess board with the chess library, placing all the pieces according to the provided FEN string so the position matches the live game. We then hand that board to the local Stockfish engine, which searches for the best move (capped here at one second of thinking time), and we return the move in SAN to the get_stockfish_move endpoint.

💡
Don't forget to install Stockfish on your computer and update the engine_path value. Instructions can be found here.
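If you don't have Stockfish installed yet, you can still see the FEN → board → SAN round-trip the endpoint relies on. This minimal sketch uses python-chess alone, with a hand-written UCI move standing in for the engine's result.move:

```python
import chess

# Rebuild a position from FEN, then render an engine-style UCI move as SAN.
fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
board = chess.Board(fen)
uci_move = chess.Move.from_uci("g1f3")  # engines return Move objects like this
print(board.san(uci_move))              # → Nf3
```

The SAN string depends on the board state, which is why get_stockfish_move_local converts the move before the board changes.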

That's basically everything that we need to do to set this up.

Running the application

Time to test! If your main.py ends with the usual uvicorn.run entrypoint, run:

python3 main.py

Otherwise, start the app with uvicorn main:app --reload.

Here are the three games if you're interested in watching one in particular:

Honestly, I didn’t expect these results! Well, kind of—but the LLMs surprised me!

Watch the Games (Plus tutorial)

Let me know what you think of this experiment and if you've done something similar yourself!