Let's Play Pictionary with the OpenAI Vision API

The best way to learn is by doing. In this tutorial, you'll see how EASY it is to set up image recognition using the OpenAI Python SDK in your app.

Wanna add Vision capabilities like image identification to your app?

Let's build a simple Pictionary game using OpenAI's Vision API and Python. In this tutorial, we'll create a web app where you can draw anything, and GPT-4o will try to guess what it is.

What we're building

A straightforward web application where:

  • You draw on a canvas
  • Your drawing gets sent to OpenAI's Vision API for Analysis
  • GPT-4o responds with a one-word guess

The tech stack

  • Python with Flask for the backend
  • Simple HTML canvas for drawing
  • OpenAI's Vision API (GPT-4o)
  • Basic JavaScript for handling drawings

The code

Here's the core Python code that makes it all work:

from flask import Flask, request, jsonify
from flask_cors import CORS 

from dotenv import load_dotenv
from openai import OpenAI

import os
import re

load_dotenv()

app = Flask(__name__)
CORS(app, resources={r"/submit-drawing": {"origins": "*"}}, methods=["POST", "OPTIONS"])


api_key = os.getenv("OPENAI_API_KEY")

client = OpenAI(
    api_key=api_key,
)

@app.route('/submit-drawing', methods=['POST', 'OPTIONS'])
def submit_drawing():
    if request.method == 'OPTIONS':
        return jsonify({"message": "CORS preflight request successful"}), 200

    data = request.json
    image_data = data.get('image')

    if not image_data:
        return jsonify({"error": "No image data provided"}), 400

    img_data_match = re.match(r'data:(image/.*?);base64,(.*)', image_data)
    
    if not img_data_match:
        return jsonify({"error": "Invalid image data format"}), 400

    img_type, img_b64_str = img_data_match.groups()

    # Define the prompt to process the image
    prompt = "Analyze this image and guess what it is in a single word."

    try:
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06", # gpt-4o, gpt-4o-mini
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:{img_type};base64,{img_b64_str}"},
                        },
                    ],
                }
            ],
        )

        guess = response.choices[0].message.content

        # Return the response as JSON
        return jsonify({"guess": guess})

    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True)

That's your app.py. Don't forget to add your OPENAI_API_KEY and to install the packages in your requirements.txt:

flask
openai
python-dotenv
flask-cors

Testing this out

In your terminal, run python app.py then load your index.html and start drawing!

A car.
My beautiful car drawing

Complete source code

Here's the full front-end source code for index.html, script.js, and style.css:

Paste the following into your index.html: