Let's Play Pictionary with the OpenAI Vision API

The best way to learn is by doing. In this tutorial, you'll see how EASY it is to set up image recognition using the OpenAI Python SDK in your app.

jeff

Nov 11, 2024 — 6 min read

Wanna add Vision capabilities like image identification to your app?

Let's build a simple Pictionary game using OpenAI's Vision API and Python. In this tutorial, we'll create a web app where you can draw anything, and GPT-4o will try to guess what it is.

What we're building

A straightforward web application where:

You draw on a canvas
Your drawing gets sent to OpenAI's Vision API for analysis
GPT-4o responds with a one-word guess
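
The last step relies on the model following the one-word instruction, and replies can still come back with stray punctuation or quotes (for example "Cat." instead of "cat"). Here is a minimal sketch of how the reply could be tidied before display. The helper name normalize_guess is hypothetical and not part of the original app, and it deliberately leaves multi-word replies intact rather than guessing which word matters.

```python
import string


def normalize_guess(raw: str) -> str:
    """Tidy a model reply for display (hypothetical helper, not in the original app)."""
    # Strip surrounding whitespace, quotes, and punctuation such as a trailing period.
    cleaned = raw.strip(string.punctuation + string.whitespace)
    # Lowercase so "Cat" and "cat" compare equal.
    return cleaned.lower()
```

In the Flask handler shown later, this could wrap response.choices[0].message.content before it is returned as JSON.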

The tech stack

Python with Flask for the backend
Simple HTML canvas for drawing
OpenAI's Vision API (GPT-4o)
Basic JavaScript for handling drawings

The code

Here's the core Python code that makes it all work:

from flask import Flask, request, jsonify
from flask_cors import CORS 

from dotenv import load_dotenv
from openai import OpenAI

import os
import re

load_dotenv()

app = Flask(__name__)
CORS(app, resources={r"/submit-drawing": {"origins": "*"}}, methods=["POST", "OPTIONS"])


api_key = os.getenv("OPENAI_API_KEY")

client = OpenAI(
    api_key=api_key,
)

@app.route('/submit-drawing', methods=['POST', 'OPTIONS'])
def submit_drawing():
    if request.method == 'OPTIONS':
        return jsonify({"message": "CORS preflight request successful"}), 200

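    # silent=True yields None instead of raising on a missing or malformed JSON body
    data = request.get_json(silent=True) or {}
    image_data = data.get('image')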

    if not image_data:
        return jsonify({"error": "No image data provided"}), 400

    img_data_match = re.match(r'data:(image/.*?);base64,(.*)', image_data)
    
    if not img_data_match:
        return jsonify({"error": "Invalid image data format"}), 400

    img_type, img_b64_str = img_data_match.groups()

    # Define the prompt to process the image
    prompt = "Analyze this image and guess what it is in a single word."

    try:
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06", # gpt-4o, gpt-4o-mini
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:{img_type};base64,{img_b64_str}"},
                        },
                    ],
                }
            ],
        )

        guess = response.choices[0].message.content

        # Return the response as JSON
        return jsonify({"guess": guess})

    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True)

That's your app.py. Don't forget to add your OPENAI_API_KEY to a .env file in the same folder (load_dotenv() reads it at startup) and to install the packages listed in your requirements.txt:

flask
openai
python-dotenv
flask-cors
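
Before running the app, it may help to see what the regular expression in submit_drawing accepts. The pattern there splits a data URL into a MIME type and a base64 string but never checks that the base64 part decodes. The sketch below shows one way to tighten that, assuming you want to reject bad payloads before calling the API. The names DATA_URL_RE and parse_data_url are hypothetical and do not appear in the original code.

```python
import base64
import binascii
import re
from typing import Tuple

# Same shape as the pattern in submit_drawing, but restricted to base64 characters.
DATA_URL_RE = re.compile(r"data:(image/[\w.+-]+);base64,([A-Za-z0-9+/=]+)")


def parse_data_url(data_url: str) -> Tuple[str, bytes]:
    """Return (mime_type, raw_bytes) or raise ValueError (hypothetical helper)."""
    # fullmatch requires the whole string to be a data URL, not just a prefix.
    match = DATA_URL_RE.fullmatch(data_url)
    if match is None:
        raise ValueError("Invalid image data format")
    mime_type, b64 = match.groups()
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("Image payload is not valid base64") from exc
    return mime_type, raw
```

A handler could catch the ValueError and return the same 400 response the original code uses for a failed match.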

Testing this out

In your terminal, run python app.py (Flask's development server listens on http://127.0.0.1:5000 by default), then open index.html in your browser and start drawing!
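
If you want to check the backend without the canvas, a small script can send an image file to the endpoint directly. This is a rough sketch using only the standard library. It assumes the server is on Flask's default address, and the names ENDPOINT, build_payload, post_drawing, and the drawing.png file are placeholders rather than part of the tutorial.

```python
import base64
import json
import urllib.request

# Assumption: app.py is running on Flask's default development address.
ENDPOINT = "http://127.0.0.1:5000/submit-drawing"


def build_payload(image_bytes: bytes, mime_type: str = "image/png") -> bytes:
    """Wrap raw image bytes in the data-URL JSON body that /submit-drawing expects."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": f"data:{mime_type};base64,{b64}"}).encode("utf-8")


def post_drawing(image_path: str, endpoint: str = ENDPOINT) -> dict:
    """POST an image file to the running app and return the decoded JSON reply."""
    with open(image_path, "rb") as f:
        body = build_payload(f.read())
    req = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for the app's 400 and 500 responses.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage, with app.py running and a placeholder file on disk:
#   print(post_drawing("drawing.png"))
```

A successful call returns a dictionary with a "guess" key, matching the jsonify({"guess": guess}) line in app.py.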

Complete source code

Here's the full front-end source code for index.html, script.js, and style.css:

Paste the following into your index.html:
