Let's Play Pictionary with the OpenAI Vision API
The best way to learn is by doing. In this tutorial, you'll see how easy it is to add image recognition to your app with the OpenAI Python SDK.
Wanna add Vision capabilities like image identification to your app?
Let's build a simple Pictionary game using OpenAI's Vision API and Python. In this tutorial, we'll create a web app where you can draw anything, and GPT-4o will try to guess what it is.
What we're building
A straightforward web application where:
- You draw on a canvas
- Your drawing gets sent to OpenAI's Vision API for analysis
- GPT-4o responds with a one-word guess
The tech stack
- Python with Flask for the backend
- Simple HTML canvas for drawing
- OpenAI's Vision API (GPT-4o)
- Basic JavaScript for handling drawings
The code
Here's the core Python code that makes it all work:
from flask import Flask, request, jsonify
from flask_cors import CORS
from dotenv import load_dotenv
from openai import OpenAI
import os
import re

# Load OPENAI_API_KEY from a local .env file into the environment
load_dotenv()

app = Flask(__name__)
# Allow any origin to call /submit-drawing so index.html can be served separately
CORS(app, resources={r"/submit-drawing": {"origins": "*"}}, methods=["POST", "OPTIONS"])

api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(
    api_key=api_key,
)


@app.route('/submit-drawing', methods=['POST', 'OPTIONS'])
def submit_drawing():
    if request.method == 'OPTIONS':
        return jsonify({"message": "CORS preflight request successful"}), 200

    # silent=True returns None instead of raising on a missing or malformed JSON body
    data = request.get_json(silent=True) or {}
    image_data = data.get('image')
    if not image_data:
        return jsonify({"error": "No image data provided"}), 400

    # Expect a data URL such as "data:image/png;base64,<payload>"
    img_data_match = re.match(r'data:(image/.*?);base64,(.*)', image_data)
    if not img_data_match:
        return jsonify({"error": "Invalid image data format"}), 400
    img_type, img_b64_str = img_data_match.groups()

    # Define the prompt to process the image
    prompt = "Analyze this image and guess what it is in a single word."

    try:
        response = client.chat.completions.create(
            model="gpt-4o-2024-08-06",  # alternatives: gpt-4o, gpt-4o-mini
            messages=[
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:{img_type};base64,{img_b64_str}"},
                        },
                    ],
                }
            ],
        )
        guess = response.choices[0].message.content
        # Return the response as JSON
        return jsonify({"guess": guess})
    except Exception as e:
        return jsonify({"error": str(e)}), 500


if __name__ == '__main__':
    app.run(debug=True)
That's your app.py. Don't forget to add your OPENAI_API_KEY to a .env file next to it (load_dotenv() reads it from there) and to install the packages in your requirements.txt:
flask
openai
python-dotenv
flask-cors
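Two small steps in app.py can be pulled out and checked without an API key: splitting the data URL and tidying the model's reply. The sketch below is one possible way to do that, not part of the original app. The names parse_data_url and normalize_guess are made up for illustration, the regex is a stricter variant of the one in app.py, and the assumption that a reply may arrive as something like "Cat." rather than a bare word is mine.

```python
import base64
import binascii
import re
from typing import Optional, Tuple

# Stricter variant of the pattern in app.py (an assumption, not the original):
# only base64 characters are accepted after the comma.
DATA_URL_RE = re.compile(r"data:(image/[\w.+-]+);base64,([A-Za-z0-9+/]+={0,2})")


def parse_data_url(image_data: str) -> Tuple[str, str]:
    """Return (mime_type, base64_payload) or raise ValueError."""
    match = DATA_URL_RE.fullmatch(image_data)
    if match is None:
        raise ValueError("Invalid image data format")
    mime_type, payload = match.groups()
    try:
        # validate=True rejects stray characters and bad padding
        base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("Image payload is not valid base64") from exc
    return mime_type, payload


def normalize_guess(raw: Optional[str]) -> str:
    """Trim whitespace and surrounding punctuation, then lowercase."""
    return (raw or "").strip().strip(".,!?\"'").lower()
```

In the route, parse_data_url could replace the re.match block (returning a 400 when it raises ValueError), and normalize_guess could wrap response.choices[0].message.content before it goes into the JSON response.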
Testing this out
In your terminal, run python app.py, then load your index.html and start drawing!
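If you'd like to exercise the endpoint before the front end is in place, a small standard-library client is enough. This is a rough sketch under a few assumptions: app.py is running locally on Flask's default port 5000, you have a PNG saved on disk, and the helper names build_request and post_drawing are invented for this example.

```python
import base64
import json
import urllib.request

# Assumption: app.py is running locally on Flask's default port.
ENDPOINT = "http://127.0.0.1:5000/submit-drawing"


def build_request(png_bytes: bytes, url: str = ENDPOINT) -> urllib.request.Request:
    """Wrap raw PNG bytes in the JSON body that /submit-drawing expects."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    body = json.dumps({"image": f"data:image/png;base64,{b64}"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_drawing(path: str, url: str = ENDPOINT) -> dict:
    """Send the PNG at `path` to the running server and return its JSON reply."""
    with open(path, "rb") as f:
        req = build_request(f.read(), url)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, calling post_drawing("cat.png") from a Python shell (cat.png being any PNG you have on hand) should return a dictionary with a "guess" key. Note that urlopen raises urllib.error.HTTPError when the server answers with a 400 or 500 status, so a rejected image shows up as an exception rather than a return value.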
Complete source code
Here's the full front-end source code for index.html, script.js, and style.css:
Paste the following into your index.html: