The beginner's guide to using the new Gemini Pro models in Google AI Studio

You've heard about the GPT killer from Google: Gemini. Read this guide to see how you can take Google's new AI model for a spin.

Getting started with Gemini Pro and Pro Vision in Google AI Studio

What is Gemini?

Last week, Google announced Gemini, its latest and most capable AI model, which aims to take OpenAI's GPT head-on. Gemini is built with multimodality in mind, meaning it can understand text, images, videos, audio, and code.

There are three versions of Gemini: Nano, Pro, and Ultra. Nano and Pro are available now, and Ultra is coming early next year. Gemini replaces PaLM 2 as the generative AI model powering Bard.

💡
For more information about Gemini and the announcement from last week, I recommend you go through this post.

What is Google AI Studio?

Google AI Studio is a free web-based tool that provides access to Google's generative AI models, including Gemini. It lets you easily test Google's AI models and experiment with different scenarios and use cases.

After you're done experimenting, Google AI Studio lets you export the code in many popular programming languages, including Python, JavaScript, and others.

What is Google Vertex AI?

Vertex AI provides the Gemini API, which gives you access to the Gemini Pro and Pro Vision models. Vertex AI is part of the Google Cloud Platform and, unlike AI Studio, it offers a full solution to develop, train, deploy, and monitor ML models.

💡
At the time of writing, the Vertex AI Gemini API is in preview mode.

To keep this post simple, we're only going to cover Google AI Studio.

Here's what we're going to do

Okay, great! We've covered the important information so far. We're now going to try AI Studio and write a simple multimodal prompt asking the Gemini Pro Vision model whether a certain coffee drink is suitable for someone on a vegan diet.

Then we'll quickly go over how to generate the necessary code for this prompt and see how it could be added to our application.

How to use Google AI Studio

Let's go through the user interface of Google AI Studio first and see how we can start testing and generating code for our app in no time.

Get access to Google AI Studio

First things first! Head over to the Google AI site and sign in with your Google account.

Google AI for Developers landing page

After you click on "Get API key in Google AI Studio", you'll be redirected to the main page where we're going to create our first prompt.

Create a new prompt

After you click on "Create New" as shown below, you'll have a few options to choose from. Go ahead and pick "Freeform prompt". It's just an interactive text area where you can write prompts and generate responses from the model.

Creating a new prompt in Google AI Studio

Gemini Pro vs. Gemini Pro Vision

Currently, the following models are supported within the Vertex AI Gemini API and the Google AI Studio:

  • Gemini Pro: Used for natural language tasks, multi-turn text, and code generation (see the quick text-only sketch after this list).
  • Gemini Pro Vision: Supports multimodal prompts. This means that prompts can include text, images, and video.
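
Since the rest of this post focuses on Pro Vision, here's what a text-only call to Gemini Pro looks like in Python. This is just a minimal sketch, assuming you have the google-generativeai package installed and an API key handy (we'll grab one later in this post):

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Text-only prompts go to gemini-pro;
# prompts that include images or video go to gemini-pro-vision
model = genai.GenerativeModel(model_name="gemini-pro")

response = model.generate_content("Explain in one sentence what a multimodal AI model is.")
print(response.text)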

Choose Gemini Pro Vision for multimodal prompting

Okay, for this example we're going to switch over to the Gemini Pro Vision model and write a multimodal prompt made up of text and the image below:

Cup of latte: Unclear whether dairy or non-dairy milk is used to make it

Just click on the "Model" drop-down as shown below and choose "Gemini Pro Vision".

Choosing the Gemini Pro Vision model from the dropdown

Easy, right? I'm now going to write this simple prompt in the text area: "Is this suitable for someone on a Vegan diet?" and attach the image of the latte above.

Gemini will process the text prompt, identify what's in the image, and respond based on the given information.

Let's do it:

  1. Type in our prompt
  2. Insert our image
  3. Click on "Run" (Alternatively, you could use CMD+return)
Creating and running our prompt in Google AI Studio

In this case, Gemini Pro Vision took a second and responded with the below:

"Vegans do not consume any animal products so will not drink cow's milk. It is possible to order a plant-based milk alternative such as oat milk, soy milk or almond milk."

That's a good answer, to be honest. It determined that the drink contains milk but assumed it was cow's milk, which is a reasonable assumption since cow's milk is the most common choice.

I asked GPT-4 the same thing and this was the answer:

"The image you've provided shows a cup of coffee with what appears to be latte art on the surface of the beverage. If the beverage is made with cow's milk, it would not be suitable for someone on a vegan diet, as vegans do not consume dairy products.
However, if the milk used to create the latte art is a plant-based alternative such as soy milk, almond milk, oat milk, or any other dairy-free milk, then it would be suitable for a vegan diet."

GPT-4 did slightly better in my opinion, since it did not assume that the drink contains cow's milk by default.

⚠️
I'm waiting for Gemini Ultra to come out early next year to compare its output since it's directly competing with GPT-4. The comparison above is just looking at what's currently available from OpenAI and Google.

Export the code to your app

When you're done experimenting with different prompts and scenarios, you can easily grab the code and paste it into your application.

To do so, click on the "Get Code" button as shown below:

Generated Python code from the Google AI Studio

Google AI Studio lets you choose from a range of supported programming languages including Python and JavaScript.

Here's the code that Google AI Studio generated for our prompt:

"""
At the command line, only need to run once to install the package via pip:

$ pip install google-generativeai
"""

from pathlib import Path
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Set up the model
generation_config = {
  "temperature": 0.4,
  "top_p": 1,
  "top_k": 32,
  "max_output_tokens": 4096,
}

safety_settings = [
  {
    "category": "HARM_CATEGORY_HARASSMENT",
    "threshold": "BLOCK_MEDIUM_AND_ABOVE"
  },
  {
    "category": "HARM_CATEGORY_HATE_SPEECH",
    "threshold": "BLOCK_MEDIUM_AND_ABOVE"
  },
  {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "threshold": "BLOCK_MEDIUM_AND_ABOVE"
  },
  {
    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
    "threshold": "BLOCK_MEDIUM_AND_ABOVE"
  }
]

model = genai.GenerativeModel(model_name="gemini-pro-vision",
                              generation_config=generation_config,
                              safety_settings=safety_settings)

# Validate that an image is present
if not (img := Path("image0.png")).exists():
  raise FileNotFoundError(f"Could not find image: {img}")

image_parts = [
  {
    "mime_type": "image/png",
    "data": img.read_bytes()
  },
]

prompt_parts = [
  "Is this suitable for someone on a Vegan diet?\n",
  image_parts[0],
]

response = model.generate_content(prompt_parts)
print(response.text)

That was sort of... effortless, no? We now have the code we need to paste into our app to integrate with Gemini Pro Vision. Obviously, it will need customization to fit your specific use case and application logic. But hey, it's a great start!

If you look closely, you'll notice the placeholder YOUR_API_KEY, which we'll need to replace with our API key.
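
By the way, pasting the key directly into your source code works for a quick test, but it's easy to accidentally commit it to version control. A safer pattern (just a sketch; the GOOGLE_API_KEY variable name is my own convention, not something AI Studio requires) is to read the key from an environment variable:

import os
import google.generativeai as genai

# Read the API key from an environment variable instead of hardcoding it
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

This way the key stays out of your codebase, and each environment (your laptop, staging, production) can use its own key.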

Get your Gemini API key

Let's grab our API key by clicking on the "Get API key" menu item as shown in the screenshot below:

Creating and locating your API key in Google AI Studio

You can choose "Create API key in new project" or "Create API key in existing project".

⚠️
I recommend creating a separate API key for each application you work on. This makes it easier to revoke access to your resources in case something goes wrong.

After you create your API key, just add it to the code and you're ready to go!

Closing thoughts

Honestly, I'm still playing around with Gemini Pro and Gemini Pro Vision. I still feel that GPT-4 responds more naturally to some questions; however, I'm waiting to test the Ultra version, which is coming out next year and, according to Google, performs better than GPT-4.

As far as getting up and running with almost zero effort goes, Google AI Studio makes it really easy for us to experiment and generate functional code that can be plugged directly into our apps.

In my opinion, it's one of the best prototyping tools out there.

💡
Stay tuned. I'm also testing out the Vertex AI Gemini API and will probably write a tutorial on the topic in the coming weeks. Let me know in the comments if you'd like me to focus on something specific.

FAQ: Frequently asked questions

What's the difference between Google AI Studio and Vertex AI?

Think of Google AI Studio as a prototyping tool for developers and data scientists to interact with and test Google's AI models, including Gemini. Vertex AI is an end-to-end machine learning platform that offers in-depth tools for model training and deployment. Both give us access to the latest Gemini models.
