Here's how to choose which vector database to use for your AI app in 2024

In the age of AI and NLP apps, vector databases have become more popular than ever. Learn why, and see how you can make use of them in your RAG app.

Introduction

The explosion of natural language AI-powered apps that began with the launch of ChatGPT in late 2022 caused an unprecedented increase in demand for vector databases throughout the software industry.

Why vector databases? Because they are built to store and search embeddings, the numerical representations of data that AI models like GPT-4 use to process and understand text.

Why not relational databases? You can store vectors in them, but they are not as efficient as vector databases at this job. We'll see why later in this article.

So, without further ado, let's define what vector databases are and explore how you can use them for your RAG app.

Relational (SQL) vs. Vector Databases

Relational databases are your best bet if you're working with structured data that requires simple or complex querying capabilities. Vector databases excel at handling unstructured data such as images, audio, and text, which makes them the best option for tasks involving machine learning and similarity search.

Traditional Database Management Systems

We've all worked with SQL at some point. So I'm assuming you're familiar with how traditional databases handle storing and retrieving information. If not, here's a quick link to get started.

Screenshot from TablePlus running on my macOS

They are great at what they're built for, storing and querying structured data, but not so much at handling the high-dimensional numerical vectors that represent text, images, and audio.

For that kind of data, we're going to take a look at Vector Databases.

Introduction to Vector Databases

What are Vectors

As mentioned earlier, vectors are numerical representations of data. They can encode many kinds of information, such as text, images, audio, or geolocation.

For example, we can see how the text "canine companions say" is transformed into a vector representation by OpenAI's embedding model.

Source: OpenAI

The great thing about vectors is that we can find similarities between them relatively easily using techniques like Euclidean distance or cosine similarity. This helps a lot in fields like data analysis and machine learning.
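To make those two measures concrete, here's a minimal sketch in plain Python. The three-dimensional vectors and their names (dog, puppy, car) are made up for illustration; real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```python
import math

def dot(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1 means more similar)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up toy "embeddings", not output from any real model.
dog = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
car = [0.0, 0.2, 0.9]

print("cosine(dog, puppy):", cosine_similarity(dog, puppy))
print("cosine(dog, car):", cosine_similarity(dog, car))
print("euclidean(dog, puppy):", euclidean_distance(dog, puppy))
print("euclidean(dog, car):", euclidean_distance(dog, car))
```

With these toy values, dog and puppy score a higher cosine similarity and a smaller Euclidean distance than dog and car, which is the behavior we want from embeddings of related concepts.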

What are Vector Databases

Vector databases are specialized database management systems built to store, index, and search vector data efficiently.

Benefits of a Vector Database

There are numerous benefits to using a vector database. Here are the most obvious ones:

  1. Efficient storage and retrieval: Vectors can be stored and compared at low computational cost, which makes it easier to manage and search large amounts of data.
  2. Support for similarity search: Vector queries in vector databases typically search for similar vectors using one or several query vectors. This allows for applications such as reverse image search, recommender systems, and chatbots.
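Here's a rough idea of what a similarity search does, sketched in plain Python. This is not how any particular vector database is implemented: the `top_k` helper, the item names, and the toy vectors are all hypothetical, and the loop compares the query against every stored vector. Real vector databases typically use approximate nearest neighbor indexes so they don't have to scan everything.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1 means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, items, k=2):
    """Return the k (item_id, score) pairs most similar to the query, best first.

    Brute force: scores every stored vector, then sorts.
    """
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings keyed by a label (e.g. for a reverse image search).
items = {
    "dog photo": [0.9, 0.1, 0.0],
    "puppy photo": [0.8, 0.2, 0.1],
    "car photo": [0.0, 0.2, 0.9],
    "truck photo": [0.1, 0.1, 0.8],
}

query = [0.85, 0.15, 0.05]  # pretend this is the embedding of the user's query
results = top_k(query, items, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

The query vector sits close to the two animal photos, so those come back as the top matches while the vehicle photos are left out.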

Coming from a SQL Background

Okay, so you're coming from a SQL background and you're now looking to store vectors. What do you do?

You have two options:

  1. Integrate with a Vector Database
  2. If you already use Postgres, give pgvector a try

I recommend you take a look at the most popular vector database choices (compared in the next section) and start experimenting with integrating one into your application.

Choosing the best vector database

No one can tell you which vector database is the best for you. You'll need to take a look at the pros and cons of the available options to determine what is suitable for you.

| Feature | Chroma | Pinecone | Weaviate | Milvus |
| --- | --- | --- | --- | --- |
| Managed Service | No | Yes | Yes | Yes |
| Real-time Data Ingestion | No | Yes | No | Yes |
| Scalability | Yes | Yes | Yes | Yes |
| Language Support | Python, JavaScript | Python, JavaScript | Python, Go, Java, JavaScript/Node | Python, Java, Go, C++ |
| LangChain Support | Yes | Yes | Yes | Yes |
| LlamaIndex Support | Yes | Yes | Yes | Yes |
💡 I recommend Chroma if you're just starting out, especially when working with LangChain or LlamaIndex. Here's a tutorial to get you started.

Final thoughts

If you're building, or planning to build, an AI-powered app, I suggest you get familiar with some of the popular vector databases. Sooner or later you'll find yourself working with embeddings and vectorized data.


Frequently asked questions

Which database do I use for my new project?

It depends. But you'll most likely be working with both traditional and vector databases if you need to store and retrieve vector data.

Do I need to move my data from a SQL database to a vector database?

No. You can (and should) keep your existing SQL database. However, you may decide to run a separate vector database alongside it if you need to store and retrieve high-dimensional vector data.
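Here's one way the two could sit side by side, as a minimal sketch. I'm using an in-memory SQLite database to stand in for your existing SQL database and a plain Python dict to stand in for the vector database; the `articles` table, the `search` helper, and the toy embeddings are all made up for illustration. The idea is that both stores share the same record ids: the vector side finds the most similar ids, and the SQL side returns the full records.

```python
import math
import sqlite3

# Relational side: structured records (stand-in for your existing SQL database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title, author) VALUES (?, ?, ?)",
    [
        (1, "Training your dog", "Ana"),
        (2, "Car maintenance basics", "Ben"),
        (3, "Puppy nutrition", "Ana"),
    ],
)

# Vector side: toy embeddings keyed by the same ids (stand-in for a vector database).
embeddings = {
    1: [0.9, 0.1, 0.0],
    2: [0.0, 0.2, 0.9],
    3: [0.8, 0.2, 0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1 means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vector, k=2):
    """Find the k most similar ids on the vector side, then load their rows from SQL."""
    ranked = sorted(
        embeddings,
        key=lambda i: cosine_similarity(query_vector, embeddings[i]),
        reverse=True,
    )
    top_ids = ranked[:k]
    placeholders = ",".join("?" for _ in top_ids)
    rows = conn.execute(
        f"SELECT id, title, author FROM articles WHERE id IN ({placeholders})",
        top_ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in top_ids]  # keep the similarity ranking order

results = search([0.85, 0.15, 0.05])
for row in results:
    print(row)
```

Nothing moves out of the SQL database here. The vector store only holds ids and embeddings, so you can add it later without migrating your existing data.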

Drop your comments or questions below and follow me on X (formerly Twitter) for more updates!

Thanks for reading.