How to Integrate and Handle LLM Memory using LangChain
LLMs are stateless, meaning they have no built-in memory to keep track of a conversation. In this post, we'll see how LangChain makes it easy to integrate and manage that memory.
Introduction
So, you've been playing around with LangChain and LLMs, or you're in the process of building your AI-powered app. Awesome. If you're looking to find out how to manage LLM memory in LangChain, this post is about just that.
The Need for Memory
Suppose you're talking to a friend about a movie they recently saw. However, your friend has no memory: they can only respond to each question without remembering what you were just talking about.
Here's the conversation:
You: "Hey, did you catch a movie last night?"
Your Friend: "Oh, absolutely! Best movie I've seen."
You: "What was it about?"
Your Friend: "What was what about?"
You get the picture. Your friend can't maintain a conversation since they have no memory.
(My wife says I don't have a memory as well...) 👀
Since LLMs are stateless by default, they are completely unaware of any previous conversations you've had with them, so they don't remember what you've been talking about. That's why you need some way of telling the Large Language Model, with each new prompt, what has been said so far, so that it can maintain an actual conversation with you.
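To make that concrete, here's a rough sketch of the idea without any framework at all: we keep the conversation ourselves and replay it in every prompt. The `call_llm` function below is just a stub standing in for whatever model client you use; it is not part of LangChain.

```python
# A bare-bones sketch of "send the history with every prompt".
# `call_llm` is a placeholder; swap in a real model call.

history = []  # (speaker, text) pairs we maintain manually

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs as-is.
    return "..."

def ask(question: str) -> str:
    # Replay the whole conversation so the stateless model has context.
    lines = [f"{speaker}: {text}" for speaker, text in history]
    lines.append(f"Human: {question}")
    lines.append("AI:")
    answer = call_llm("\n".join(lines))
    history.append(("Human", question))
    history.append(("AI", answer))
    return answer
```

This bookkeeping is essentially what LangChain's memory classes take care of for us.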
LLM Memory in LangChain
The code below is written in Python; you'll just need to be a little familiar with the syntax.
Okay, let's get started.
Loading OpenAI API Key
We're going to load our API key from the `.env` file, which is the recommended way of doing it. You can of course just hard-code it in your application directly.
- In your project directory, type `touch .env`
- Open the `.env` file and type `OPENAI_API_KEY="..."`, then save.
- Install `python-dotenv` from the terminal using `pip` by running: `pip install python-dotenv`
- Create an `app.py` file and paste the following:
```python
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()
```
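If you want to double-check that the key was actually picked up, you can read it back with `os.getenv`. This check is just a convenience I'm adding here, not something LangChain requires; `ChatOpenAI` reads `OPENAI_API_KEY` from the environment on its own.

```python
# Optional sanity check: confirm the key is available in the environment.
if os.getenv("OPENAI_API_KEY") is None:
    raise RuntimeError("OPENAI_API_KEY not found, check your .env file")
```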
Importing Modules
```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
```
- `ChatOpenAI` is used to create a new chat model given some arguments.
- `ConversationChain` is used to have a conversation and load context from memory.
- `ConversationBufferMemory` is used to store the conversation memory.
Conversing with the Model
```python
llm = ChatOpenAI(temperature=0.2)
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory, verbose=False)
```
We've set up our `llm` using mostly default OpenAI settings. The `temperature` parameter controls how random the output is: lower values produce more focused, deterministic responses, while higher values produce more creative ones.
Okay, now let's have the same conversation that we've had with our friend from the paragraph above:
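Using `conversation.predict` (the response below is the one we'll see again in the memory dump later; your model's exact wording may differ):

```python
print(conversation.predict(input="Hey, did you catch a movie last night?"))

>>> Yes, I watched the Batman movie last night. It was quite entertaining and action-packed.
```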
Great answer. Okay, let's follow up with another related question now:
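Again with `predict`, this time referring back to "it" without repeating ourselves:

```python
print(conversation.predict(input="What was it about?"))

>>> The Batman movie was about Batman's quest to protect Gotham City from a new villain.
```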
As you can see, the LLM was able to respond to a follow-up question. How did this happen? Well, let's take a look at this line of code again:
```python
conversation = ConversationChain(llm=llm, memory=memory, verbose=False)
```
You can see here that we're passing a `ConversationBufferMemory()` object to the `ConversationChain`'s `memory` parameter. This enables LangChain to keep track of each transaction (or query) that we've sent to the LLM by storing it in a list. Because the conversation history is kept in `ConversationBufferMemory()`, LangChain is able to send the full context with each subsequent query.

Manipulating Memory
Now let's assume that we want to manipulate the memory variables. Luckily LangChain provides a few methods that can help us do that.
We're going to look at these `ConversationBufferMemory` methods, specifically:
- `load_memory_variables`
- `save_context`
- `clear`
memory.load_memory_variables
By calling the `memory.load_memory_variables({})` method, we can take a closer look at exactly what the memory contains. This includes all the inputs and outputs, meaning everything we've sent to the LLM as well as all of the LLM's responses.
Let's do that below:
```python
print(memory.load_memory_variables({}))
```
Here's the history:
```python
{'history': "Human: Hey, did you catch a movie last night?\nAI: Yes, I watched the Batman movie last night. It was quite entertaining and action-packed.\nHuman: What was it about?\nAI: The Batman movie was about Batman's quest to protect Gotham City from a new villain."}
```
memory.save_context
Now let's assume you want to manually add context to the memory. We can make use of the `save_context` method like this:
memory.save_context({"input": "Assume Batman was actually a chicken."}, { "output": "OK" })
Since we manually added context into the memory, LangChain will append the new information to the context and pass this information along with the conversation history to the LLM.
To test this, we can run the `predict` method and wait for the answer:
```python
print(conversation.predict(input="Is Batman a human?"))

>>> No, Batman is not a human. He is a chicken in this scenario.
```
Awesome. Sorry Batman. 🐔
memory.clear
Calling the `clear` method will just wipe the context and clear all the memory contents.
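For completeness, here's a quick sketch of what that looks like; after clearing, the stored history should come back empty:

```python
memory.clear()
print(memory.load_memory_variables({}))

>>> {'history': ''}
```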
Managing Memory Size
You might be wondering: what if a user has a very long conversation with the LLM? At some point the context will become too large, and that will result in more expensive queries for you. A way to deal with this is to use `ConversationBufferWindowMemory` instead of `ConversationBufferMemory`.
From the official docs:
`ConversationBufferWindowMemory` keeps a list of the interactions of the conversation over time. It only uses the last K interactions. This can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.
That's exactly what we need, let's take a look at how we can implement this:
```python
# Import from LangChain memory
from langchain.memory import ConversationBufferWindowMemory

# Instantiate the memory object from `ConversationBufferWindowMemory` instead of `ConversationBufferMemory`
llm = ChatOpenAI(temperature=0.2)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(llm=llm, memory=memory, verbose=False)
```
In this example, the context will only hold one interaction (hence the `k=1` parameter). This means that, taking our previous example, LangChain would only have information about the last exchange. Of course, you can adjust the `k` value to suit your specific needs.
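To see the sliding window in action, we can fill the memory by hand with `save_context` (the interactions below are just the movie exchange from earlier, added manually for illustration) and then inspect what's left:

```python
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)
memory.save_context({"input": "Hey, did you catch a movie last night?"},
                    {"output": "Yes, I watched the Batman movie last night."})
memory.save_context({"input": "What was it about?"},
                    {"output": "Batman protecting Gotham City from a new villain."})

# With k=1, only the most recent interaction survives.
print(memory.load_memory_variables({}))

>>> {'history': 'Human: What was it about?\nAI: Batman protecting Gotham City from a new villain.'}
```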
Wrapping Up
That's all folks! In this post, we've seen how to work with memory using LangChain and how we can manipulate and customize the context used to query the LLM.
If you've enjoyed this post, please go ahead and subscribe so you get the latest information before anyone else does! Once you subscribe, you'll get notified by Email as soon as a new post is available so you're always up to date.
Membership is completely free and just requires your email and name. Most importantly, I'll never send you spam.
Thanks for reading!
Further reading
More from Getting Started with AI
- New to LangChain? Start here!
- How to Convert Natural Language Text to SQL using LangChain
- What is the Difference Between LlamaIndex and LangChain
- An introduction to RAG tools and frameworks: Haystack, LangChain, and LlamaIndex
- A comparison between OpenAI GPTs and its open-source alternative LangChain OpenGPTs
Frequently Asked Questions
How does memory work in LangChain?
LangChain stores human input and AI output in a list known as memory. Since LLMs are stateless and are only aware of the latest query, LangChain sends the context (which means the full conversation history) with every new query to the LLM.
Does LangChain manage memory out of the box?
Yes, it does. LangChain takes care of memory out of the box. It keeps track of human input and AI output so that conversations remain relevant.
Can I customize memory using LangChain?
Yes, you can. You can easily customize the memory contents by using the `load_memory_variables`, `save_context`, and `clear` methods of the `ConversationBufferMemory` class. You can also customize the buffer size by passing an integer `k` value to the `ConversationBufferWindowMemory` class.