How to use LangChain output parsers to structure large language model responses
If you're wondering how you can convert the text returned by an LLM to a Pydantic (JSON) model in your Python app, this post is for you.
Note: OpenAI offers a JSON mode that ensures output is always in JSON format. You can read more on OpenAI's official site.
Large Language Models (LLMs) generate text, but when you're building an application, you'll sometimes need to work with structured data instead of plain strings. LangChain provides output parsers that help us do just that.
We will go over the Pydantic (JSON) Parser provided by LangChain.
There are more parsers available, but I'll leave those out of this post. If you'd like to know more about any of them make sure to let me know by dropping a comment at the end of this post!
Why Parse Data?
This one may seem obvious, but we'll answer it anyway. Parsing converts raw text into structured, machine-readable data, which makes it easier to validate and work with.
Suppose you wanted to perform a simple arithmetic operation on two numbers returned as text: you'd first need to convert each string into an integer. Clean, structured data has other benefits too. The most obvious is how easily it fits into your existing models and databases.
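As a tiny illustration of the idea, converting strings to integers before doing arithmetic looks like this in plain Python:

```python
# A model returns numbers as text; parse them before doing math
a = "2"
b = "3"

# This would raise a TypeError: can't add an int and a str
# total = int(a) + b

# Parse both strings into integers first
total = int(a) + int(b)
print(total)  # → 5
```

Output parsers apply this same "text in, typed data out" idea to entire structured objects.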
Making a Reservation
Back in May of 2023, I published a post about interacting with computers using natural language where I asked ChatGPT to make me a fictional reservation at a restaurant and to respond with a JSON object instead of plain old text.
Long story short, here's the output from that post:
{
  "intent": "book_reservation",
  "parameters": {
    "date": "2023-05-05",
    "time": "18:00:00",
    "party_size": 2,
    "cuisine": "any"
  }
}
While we can ask the LLM to return JSON and give it the format explicitly (just like we did in the previous post), it's important to recognize that this may not always work, as the model might hallucinate.
Preparing our Query Template
Okay, let's use the same query from the previous post, but instead of requesting that the response be in JSON format, we'll add a {format_instructions} placeholder as shown below:
reservation_template = '''
Book us a nice table for two this Friday at 6:00 PM.
Choose any cuisine, it doesn't matter. Send the confirmation by email.

Our location is: {query}

Format instructions:
{format_instructions}
'''
Great, we have our beautiful query. Below, we're going to see how LangChain will automatically take care of populating the {format_instructions} placeholder.
The Pydantic (JSON) Parser
In order to tell LangChain that we'll need to convert the text to a Pydantic object, we'll need to define the Reservation object first. So, let's jump right in:
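The original embedded snippet isn't reproduced here, but a minimal sketch of the Reservation model, with field names and descriptions mirroring the output schema printed later in this post, would look like this:

```python
from pydantic import BaseModel, Field

# Each Field description becomes part of the JSON schema that
# LangChain sends to the model as format instructions
class Reservation(BaseModel):
    date: str = Field(description="reservation date")
    time: str = Field(description="reservation time")
    party_size: int = Field(description="number of people")
    cuisine: str = Field(description="preferred cuisine")
```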
Great, we now have the Reservation object and its fields. Let's tell LangChain that we'll need a parser that'll convert a given input into that Reservation object:
from langchain.output_parsers import PydanticOutputParser

parser = PydanticOutputParser(pydantic_object=Reservation)
Setting up the Prompt Template
We're now going to set up our prompt template as such:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template=reservation_template,
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
Notice the partial_variables={"format_instructions": parser.get_format_instructions()} line? This tells LangChain to replace the {format_instructions} placeholder in our template with format instructions generated from the Pydantic Reservation object that we created.
Let's add our location query and see what LangChain will do behind the scenes to our original query and what it'll look like.
_input = prompt.format_prompt(query="San Francisco, CA")
Let's see what our query looks like now:
print(_input.to_string())
>> Book us a nice table for two this Friday at 6:00 PM.
>> Choose any cuisine, it doesn't matter. Send the confirmation by email.
>>
>> Our location is: San Francisco, CA
>>
>> Format instructions:
>> The output should be formatted as a JSON instance that conforms to the JSON schema below.
>>
>> As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
>> the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
>>
>> Here is the output schema:
>> ```
>> {"properties": {"date": {"title": "Date", "description": "reservation date", "type": "string"}, "time": {"title": "Time", "description": "reservation time", "type": "string"}, "party_size": {"title": "Party Size", "description": "number of people", "type": "integer"}, "cuisine": {"title": "Cuisine", "description": "preferred cuisine", "type": "string"}}, "required": ["date", "time", "party_size", "cuisine"]}
>> ```
Great, as you can see LangChain did a lot of work for us. It automatically converted the Pydantic object that we created into a string that is used to define the structure of the response for the LLM.
But that's not all: after we query the model, we can use LangChain's parser to automatically convert the text response we got from the model into a Reservation object.
Here's how we do this:
from langchain.llms import OpenAI

model = OpenAI(temperature=0)  # assuming an OpenAI LLM here

# We query the model first
output = model(_input.to_string())
# We parse the output
reservation = parser.parse(output)
Awesome, let's print the reservation fields (and data types) by iterating over each element:
for parameter in reservation.__fields__:
    print(f"{parameter}: {reservation.__dict__[parameter]}, {type(reservation.__dict__[parameter])}")
Here's the output:
>> date: Friday, <class 'str'>
>> time: 6:00 PM, <class 'str'>
>> party_size: 2, <class 'int'>
>> cuisine: Any, <class 'str'>
Notice that party_size is now of type int. We can also access the party_size property directly, as such: reservation.party_size.
Other Output Parsers
As I mentioned earlier in the post, LangChain provides many more output parsers that you can use depending on your specific use case. The same logic of what is happening behind the scenes applies to most of them.
A couple of interesting parsers are the Retry and Auto-fixing parsers. The retry parser re-queries the model for an answer that matches the required format, while the auto-fixing parser kicks in when another output parser fails and attempts to fix the malformed output. If you'd like me to cover these parsers in a future post, please let me know in the comments below!
Here's a full list of the LangChain output parsers:
Here's the full code:
Final Thoughts
In a nutshell, integrating LangChain's Pydantic Output Parser into your Python application makes it easy to work programmatically with the text returned by a Large Language Model.
It also helps you structure the data in a way that can easily be integrated with your existing models and databases.
Thanks for reading!
I'd love to connect with you on X as well as on the blog here. So if you're not a member, please subscribe for free now! And if you're on X, here's my profile link if you'd like to follow me for more updates.
Further reading
More from Getting Started with AI
- New to LangChain? Start here!
- How to convert PDF to JSON using LangChain and GPT
- How to Convert Natural Language Text to SQL using LangChain
- What is the Difference Between LlamaIndex and LangChain
- An introduction to RAG tools and frameworks: Haystack, LangChain, and LlamaIndex
- A comparison between OpenAI GPTs and its open-source alternative LangChain OpenGPTs