Adding Retrieval-Augmented Generation (RAG)
While an agent that can perform math is nifty (LLMs are usually not very good at math), LLM-based applications get much more interesting when they can work with large amounts of data. In this case, we're going to use a 200-page PDF of the proposed budget of the city of San Francisco for fiscal years 2023-2024 and 2024-2025. It's a great example because it's extremely wordy and full of tables of figures, which present a challenge for humans and LLMs alike.
To learn more about RAG, we recommend this introduction from our Python docs. We'll assume you know the basics:
- You need to parse your source data into chunks of text
- You need to encode that text as numbers, called embeddings
- You need to search your embeddings for the most relevant chunks of text
- You feed your relevant chunks and a query to an LLM to answer a question
We're going to start with the same agent we built in step 1, but make a few changes. You can find the finished version in the repository.
New dependencies
We'll be bringing in SimpleDirectoryReader, HuggingFaceEmbedding, VectorStoreIndex, and QueryEngineTool from LlamaIndex.TS, as well as the dependencies we previously used.
import {
OpenAI,
FunctionTool,
OpenAIAgent,
Settings,
SimpleDirectoryReader,
HuggingFaceEmbedding,
VectorStoreIndex,
QueryEngineTool,
} from "llamaindex";
Add an embedding model
To encode our text into embeddings, we'll need an embedding model. We could use OpenAI for this, but to save on API calls we're going to use a local embedding model from HuggingFace.
Settings.embedModel = new HuggingFaceEmbedding({
modelType: "BAAI/bge-small-en-v1.5",
quantized: false,
});
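If you want to sanity-check that the model downloads and runs locally before you index anything, you can embed a short test string directly. This is an optional aside, not part of the walkthrough proper; it assumes the embedding model's getTextEmbedding method, which returns a plain array of numbers.
// Optional sanity check: embed a test string and inspect the vector.
// bge-small-en-v1.5 produces 384-dimensional embeddings.
const testVector = await Settings.embedModel.getTextEmbedding("hello world");
console.log(testVector.length);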
Load data using SimpleDirectoryReader
SimpleDirectoryReader is a flexible tool that can read a variety of file formats. We're going to point it at our data directory, which contains just the single PDF file, and get it to return a set of documents.
const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("../data");
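Parsing a 200-page PDF takes a little while, so it's worth confirming the load succeeded. The optional snippet below assumes the returned Document objects expose a getText method; it just logs how many documents came back and previews the first one.
// Optional: confirm the PDF was parsed into documents.
console.log(`Loaded ${documents.length} document(s)`);
console.log(documents[0].getText().slice(0, 200));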
Index our data
Now we turn our text into embeddings. The VectorStoreIndex class takes care of this for us when we use the fromDocuments method (it uses the embedding model we defined in Settings earlier).
const index = await VectorStoreIndex.fromDocuments(documents);
Configure a retriever
Before LlamaIndex can send a query to the LLM, it needs to find the most relevant chunks to send. That's the purpose of a Retriever. We're going to get our VectorStoreIndex to act as a retriever for us.
const retriever = await index.asRetriever();
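If you're curious what the retriever actually returns, you can call it directly before wiring it into a query engine. Treat the snippet below as a sketch: the exact retrieve signature varies between LlamaIndex.TS versions (some accept a plain string, others an object with a query field), and the question is just a placeholder.
// Optional sketch: ask the retriever for the most relevant chunks directly.
// The retrieve signature may differ in your version of LlamaIndex.TS.
const results = await retriever.retrieve({
  query: "What is the total proposed budget?",
});
console.log(`Retrieved ${results.length} chunks`);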