What is RAG?
Retrieval-Augmented Generation (RAG) is a key technique in generative AI. It retrieves relevant context from an external knowledge source and includes it in the prompt sent to the LLM.
Why use RAG? What benefits does it offer beyond what an LLM alone can deliver?
1. A RAG system has access to information that can be fresher than the data used to train the LLM.
2. Data in the RAG knowledge repository can be updated continually without the significant cost of retraining the model.
3. The RAG knowledge repository can hold domain-specific data that is more contextual than the data in a general-purpose LLM.
4. The source of each piece of information in the RAG vector database can be identified, so incorrect information can be corrected or deleted.
In this blog post I build a simple RAG application using Oracle Database 23ai AI Vector Search as the vector store and an OpenAI LLM to generate the answers.
Prerequisites:
1. Oracle Database 23ai installed
2. Access to an OpenAI model
3. Jupyter Notebook with the LangChain framework
Building the RAG application takes seven steps, as shown in the diagram above:
1. Load your document.
2. Transform the document to text.
3. Split the text into smaller chunks.
4. Using an embedding model, embed the chunks as vectors and store them in Oracle Database 23ai.
5. Ask a question; the same embedding model is used to vectorize the question.
6. The question vector is passed to Oracle Database 23ai, which performs a similarity search against the stored chunk vectors.
7. The search results (the context) and the question are combined into a prompt and passed to the LLM to generate the response.
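To make the seven steps concrete before bringing in Oracle and OpenAI, here is a minimal, dependency-free sketch of steps 3 to 7. It is a toy under loud assumptions: a bag-of-words term count stands in for a real embedding model, an in-memory list stands in for Oracle Database 23ai, the document is a short string (steps 1 and 2 already done), and the function names (`embed`, `cosine`, `chunk`, `build_prompt`) are hypothetical. The LLM call itself is left out; the function returns the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: term counts of lowercase words (stand-in for a real model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, size: int) -> list[str]:
    # Step 3: split into windows of `size` words (no overlap, for brevity).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_prompt(question: str, store: list, k: int = 2) -> str:
    q_vec = embed(question)  # Step 5: embed the question with the same function
    ranked = sorted(store, key=lambda item: cosine(q_vec, item[1]), reverse=True)  # Step 6
    context = "\n".join(text for text, _ in ranked[:k])
    # Step 7: context plus question form the prompt that would go to the LLM.
    return f"Context:\n{context}\n\nQuestion: {question}"


document = (
    "Oracle Database 23ai stores embeddings natively. "
    "Vector search ranks rows by similarity. "
    "LangChain chains retrievers, prompts and models."
)
# Step 4: embed each chunk and keep (text, vector) pairs in an in-memory store.
store = [(c, embed(c)) for c in chunk(document, 6)]
print(build_prompt("How does vector search rank rows?", store, k=1))
```

The key design point carries over to the real application: the question must be embedded with the same model as the chunks, otherwise the similarity scores are meaningless.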
Prerequisite steps:
3. Load the environment variables and connect to Oracle Database 23ai using the credentials and connection string. I have Oracle Database 23ai installed locally on my laptop.
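A minimal sketch of this step, assuming the python-oracledb driver and credentials stored in environment variables. The variable names `DB_USER`, `DB_PASSWORD`, and `DB_DSN`, the helper names, and the default DSN `localhost:1521/FREEPDB1` (the default pluggable database of a local Oracle Database 23ai Free install) are assumptions; adjust them to your setup. In a notebook, `load_dotenv()` from the python-dotenv package can load a `.env` file into the environment first.

```python
import os


def read_db_config(env=os.environ) -> dict:
    # Hypothetical variable names; set them in the shell or a .env file.
    return {
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        # Assumed default: local Oracle Database 23ai Free, service FREEPDB1.
        "dsn": env.get("DB_DSN", "localhost:1521/FREEPDB1"),
    }


def connect(config: dict):
    # python-oracledb uses "thin" mode by default, so no Oracle Client install is needed.
    # Imported here so read_db_config() works even where the driver is not installed.
    import oracledb

    return oracledb.connect(user=config["user"], password=config["password"], dsn=config["dsn"])
```

With the database running, `conn = connect(read_db_config())` returns an open connection.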
RAG steps:
1. Load the document. I downloaded oracle-database-23c-new-features-guide.pdf into the same directory as the notebook.
2. Transform the document to text
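Steps 1 and 2 can be sketched with the pypdf package, which is an assumption here; LangChain's document loaders or the older PyPDF2 library work similarly. The helper names `load_pdf_text` and `pages_to_text` are hypothetical.

```python
from typing import Iterable


def pages_to_text(pages: Iterable) -> str:
    # Step 2: join the text of all pages, skipping pages that yield no text
    # (for example scanned images without a text layer).
    return "\n".join(text for text in (page.extract_text() for page in pages) if text)


def load_pdf_text(path: str) -> str:
    # Step 1: read the PDF with pypdf (pip install pypdf).
    from pypdf import PdfReader

    return pages_to_text(PdfReader(path).pages)
```

For example, `text = load_pdf_text("oracle-database-23c-new-features-guide.pdf")`.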
3. Split the text into chunks
Add metadata, such as an id, to each chunk for the database table.
4. Set up Oracle AI Vector Search and insert the embedding vectors. I am using an OpenAI embedding model to embed the chunks as vectors in Oracle Database 23ai.
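LangChain ships text splitters such as `CharacterTextSplitter` and `RecursiveCharacterTextSplitter` for this step. To show what they do, here is a simplified character-window splitter with overlap, plus a helper that attaches an id to each chunk. The chunk size of 800 characters, the overlap of 100, and the record shape (`page_content` plus a `metadata` dictionary, mirroring LangChain's `Document` fields) are illustrative assumptions, not the values used in the original notebook.

```python
def split_text(text: str, chunk_size: int = 800, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once a window would contain only characters already in the previous one.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]


def with_metadata(chunks: list[str]) -> list[dict]:
    # Attach a string id to each chunk so it can be stored alongside the text in the database.
    return [{"page_content": c, "metadata": {"id": str(i)}} for i, c in enumerate(chunks)]
```

The overlap gives text near a chunk boundary a second chance to appear together with its surrounding context, at the cost of storing some characters twice.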
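A sketch of step 4 using LangChain's `OracleVS` vector store from `langchain_community` with `OpenAIEmbeddings` from `langchain_openai`. The table name `RAG_DEMO` and the helper names are hypothetical, the records are assumed to have the `page_content`/`metadata` shape of LangChain documents, and `OPENAI_API_KEY` must be set for the embedding calls. The imports sit inside the function so the small conversion helper runs without those packages.

```python
def to_texts_and_metadatas(records: list[dict]) -> tuple[list[str], list[dict]]:
    # Split chunk records ({"page_content", "metadata"}) into parallel lists.
    texts = [r["page_content"] for r in records]
    metadatas = [r["metadata"] for r in records]
    return texts, metadatas


def build_vector_store(conn, records: list[dict], table_name: str = "RAG_DEMO"):
    # Local imports: only needed when actually writing to the database.
    from langchain_community.vectorstores.oraclevs import OracleVS
    from langchain_community.vectorstores.utils import DistanceStrategy
    from langchain_openai import OpenAIEmbeddings

    texts, metadatas = to_texts_and_metadatas(records)
    # Embeds every chunk with the OpenAI embedding model and inserts
    # the text, metadata and vectors into the given table.
    return OracleVS.from_texts(
        texts,
        OpenAIEmbeddings(),
        metadatas=metadatas,
        client=conn,
        table_name=table_name,
        distance_strategy=DistanceStrategy.DOT_PRODUCT,
    )
```

Dot product is used here because OpenAI embeddings are normalized to unit length, so it ranks results the same way cosine similarity would. With an open connection `conn`, `knowledge_base = build_vector_store(conn, records)` returns the vector store used later as the knowledge base.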
5. Build the prompt to query the document:
Take the user question:
Set up the OpenAI LLM to generate the response. I used the gpt-3.5-turbo model, but you can use any LLM.
Build the prompt template to include both the question and the context, and instantiate the knowledge base class so that its retriever can fetch context from Oracle Database 23ai.
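The template below is a hypothetical example of such a prompt; what matters is that its `{context}` and `{question}` placeholders match the dictionary keys used in the chain. The `format_docs` helper is an optional, commonly used addition that joins retrieved `Document` objects into one string, and the plain `str.format` call shows what the prompt template produces. The LangChain setup in the trailing comments (`PromptTemplate.from_template`, `as_retriever`, `ChatOpenAI`) is an assumed equivalent, not the author's exact code.

```python
# Hypothetical prompt wording; the placeholder names must match the chain's dict keys.
TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say that you don't know.

Context:
{context}

Question: {question}
Answer:"""


def format_docs(docs) -> str:
    # Join the page_content of retrieved Document objects into one context string.
    return "\n\n".join(doc.page_content for doc in docs)


def fill_prompt(context: str, question: str) -> str:
    # What PromptTemplate.from_template(TEMPLATE).format(...) produces, in plain Python.
    return TEMPLATE.format(context=context, question=question)


# Assumed LangChain equivalents (not run here):
#   prompt = PromptTemplate.from_template(TEMPLATE)
#   retriever = knowledge_base.as_retriever()
#   llm = ChatOpenAI(model="gpt-3.5-turbo")
```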
6. and 7. The last two steps come down to invoking the chain.
This is the key part of the RAG application. It is the LangChain pipeline that chains all the components together to produce an LLM response with context.
The chain embeds the question as a vector and uses that vector to search for similar vectors. The closest matches are returned as text chunks (the context).
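Under the hood, this kind of lookup maps to a SQL query in Oracle Database 23ai that orders rows by `VECTOR_DISTANCE` and keeps the top k. The sketch below assumes a hypothetical table `doc_chunks` with `text` and `embedding` columns (OracleVS manages its own table layout) and python-oracledb 2.2 or later, which can bind an `array.array` to a `VECTOR` value. The query vector must come from the same embedding model that embedded the chunks.

```python
import array

# Hypothetical table and columns: doc_chunks(text, embedding VECTOR).
TOP_K_SQL = """SELECT text
FROM doc_chunks
ORDER BY VECTOR_DISTANCE(embedding, :query_vec, COSINE)
FETCH FIRST {k} ROWS ONLY"""


def top_k_chunks(cursor, query_vector: list[float], k: int = 4) -> list[str]:
    # int() guarantees the row limit is a plain number before it is formatted into the SQL.
    sql = TOP_K_SQL.format(k=int(k))
    # python-oracledb 2.2+ binds array.array("f", ...) as a FLOAT32 vector.
    cursor.execute(sql, query_vec=array.array("f", query_vector))
    # If text is a CLOB, set oracledb.defaults.fetch_lobs = False to receive str values.
    return [row[0] for row in cursor.fetchall()]
```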
Together, the question and the context form the prompt that the LLM processes to generate the response. See the code below.
The code above sets up a pipeline that processes user_question through retriever, prompt, llm, and StrOutputParser in sequence, with each step transforming the output of the previous one. The final result is stored in the variable response.
The steps inside the chain:
1. {"context": retriever, "question": RunnablePassthrough()}:
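This is the first step in the chain: a dictionary whose "context" key maps to retriever and whose "question" key maps to RunnablePassthrough(). LangChain runs both entries on the same input, so the retriever fetches the relevant chunks while RunnablePassthrough() forwards the question unchanged, producing a dictionary with both values.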
2. | prompt:
The | operator feeds the output of the previous step into prompt. The dictionary from the first step fills the matching {context} and {question} placeholders in the prompt template.
3. | llm:
Similarly, the formatted prompt is passed as input to llm, which generates the answer.
4. | StrOutputParser():
Finally, the output of llm is passed through StrOutputParser, which returns the answer as a plain string.
5. response = chain.invoke(user_question):
This line invokes the entire chain with user_question as input and assigns the output to the variable response.
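To see why `|` works on a plain dictionary and a prompt, here is a toy model of the pipe operator built on Python's `__or__` and `__ror__` hooks. It is a simplified sketch, not LangChain's implementation (LangChain uses `RunnableSequence` and `RunnableParallel`), and the `Step` class and `toy_*` names are hypothetical.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: Any) -> "Step":
        # a | b: a new step that feeds a's output into b.
        nxt = as_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other: Any) -> "Step":
        # {...} | b: dict cannot combine with a Step, so Python calls b.__ror__({...}).
        return as_step(other) | self


def as_step(obj: Any) -> Step:
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # A dict becomes a "parallel" step: every value receives the same input.
        steps = {key: as_step(val) for key, val in obj.items()}
        return Step(lambda value: {key: s.invoke(value) for key, s in steps.items()})
    return Step(obj)  # any other callable is wrapped as a step


toy_passthrough = Step(lambda x: x)
toy_retriever = Step(lambda q: f"<chunks about {q}>")
toy_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
toy_llm = Step(str.upper)

toy_chain = {"context": toy_retriever, "question": toy_passthrough} | toy_prompt | toy_llm
print(toy_chain.invoke("vector search"))
```

Because `dict` does not know how to combine with a `Step`, Python falls back to the right-hand operand's `__ror__`, which turns the dictionary into a step that sends the same input to every value. LangChain relies on the same fallback when the first element of a chain is a dictionary.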