RAG (Retrieval Augmented Generation)
If you want to use LLMs with your own data, you can do it with RAG. You'll have to create a RAG pipeline, which involves the following steps.
Steps in a RAG Pipeline
(Diagram: RAG Pipeline.excalidraw)
Extract data from the documents
This is fairly easy for text/markdown documents, but more complex for PDFs, scanned documents, images, etc.
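For example, here is a minimal sketch of extracting text from a PDF with the pypdf library - the file name is just a placeholder, and scanned documents would additionally need OCR:

```python
from pypdf import PdfReader  # pip install pypdf

# Read a PDF and concatenate the text of every page.
# "manual.pdf" is a placeholder path - swap in your own document.
reader = PdfReader("manual.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

print(text[:500])  # preview the first 500 characters
```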
Chunking in RAG
Split the documents into smaller chunks according to some metric - character length, paragraphs, semantic boundaries, etc.
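A minimal sketch of character-length chunking with some overlap between chunks (the sizes here are arbitrary, not recommendations):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping, fixed-length character chunks."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap for context
    return chunks

chunks = chunk_text("some long document text ... " * 100)
print(len(chunks), "chunks")
```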
Create Embeddings
Convert each chunk into a vector - these are called embeddings.
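A sketch, assuming the sentence-transformers library and its all-MiniLM-L6-v2 model - any embedding model or API works the same way:

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, general-purpose embedding model

chunks = ["First chunk of the document.", "Second chunk of the document."]
embeddings = model.encode(chunks)  # one vector per chunk

print(embeddings.shape)  # (2, 384) for this model
```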
Store Embeddings in a Vector Database
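The embeddings go into a vector database so they can be searched by similarity later. As a sketch, assuming FAISS and the same embedding model as above (Chroma, Pinecone, etc. would work similarly):

```python
import faiss  # pip install faiss-cpu
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["First chunk of the document.", "Second chunk of the document."]
embeddings = model.encode(chunks).astype("float32")

index = faiss.IndexFlatL2(embeddings.shape[1])  # exact L2 search over the embedding vectors
index.add(embeddings)                           # store one vector per chunk

print(index.ntotal, "vectors stored")
```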
User Query
When a query is made, an embedding of the query is matched against the vector DB and the top resulting chunks are returned.
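Continuing the same FAISS/sentence-transformers sketch - the chunks and query here are made up for illustration:

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = [
    "The warranty covers manufacturing defects for two years.",
    "Returns are accepted within 30 days of purchase.",
    "Shipping usually takes 3-5 business days.",
]
embeddings = model.encode(chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Embed the query with the same model and fetch the nearest chunks.
query = "How long is the warranty?"
query_vec = model.encode([query]).astype("float32")
distances, ids = index.search(query_vec, 2)  # top 2 matching chunks
top_chunks = [chunks[i] for i in ids[0]]

print(top_chunks)
```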
Prompt Generation
The system then builds a prompt from both the query and the top chunks - something along the lines of the template below (a small code sketch of assembling it follows the template).
Using this context...
[Chunks returned by the vector DB search]
Answer this question...
[User Query]
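A minimal sketch of assembling that prompt - the variable names are illustrative:

```python
top_chunks = [
    "The warranty covers manufacturing defects for two years.",
    "Returns are accepted within 30 days of purchase.",
]
user_query = "How long is the warranty?"

# Join the retrieved chunks into a context block and wrap it in the template.
context = "\n\n".join(top_chunks)
prompt = (
    "Using this context:\n"
    f"{context}\n\n"
    "Answer this question:\n"
    f"{user_query}"
)
print(prompt)
```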
LLM Response
We then send the entire prompt to an LLM and fetch the response. Depending on the chunk size and the number of chunks we include in the prompt, it can get large, so some optimization will be required to keep the token count low.
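As a sketch, sending the assembled prompt with the OpenAI Python client - the model name is just an example, and any chat-style LLM API works the same way:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Using this context:\n"
    "The warranty covers manufacturing defects for two years.\n\n"
    "Answer this question:\n"
    "How long is the warranty?"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name - use whichever model you have access to
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Limiting the number of retrieved chunks and keeping chunk sizes modest are the usual levers for keeping the token count down.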
Advanced Usage
You can have multiple pipelines in play - for different use cases or even different types of questions. Just add a router that executes the right pipeline for each incoming question.
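As a sketch, a router can be as simple as keyword matching - the pipeline names here are hypothetical, and you could just as well use an LLM or a classifier to pick the pipeline:

```python
def billing_pipeline(question: str) -> str:
    # Placeholder for a RAG pipeline built over billing documents.
    return f"[billing pipeline] answering: {question}"

def product_pipeline(question: str) -> str:
    # Placeholder for a RAG pipeline built over product documentation.
    return f"[product pipeline] answering: {question}"

def route(question: str) -> str:
    """Pick the RAG pipeline that should handle this question."""
    if any(word in question.lower() for word in ("invoice", "refund", "payment")):
        return billing_pipeline(question)
    return product_pipeline(question)

print(route("Where is my refund?"))
```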
If RAG isn't a good fit for your use case, the next option you have is fine-tuning the model for your use case.