RAG
The RAG pipeline (also known as the Extractor pipeline) joins a prompt, a context data store and a generative model together to extract knowledge.
The data store can be an embeddings database or a similarity instance with associated input text. The generative model can be a prompt-driven large language model (LLM), an extractive question-answering model or a custom pipeline. This process is known as retrieval augmented generation (RAG).
Example
The following shows a simple example using this pipeline.
```python
from txtai import Embeddings, RAG

# Input data
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, " +
  "forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends " +
  "in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]

# Build embeddings index
embeddings = Embeddings(content=True)
embeddings.index(data)

# Create and run pipeline
rag = RAG(embeddings, "google/flan-t5-base", template="""
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
""")

rag("What was won?")
```
See the Embeddings and LLM pages for additional configuration options.
See the links below for more detailed examples.
Notebook | Description
---|---
Prompt-driven search with LLMs | Embeddings-guided and Prompt-driven search with Large Language Models (LLMs)
Prompt templates and task chains | Build model prompts and connect tasks together with workflows
Build RAG pipelines with txtai | Guide on retrieval augmented generation including how to create citations
Integrate LLM frameworks | Integrate llama.cpp, LiteLLM and custom generation frameworks
Generate knowledge with Semantic Graphs and RAG | Knowledge exploration and discovery with Semantic Graphs and RAG
Build knowledge graphs with LLMs | Build knowledge graphs with LLM-driven entity extraction
Advanced RAG with graph path traversal | Graph path traversal to collect complex sets of data for advanced RAG
Advanced RAG with guided generation | Retrieval Augmented and Guided Generation
RAG with llama.cpp and external API services | RAG with additional vector and LLM frameworks
How RAG with txtai works | Create RAG processes, API services and Docker instances
Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG
Generative Audio | Storytelling with generative audio workflows
Extractive QA with txtai | Introduction to extractive question-answering with txtai
Extractive QA with Elasticsearch | Run extractive question-answering queries with Elasticsearch
Extractive QA to build structured data | Build structured datasets using extractive question-answering
Configuration-driven example
Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lowercase name of the pipeline (`rag`). Configuration-driven pipelines are run with workflows or the API.
config.yml
```yaml
# Allow documents to be indexed
writable: True

# Content is required for RAG pipeline
embeddings:
  content: True

rag:
  path: google/flan-t5-base
  template: |
    Answer the following question using the provided context.

    Question:
    {question}

    Context:
    {context}

workflow:
  search:
    tasks:
      - action: rag
```
Run with Workflows
Built-in tasks make using the RAG pipeline easier.
```python
from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
app.add([
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, " +
  "forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends " +
  "in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
])
app.index()

list(app.workflow("search", ["What was won?"]))
```
Run with API
```bash
CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name": "search", "elements": ["What was won"]}'
```
Methods
Python documentation for the pipeline.
`__init__(similarity, path, quantize=False, gpu=True, model=None, tokenizer=None, minscore=None, mintokens=None, context=None, task=None, output='default', template=None, separator=' ', system=None, **kwargs)`
Builds a new RAG pipeline.
Parameters:
Name | Description | Default
---|---|---
similarity | similarity instance (embeddings or similarity pipeline) | required
path | path to model, supports an LLM, Questions or custom pipeline | required
quantize | True if model should be quantized before inference, False otherwise | False
gpu | if gpu inference should be used (only works if GPUs are available) | True
model | optional existing pipeline model to wrap | None
tokenizer | Tokenizer class | None
minscore | minimum score to include context match, defaults to None | None
mintokens | minimum number of tokens to include context match, defaults to None | None
context | topn context matches to include, defaults to 3 | None
task | model task (language-generation, sequence-sequence or question-answering), defaults to auto-detect | None
output | output format, 'default' returns (name, answer), 'flatten' returns answers and 'reference' returns (name, answer, reference) | 'default'
template | prompt template, it must have a parameter for {question} and {context}, defaults to "{question} {context}" | None
separator | context separator | ' '
system | system prompt, defaults to None | None
kwargs | additional keyword arguments to pass to pipeline model | {}
Source code in txtai/pipeline/llm/rag.py
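As an illustration of the constructor parameters above, the following is a minimal sketch that sets a few of the documented options: `context` to include the top 5 matches and `output="flatten"` to return answer text only. The parameter values shown are illustrative, not defaults.

```python
from txtai import Embeddings, RAG

# Build a small embeddings index with content storage enabled
embeddings = Embeddings(content=True)
embeddings.index(["Maine man wins $1M from $25 lottery ticket"])

# Sketch: RAG pipeline with a subset of the documented parameters
rag = RAG(
    embeddings,                      # similarity: embeddings instance
    "google/flan-t5-base",           # path: LLM model path
    context=5,                       # include top 5 context matches
    output="flatten",                # return answers only
    template="{question} {context}"  # template with required {question} and {context}
)

print(rag("What was won?"))
```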
`__call__(queue, texts=None, **kwargs)`
Finds answers to input questions. This method runs queries to find the top n best matches and uses that as the context. A model is then run against the context for each input question, with the answer returned.
Parameters:
Name | Description | Default
---|---|---
queue | input question queue (name, query, question, snippet), can be list of tuples/dicts/strings or a single input element | required
texts | optional list of text for context, otherwise runs embeddings search | None
kwargs | additional keyword arguments to pass to pipeline model | {}
Returns:
Type | Description
---|---
list | list of answers matching input format (tuple or dict) containing fields as specified by output format
Source code in txtai/pipeline/llm/rag.py
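Putting the `__call__` signature above to use, the following sketch assumes a `rag` pipeline like the one constructed earlier. A list of (name, query, question, snippet) tuples runs multiple questions in one call, and the optional `texts` argument supplies context directly instead of running an embeddings search.

```python
# Sketch: multiple questions as (name, query, question, snippet) tuples
# The fourth element is the snippet field from the signature above; False is used here
queue = [
    ("win", "lottery ticket", "What was won?", False),
    ("collapse", "ice shelf", "What collapsed?", False)
]
print(rag(queue))

# Sketch: pass context directly via texts, bypassing the embeddings search
print(rag("What was won?", texts=["Maine man wins $1M from $25 lottery ticket"]))
```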