RAG


The RAG pipeline (also known as the Extractor pipeline) joins a prompt, a context data store and a generative model together to extract knowledge.

The data store can be an embeddings database or a similarity instance with associated input text. The generative model can be a prompt-driven large language model (LLM), an extractive question-answering model or a custom pipeline. This is known as retrieval augmented generation (RAG).

Example

The following shows a simple example using this pipeline.

    from txtai import Embeddings, RAG

    # Input data
    data = [
        "US tops 5 million confirmed virus cases",
        "Canada's last fully intact ice shelf has suddenly collapsed, " +
        "forming a Manhattan-sized iceberg",
        "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
        "The National Park Service warns against sacrificing slower friends " +
        "in a bear attack",
        "Maine man wins $1M from $25 lottery ticket",
        "Make huge profits without work, earn up to $100,000 a day"
    ]

    # Build embeddings index
    embeddings = Embeddings(content=True)
    embeddings.index(data)

    # Create and run pipeline
    rag = RAG(embeddings, "google/flan-t5-base", template="""
    Answer the following question using the provided context.

    Question:
    {question}

    Context:
    {context}
    """)

    rag("What was won?")
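
The pipeline also accepts a list of questions. The following is a minimal sketch that reuses the embeddings index from the example above and sets output="flatten" (described under Methods) to return plain answers instead of (name, answer) tuples; the second question is only illustrative.

    # Return plain answers for a batch of questions
    # Reuses the embeddings index built in the example above
    rag = RAG(embeddings, "google/flan-t5-base", output="flatten", template="""
    Answer the following question using the provided context.

    Question:
    {question}

    Context:
    {context}
    """)

    rag(["What was won?", "What collapsed?"])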

See the Embeddings and LLM pages for additional configuration options.

See the links below for more detailed examples.

Notebook | Description
Prompt-driven search with LLMs | Embeddings-guided and Prompt-driven search with Large Language Models (LLMs)
Prompt templates and task chains | Build model prompts and connect tasks together with workflows
Build RAG pipelines with txtai | Guide on retrieval augmented generation including how to create citations
Integrate LLM frameworks | Integrate llama.cpp, LiteLLM and custom generation frameworks
Generate knowledge with Semantic Graphs and RAG | Knowledge exploration and discovery with Semantic Graphs and RAG
Build knowledge graphs with LLMs | Build knowledge graphs with LLM-driven entity extraction
Advanced RAG with graph path traversal | Graph path traversal to collect complex sets of data for advanced RAG
Advanced RAG with guided generation | Retrieval Augmented and Guided Generation
RAG with llama.cpp and external API services | RAG with additional vector and LLM frameworks
How RAG with txtai works | Create RAG processes, API services and Docker instances
Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG
Generative Audio | Storytelling with generative audio workflows
Extractive QA with txtai | Introduction to extractive question-answering with txtai
Extractive QA with Elasticsearch | Run extractive question-answering queries with Elasticsearch
Extractive QA to build structured data | Build structured datasets using extractive question-answering

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

    # Allow documents to be indexed
    writable: True

    # Content is required for extractor pipeline
    embeddings:
      content: True

    rag:
      path: google/flan-t5-base
      template: |
        Answer the following question using the provided context.

        Question:
        {question}

        Context:
        {context}

    workflow:
      search:
        tasks:
          - action: rag

Run with Workflows

Built-in tasks make using the extractor pipeline easier.

    from txtai import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    app.add([
        "US tops 5 million confirmed virus cases",
        "Canada's last fully intact ice shelf has suddenly collapsed, " +
        "forming a Manhattan-sized iceberg",
        "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
        "The National Park Service warns against sacrificing slower friends " +
        "in a bear attack",
        "Maine man wins $1M from $25 lottery ticket",
        "Make huge profits without work, earn up to $100,000 a day"
    ])
    app.index()

    list(app.workflow("search", ["What was won?"]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name": "search", "elements": ["What was won"]}'
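
The same workflow endpoint can also be called from Python. Below is a minimal sketch using the requests library, assuming the API service started above is running on localhost:8000.

    import requests

    # Call the workflow endpoint started above
    response = requests.post(
        "http://localhost:8000/workflow",
        json={"name": "search", "elements": ["What was won"]}
    )

    print(response.json())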

Methods

Python documentation for the pipeline.

__init__(similarity, path, quantize=False, gpu=True, model=None, tokenizer=None, minscore=None, mintokens=None, context=None, task=None, output='default', template=None, separator=' ', system=None, **kwargs)

Builds a new RAG pipeline.

Parameters:

Name | Description | Default
similarity | similarity instance (embeddings or similarity pipeline) | required
path | path to model, supports a LLM, Questions or custom pipeline | required
quantize | True if model should be quantized before inference, False otherwise | False
gpu | if gpu inference should be used (only works if GPUs are available) | True
model | optional existing pipeline model to wrap | None
tokenizer | Tokenizer class | None
minscore | minimum score to include context match | None
mintokens | minimum number of tokens to include context match | None
context | topn context matches to include, defaults to 3 | None
task | model task (language-generation, sequence-sequence or question-answering), defaults to auto-detect | None
output | output format, 'default' returns (name, answer), 'flatten' returns answers and 'reference' returns (name, answer, reference) | 'default'
template | prompt template, it must have a parameter for {question} and {context}, defaults to "{question} {context}" | None
separator | context separator | ' '
system | system prompt, defaults to None | None
kwargs | additional keyword arguments to pass to pipeline model | {}
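
The parameters above map directly to constructor keyword arguments. Below is a minimal, construction-only sketch assuming an existing embeddings instance; the specific values are illustrative, not defaults.

    from txtai import Embeddings, RAG

    # Construction-only sketch of the optional parameters listed above
    embeddings = Embeddings(content=True)

    rag = RAG(
        embeddings,                          # similarity instance
        "google/flan-t5-base",               # path to model
        context=5,                           # include top 5 context matches
        output="flatten",                    # return answers only
        separator="\n",                      # join context matches with newlines
        template="{question} {context}"      # prompt template with required parameters
    )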

Source code in txtai/pipeline/llm/rag.py

    def __init__(
        self,
        similarity,
        path,
        quantize=False,
        gpu=True,
        model=None,
        tokenizer=None,
        minscore=None,
        mintokens=None,
        context=None,
        task=None,
        output="default",
        template=None,
        separator=" ",
        system=None,
        **kwargs,
    ):
        """
        Builds a new RAG pipeline.

        Args:
            similarity: similarity instance (embeddings or similarity pipeline)
            path: path to model, supports a LLM, Questions or custom pipeline
            quantize: True if model should be quantized before inference, False otherwise.
            gpu: if gpu inference should be used (only works if GPUs are available)
            model: optional existing pipeline model to wrap
            tokenizer: Tokenizer class
            minscore: minimum score to include context match, defaults to None
            mintokens: minimum number of tokens to include context match, defaults to None
            context: topn context matches to include, defaults to 3
            task: model task (language-generation, sequence-sequence or question-answering), defaults to auto-detect
            output: output format, default returns (name, answer), flatten returns answers and reference returns (name, answer, reference)
            template: prompt template, it must have a parameter for {question} and {context}, defaults to "{question} {context}"
            separator: context separator
            system: system prompt, defaults to None
            kwargs: additional keyword arguments to pass to pipeline model
        """

        # Similarity instance
        self.similarity = similarity

        # Model can be a LLM, Questions or custom pipeline
        self.model = self.load(path, quantize, gpu, model, task, **kwargs)

        # Tokenizer class use default method if not set
        self.tokenizer = tokenizer if tokenizer else Tokenizer() if hasattr(self.similarity, "scoring") and self.similarity.isweighted() else None

        # Minimum score to include context match
        self.minscore = minscore if minscore is not None else 0.0

        # Minimum number of tokens to include context match
        self.mintokens = mintokens if mintokens is not None else 0.0

        # Top n context matches to include for context
        self.context = context if context else 3

        # Output format
        self.output = output

        # Prompt template
        self.template = template if template else "{question} {context}"

        # Context separator
        self.separator = separator

        # System prompt template
        self.system = system

__call__(queue, texts=None, **kwargs)

Finds answers to input questions. This method runs queries to find the top n best matches and uses that as the context. A model is then run against the context for each input question, with the answer returned.

Parameters:

Name | Description | Default
queue | input question queue (name, query, question, snippet), can be list of tuples/dicts/strings or a single input element | required
texts | optional list of text for context, otherwise runs embeddings search | None
kwargs | additional keyword arguments to pass to pipeline model | {}

Returns:

list of answers matching input format (tuple or dict) containing fields as specified by output format
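
As the queue parameter describes, questions can be passed in several formats. A minimal sketch, assuming the rag pipeline and indexed data from the first example; the name and query values are illustrative.

    # Single string question
    rag("What was won?")

    # List of string questions
    rag(["What was won?", "What collapsed?"])

    # List of (name, query, question, snippet) tuples, snippet left as None
    rag([("prize", "lottery", "What was won?", None)])

    # List of dicts with the same fields
    rag([{"name": "prize", "query": "lottery", "question": "What was won?"}])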

Source code in txtai/pipeline/llm/rag.py

    def __call__(self, queue, texts=None, **kwargs):
        """
        Finds answers to input questions. This method runs queries to find the top n best matches and uses that as the context.
        A model is then run against the context for each input question, with the answer returned.

        Args:
            queue: input question queue (name, query, question, snippet), can be list of tuples/dicts/strings or a single input element
            texts: optional list of text for context, otherwise runs embeddings search
            kwargs: additional keyword arguments to pass to pipeline model

        Returns:
            list of answers matching input format (tuple or dict) containing fields as specified by output format
        """

        # Save original queue format
        inputs = queue

        # Convert queue to list, if necessary
        queue = queue if isinstance(queue, list) else [queue]

        # Convert dictionary inputs to tuples
        if queue and isinstance(queue[0], dict):
            # Convert dict to tuple
            queue = [tuple(row.get(x) for x in ["name", "query", "question", "snippet"]) for row in queue]

        if queue and isinstance(queue[0], str):
            # Convert string questions to tuple
            queue = [(None, row, row, None) for row in queue]

        # Rank texts by similarity for each query
        results = self.query([query for _, query, _, _ in queue], texts)

        # Build question-context pairs
        names, queries, questions, contexts, topns, snippets = [], [], [], [], [], []
        for x, (name, query, question, snippet) in enumerate(queue):
            # Get top n best matching segments
            topn = sorted(results[x], key=lambda y: y[2], reverse=True)[: self.context]

            # Generate context using ordering from texts, if available, otherwise order by score
            context = self.separator.join(text for _, text, _ in (sorted(topn, key=lambda y: y[0]) if texts else topn))

            names.append(name)
            queries.append(query)
            questions.append(question)
            contexts.append(context)
            topns.append(topn)
            snippets.append(snippet)

        # Run pipeline and return answers
        answers = self.answers(questions, contexts, **kwargs)

        # Apply output formatting to answers and return
        return self.apply(inputs, names, queries, answers, topns, snippets) if isinstance(answers, list) else answers