LLM


The LLM pipeline runs prompts through a large language model (LLM). This pipeline autodetects the LLM framework based on the model path.

Example

The following shows a simple example using this pipeline.

    from txtai import LLM

    # Create LLM pipeline
    llm = LLM()

    # Run prompt
    llm("""
    Answer the following question using the provided context.

    Question:
    What are the applications of txtai?

    Context:
    txtai is an open-source platform for semantic search and
    workflows powered by language models.
    """)

    # Chat messages are also supported
    llm([
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Answer the following question..."}
    ])
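Responses can also be streamed incrementally by passing stream=True, as documented in the __call__ method below. A minimal sketch, assuming the default model:

    from txtai import LLM

    llm = LLM()

    # With stream=True, the response is yielded in chunks instead of returned all at once
    for chunk in llm("Write a short sentence about txtai.", stream=True):
        print(chunk, end="")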

The LLM pipeline automatically detects the underlying LLM framework. The framework can also be set manually with the method parameter.

See the LiteLLM documentation for the options available with LiteLLM models. llama.cpp models support both local and remote GGUF paths on the HF Hub.

    from txtai import LLM

    # Transformers
    llm = LLM("meta-llama/Meta-Llama-3.1-8B-Instruct")
    llm = LLM("meta-llama/Meta-Llama-3.1-8B-Instruct", method="transformers")

    # llama.cpp
    llm = LLM("microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-gguf")
    llm = LLM("microsoft/Phi-3-mini-4k-instruct-gguf/Phi-3-mini-4k-instruct-gguf",
              method="llama.cpp")

    # LiteLLM
    llm = LLM("ollama/llama3.1")
    llm = LLM("ollama/llama3.1", method="litellm")

    # Custom Ollama endpoint
    llm = LLM("ollama/llama3.1", api_base="http://localhost:11434")

    # Custom OpenAI-compatible endpoint
    llm = LLM("openai/llama3.1", api_base="http://localhost:4000")

    # LLM APIs - must also set API key via environment variable
    llm = LLM("gpt-4o")
    llm = LLM("claude-3-5-sonnet-20240620")
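Generation arguments can be passed at call time. maxlength is documented in the method reference below; any additional keyword arguments are forwarded to the underlying framework, so which ones are accepted depends on the selected backend. A minimal sketch:

    from txtai import LLM

    llm = LLM("meta-llama/Meta-Llama-3.1-8B-Instruct")

    # maxlength caps the length of the generated response (documented default is 512)
    llm("Summarize txtai in one sentence.", maxlength=256)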

Models can be externally loaded and passed to pipelines. This is useful for models that are not yet supported by Transformers and/or need special initialization.

    import torch

    from transformers import AutoModelForCausalLM, AutoTokenizer

    from txtai import LLM

    # Load Phi 3.5-mini
    path = "microsoft/Phi-3.5-mini-instruct"
    model = AutoModelForCausalLM.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(path)

    llm = LLM((model, tokenizer))
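The externally loaded model then runs through the same interface as a path-based LLM. The prompt below is only illustrative.

    # Run a prompt with the externally loaded model
    llm("What are the applications of txtai? Answer in one sentence.")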

See the links below for more detailed examples.

Notebook | Description
Prompt-driven search with LLMs | Embeddings-guided and Prompt-driven search with Large Language Models (LLMs)
Prompt templates and task chains | Build model prompts and connect tasks together with workflows
Build RAG pipelines with txtai | Guide on retrieval augmented generation including how to create citations
Integrate LLM frameworks | Integrate llama.cpp, LiteLLM and custom generation frameworks
Generate knowledge with Semantic Graphs and RAG | Knowledge exploration and discovery with Semantic Graphs and RAG
Build knowledge graphs with LLMs | Build knowledge graphs with LLM-driven entity extraction
Advanced RAG with graph path traversal | Graph path traversal to collect complex sets of data for advanced RAG
Advanced RAG with guided generation | Retrieval Augmented and Guided Generation
RAG with llama.cpp and external API services | RAG with additional vector and LLM frameworks
How RAG with txtai works | Create RAG processes, API services and Docker instances
Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG
Generative Audio | Storytelling with generative audio workflows

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

    # Create pipeline using lower case class name
    llm:

    # Run pipeline with workflow
    workflow:
      llm:
        tasks:
          - action: llm
Similar to the Python example above, the underlying Hugging Face pipeline parameters and model parameters can be set in pipeline configuration.

    llm:
      path: microsoft/Phi-3.5-mini-instruct
      torch_dtype: torch.bfloat16

Run with Workflows

    from txtai import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("llm", [
        """
        Answer the following question using the provided context.

        Question:
        What are the applications of txtai?

        Context:
        txtai is an open-source platform for semantic search and
        workflows powered by language models.
        """
    ]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"llm", "elements": ["Answer the following question..."]}'
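The same workflow endpoint can be called from Python. A minimal sketch using the requests library, assuming the server started above is running on localhost:8000:

    import requests

    # POST to the workflow endpoint with the workflow name and input elements
    response = requests.post(
        "http://localhost:8000/workflow",
        json={"name": "llm", "elements": ["Answer the following question..."]}
    )
    print(response.json())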

Methods

Python documentation for the pipeline.

__init__(path=None, method=None, **kwargs)

Creates a new LLM.

Parameters:

Name | Description | Default
path | model path | None
method | llm model framework, infers from path if not provided | None
kwargs | model keyword arguments | {}

Source code in txtai/pipeline/llm/llm.py

    def __init__(self, path=None, method=None, **kwargs):
        """
        Creates a new LLM.

        Args:
            path: model path
            method: llm model framework, infers from path if not provided
            kwargs: model keyword arguments
        """

        # Default LLM if not provided
        path = path if path else "google/flan-t5-base"

        # Generation instance
        self.generator = GenerationFactory.create(path, method, **kwargs)

__call__(text, maxlength=512, stream=False, **kwargs)

Generates text. Supports the following input formats:

  • String or list of strings
  • List of dictionaries with role and content key-values or lists of lists

Parameters:

Name | Description | Default
text | input text or list of inputs | required
maxlength | maximum sequence length | 512
stream | stream response if True, defaults to False | False
kwargs | additional generation keyword arguments | {}

Returns:

generated text
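For example, a single prompt returns a single response, while a list of prompts returns a list of responses. A minimal sketch, with the batch behavior assumed from the supported input formats above:

    from txtai import LLM

    llm = LLM()

    # Single prompt -> single generated response
    answer = llm("Describe txtai in one sentence.", maxlength=128)

    # List of prompts -> list of generated responses (assumed from the input formats above)
    answers = llm([
        "Describe semantic search in one sentence.",
        "Describe retrieval augmented generation in one sentence."
    ], maxlength=128)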

Source code in txtai/pipeline/llm/llm.py

    def __call__(self, text, maxlength=512, stream=False, **kwargs):
        """
        Generates text. Supports the following input formats:

          - String or list of strings
          - List of dictionaries with role and content key-values or lists of lists

        Args:
            text: text|list
            maxlength: maximum sequence length
            stream: stream response if True, defaults to False
            kwargs: additional generation keyword arguments

        Returns:
            generated text
        """

        # Debug logging
        logger.debug(text)

        # Run LLM generation
        return self.generator(text, maxlength, stream, **kwargs)