# LLM
The LLM pipeline runs prompts through a large language model (LLM). This pipeline autodetects whether the model path points to a text generation or a sequence-to-sequence model.
## Example
The following shows a simple example using this pipeline.
```python
from txtai.pipeline import LLM

# Create and run LLM pipeline
llm = LLM()

llm(
    """
    Answer the following question using the provided context.

    Question:
    What are the applications of txtai?

    Context:
    txtai is an open-source platform for semantic search and
    workflows powered by language models.
    """
)
```
The LLM pipeline automatically detects the underlying model type (`language-generation` or `sequence-sequence`). This can also be set manually.
```python
from txtai.pipeline import LLM, Generator, Sequences

# Set model type via task parameter
llm = LLM("google/flan-t5-xl", task="sequence-sequence")

# Create sequences pipeline (same as previous statement)
sequences = Sequences("google/flan-t5-xl")

# Set model type via task parameter
llm = LLM("openlm-research/open_llama_3b", task="language-generation")

# Create generator pipeline (same as previous statement)
generator = Generator("openlm-research/open_llama_3b")
```
Models can be externally loaded and passed to pipelines. This is useful for models that aren't yet supported by Transformers or that need special initialization.
```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer

from txtai.pipeline import LLM

# Load Falcon-7B-Instruct
path = "tiiuae/falcon-7b-instruct"
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(path)

llm = LLM((model, tokenizer))
```
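The externally loaded model is then called like any other LLM pipeline. A minimal sketch (the prompt and `maxlength` value below are illustrative):

```python
# Run a prompt through the externally loaded model
print(llm("What are the applications of txtai?", maxlength=256))
```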
See the links below for more detailed examples.
| Notebook | Description |
|:---------|:------------|
| Prompt-driven search with LLMs | Embeddings-guided and Prompt-driven search with Large Language Models (LLMs) |
| Prompt templates and task chains | Build model prompts and connect tasks together with workflows |
## Configuration-driven example
Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.
**config.yml**

```yaml
# Create pipeline using lower case class name
# Use `generator` or `sequences` to force model type
llm:

# Run pipeline with workflow
workflow:
  llm:
    tasks:
      - action: llm
```
Similar to the Python example above, the underlying Hugging Face pipeline parameters and model parameters can be set in pipeline configuration.
```yaml
llm:
  path: tiiuae/falcon-7b-instruct
  torch_dtype: torch.bfloat16
  trust_remote_code: True
```
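Since the keyword arguments pass through to the underlying model load, this configuration roughly corresponds to the following Python. This is a sketch, assuming the same `**kwargs` passthrough behavior as the constructor shown in the Methods section below:

```python
import torch

from txtai.pipeline import LLM

# Equivalent Python instantiation of the configuration above
llm = LLM("tiiuae/falcon-7b-instruct", torch_dtype=torch.bfloat16, trust_remote_code=True)
```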
### Run with Workflows
```python
from txtai.app import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("llm", [
    """
    Answer the following question using the provided context.

    Question:
    What are the applications of txtai?

    Context:
    txtai is an open-source platform for semantic search and
    workflows powered by language models.
    """
]))
```
### Run with API
```bash
CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"llm", "elements": ["Answer the following question..."]}'
```
## Methods
Python documentation for the pipeline.
### `__init__(self, path=None, quantize=False, gpu=True, model=None, task=None, **kwargs)`

Source code in `txtai/pipeline/text/llm.py`

```python
def __init__(self, path=None, quantize=False, gpu=True, model=None, task=None, **kwargs):
    super().__init__(self.task(path, task, **kwargs), path if path else "google/flan-t5-base", quantize, gpu, model, **kwargs)

    # Load tokenizer, if necessary
    self.pipeline.tokenizer = self.pipeline.tokenizer if self.pipeline.tokenizer else Models.tokenizer(path, **kwargs)
```
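As the source shows, when no path is given the pipeline falls back to `google/flan-t5-base`, and the `task` parameter skips model type auto-detection. A short sketch of both constructor paths:

```python
from txtai.pipeline import LLM

# No path given: falls back to the default model, google/flan-t5-base
llm = LLM()

# Explicit path with task set, bypassing model type detection
llm = LLM("google/flan-t5-xl", task="sequence-sequence")
```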
### `__call__(self, text, prefix=None, maxlength=512, workers=0, **kwargs)`

Generates text using input text.
Parameters:

| Name | Description | Default |
|------|-------------|---------|
| text | input text, a string or list of strings | required |
| prefix | optional prefix to prepend to text elements | None |
| maxlength | maximum sequence length | 512 |
| workers | number of concurrent workers to use for processing data | 0 |
| kwargs | additional generation keyword arguments | {} |
Returns:

| Type | Description |
|------|-------------|
| str\|list | generated text |
Source code in `txtai/pipeline/text/llm.py`

```python
def __call__(self, text, prefix=None, maxlength=512, workers=0, **kwargs):
    """
    Generates text using input text

    Args:
        text: text|list
        prefix: optional prefix to prepend to text elements
        maxlength: maximum sequence length
        workers: number of concurrent workers to use for processing data
        kwargs: additional generation keyword arguments

    Returns:
        generated text
    """

    # List of texts
    texts = text if isinstance(text, list) else [text]

    # Add prefix, if necessary
    if prefix:
        texts = [f"{prefix}{x}" for x in texts]

    # Run pipeline
    results = self.pipeline(texts, max_length=maxlength, num_workers=workers, **kwargs)

    # Get generated text
    results = [self.clean(texts[x], result) for x, result in enumerate(results)]

    return results[0] if isinstance(text, str) else results
```
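Per the signature and return statement above, a single string input returns a single string while a list returns a list, and `prefix` is prepended to each element. A usage sketch (the prompts are illustrative):

```python
from txtai.pipeline import LLM

llm = LLM()

# Single string in, single string out
result = llm("Translate to French: Hello")

# List in, list out; prefix is prepended to each element
results = llm(["Hello", "Goodbye"], prefix="Translate to French: ", maxlength=64)
```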