The Translation pipeline translates text between languages. It supports more than 100 languages and has built-in automatic source language detection. The pipeline detects the language of each input text row, loads a model for the source-target combination and translates the text to the target language.


The following shows a simple example using this pipeline.

    from txtai.pipeline import Translation

    # Create and run pipeline
    translate = Translation()
    translate("This is a test translation into Spanish", "es")

See the link below for a more detailed example.

Translate text between languages: streamline machine translation and language detection.

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.


    # Create pipeline using lower case class name
    translation:

    # Run pipeline with workflow
    workflow:
      translate:
        tasks:
          - action: translation
            args: ["es"]

Run with Workflows

    from txtai import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("translate", ["This is a test translation into Spanish"]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"translate", "elements":["This is a test translation into Spanish"]}'
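
The same request can also be issued from Python. The sketch below uses only the standard library and assumes the API is running locally on port 8000, as started by the command above. The helper names `build_request` and `run_workflow` are illustrative and are not part of txtai.

```python
import json
import urllib.request

# Assumption: the API was started locally as shown above
URL = "http://localhost:8000/workflow"


def build_request(name, elements, url=URL):
    """Builds a POST request for the workflow endpoint (hypothetical helper)."""
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(name, elements, url=URL):
    """Sends the request and decodes the JSON response (hypothetical helper)."""
    with urllib.request.urlopen(build_request(name, elements, url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not need a running server
request = build_request("translate", ["This is a test translation into Spanish"])

# With the API running:
# print(run_workflow("translate", ["This is a test translation into Spanish"]))
```

Separating request construction from sending makes it possible to inspect the payload without a live server.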


Python documentation for the pipeline.

__init__(path=None, quantize=False, gpu=True, batch=64, langdetect=None, findmodels=True)

Constructs a new language translation pipeline.



Parameters:

path: optional path to model, accepts Hugging Face model hub id or local path, uses default model for task if not provided

quantize: if model should be quantized, defaults to False

gpu: True/False if GPU should be enabled, also supports a GPU device id

batch: batch size used to incrementally process content

langdetect: set a custom language detection function, method must take a list of strings and return language codes for each, uses default language detector if not provided

findmodels: True/False if the Hugging Face Hub will be searched for source-target translation models


Source code in txtai/pipeline/text/

    def __init__(self, path=None, quantize=False, gpu=True, batch=64, langdetect=None, findmodels=True):
        """
        Constructs a new language translation pipeline.

        Args:
            path: optional path to model, accepts Hugging Face model hub id or local path,
                  uses default model for task if not provided
            quantize: if model should be quantized, defaults to False
            gpu: True/False if GPU should be enabled, also supports a GPU device id
            batch: batch size used to incrementally process content
            langdetect: set a custom language detection function, method must take a list of strings and return
                        language codes for each, uses default language detector if not provided
            findmodels: True/False if the Hugging Face Hub will be searched for source-target translation models
        """

        # Call parent constructor
        super().__init__(path if path else "facebook/m2m100_418M", quantize, gpu, batch)

        # Language detection
        self.detector = None
        self.langdetect = langdetect
        self.findmodels = findmodels

        # Language models
        self.models = {}
        self.ids = self.modelids()
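
The `langdetect` argument accepts any callable that maps a list of strings to a list of language codes. A minimal sketch follows, assuming a toy stop-word heuristic. The `detect` function and its word lists are hypothetical and only illustrate the expected input and output shape; they are not a production-quality detector.

```python
# Hypothetical stop-word lists, for illustration only
STOPWORDS = {
    "en": {"the", "is", "and", "this", "of"},
    "es": {"el", "la", "es", "y", "esto", "de"},
    "fr": {"le", "est", "et", "ceci", "les"},
}


def detect(texts, default="en"):
    """Returns one language code per input string, in input order."""
    codes = []
    for text in texts:
        tokens = set(text.lower().split())

        # Score each language by the number of matching stop words
        scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
        best = max(scores, key=scores.get)

        # Fall back to the default code when nothing matches
        codes.append(best if scores[best] > 0 else default)

    return codes


print(detect(["This is a test", "Esto es una prueba"]))

# Passing the function to the pipeline (requires txtai and model downloads):
# from txtai.pipeline import Translation
# translate = Translation(langdetect=detect)
```

A custom detector like this can be useful when the set of input languages is known in advance and small.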

__call__(texts, target='en', source=None, showmodels=False)

Translates text from source language into target language.

This method supports texts as a string or a list. If the input is a string, the return type is a string. If the input is a list, the return type is a list.





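Parameters:

texts: text|list

target: target language code, defaults to "en"

source: source language code, detects language if not provided

showmodels: if True, each result is a (translated text, source language, model) tuple instead of a string

Returns:

list of translated text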

Source code in txtai/pipeline/text/

    def __call__(self, texts, target="en", source=None, showmodels=False):
        """
        Translates text from source language into target language.

        This method supports texts as a string or a list. If the input is a string,
        the return type is string. If text is a list, the return type is a list.

        Args:
            texts: text|list
            target: target language code, defaults to "en"
            source: source language code, detects language if not provided

        Returns:
            list of translated text
        """

        values = [texts] if not isinstance(texts, list) else texts

        # Detect source languages
        languages = self.detect(values) if not source else [source] * len(values)
        unique = set(languages)

        # Build a dict from language to list of (index, text)
        langdict = {}
        for x, lang in enumerate(languages):
            if lang not in langdict:
                langdict[lang] = []
            langdict[lang].append((x, values[x]))

        results = {}
        for language in unique:
            # Get all indices and text values for a language
            inputs = langdict[language]

            # Translate text in batches
            outputs = []
            for chunk in self.batch([text for _, text in inputs], self.batchsize):
                outputs.extend(self.translate(chunk, language, target, showmodels))

            # Store output value
            for y, (x, _) in enumerate(inputs):
                if showmodels:
                    model, op = outputs[y]
                    results[x] = (op.strip(), language, model)
                else:
                    results[x] = outputs[y].strip()

        # Return results in same order as input
        results = [results[x] for x in sorted(results)]
        return results[0] if isinstance(texts, str) else results
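
The grouping and reordering steps in `__call__` can be tried without loading a model. The sketch below mirrors that flow under stated assumptions: `fake_translate` is a hypothetical stand-in for the model call, `chunks` is a simple stand-in for the pipeline's batching helper, and source languages are passed in explicitly rather than detected.

```python
def chunks(items, size):
    """Yields successive slices of at most `size` items (stand-in batching helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_translate(texts, source, target):
    """Hypothetical stand-in for a model call: tags text instead of translating it."""
    return [f"[{source}->{target}] {text}" for text in texts]


def translate_all(texts, languages, target="en", batchsize=2):
    """Groups rows by source language, processes them in batches, restores input order."""
    values = [texts] if not isinstance(texts, list) else texts

    # Map each language to its (index, text) pairs
    langdict = {}
    for index, language in enumerate(languages):
        langdict.setdefault(language, []).append((index, values[index]))

    results = {}
    for language, inputs in langdict.items():
        outputs = []
        for chunk in chunks([text for _, text in inputs], batchsize):
            outputs.extend(fake_translate(chunk, language, target))

        # Store each output under its original row index
        for (index, _), output in zip(inputs, outputs):
            results[index] = output.strip()

    # Return results in input order; a string input yields a string
    ordered = [results[index] for index in sorted(results)]
    return ordered[0] if isinstance(texts, str) else ordered


rows = ["hola", "bonjour", "adios", "merci", "gracias"]
codes = ["es", "fr", "es", "fr", "es"]
print(translate_all(rows, codes))
```

Grouping by language means each source-target model handles all of its rows together, and keeping the original index alongside each text lets the results be returned in input order.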