Translation

The Translation pipeline translates text between languages and supports more than 100 languages. Automatic source language detection is built in. The pipeline detects the language of each input text row, loads a model for the source-target combination, and translates the text to the target language.

Example

The following shows a simple example using this pipeline.

    from txtai.pipeline import Translation

    # Create and run pipeline
    translate = Translation()
    translate("This is a test translation into Spanish", "es")

See the link below for a more detailed example.

| Notebook | Description |
| --- | --- |
| Translate text between languages | Streamline machine translation and language detection |

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

    # Create pipeline using lower case class name
    translation:

    # Run pipeline with workflow
    workflow:
      translate:
        tasks:
          - action: translation
            args: ["es"]

Run with Workflows

    from txtai import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("translate", ["This is a test translation into Spanish"]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"translate", "elements":["This is a test translation into Spanish"]}'
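
The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the API started by the command above is listening on localhost:8000. The helper names `build_request` and `run_workflow` are hypothetical and not part of txtai.

```python
import json
import urllib.request


def build_request(texts, name="translate", url="http://localhost:8000/workflow"):
    # Same JSON body as the curl command: workflow name plus input elements
    payload = json.dumps({"name": name, "elements": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(texts, name="translate"):
    # Sends the request; this only works while the API server is running
    with urllib.request.urlopen(build_request(texts, name)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_workflow(["This is a test translation into Spanish"])` would post the same payload as the curl example. Keeping request construction in its own function makes the payload easy to inspect without a running server.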

Methods

Python documentation for the pipeline.

__init__(path=None, quantize=False, gpu=True, batch=64, langdetect=None, findmodels=True)

Constructs a new language translation pipeline.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | | optional path to model, accepts Hugging Face model hub id or local path, uses default model for task if not provided | None |
| quantize | | if model should be quantized, defaults to False | False |
| gpu | | True/False if GPU should be enabled, also supports a GPU device id | True |
| batch | | batch size used to incrementally process content | 64 |
| langdetect | | set a custom language detection function, method must take a list of strings and return language codes for each, uses default language detector if not provided | None |
| findmodels | | True/False if the Hugging Face Hub will be searched for source-target translation models | True |
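
The `langdetect` parameter accepts any callable that takes a list of strings and returns one language code per string. The following is a minimal sketch of such a function, assuming a toy approach that guesses the language from the Unicode script of the first letter. The name `detect_by_script` and its script-to-language mapping are hypothetical and far cruder than a real language detector.

```python
import unicodedata


def detect_by_script(texts):
    """
    Toy language detector: guesses a language code from the script of the
    first alphabetic character. Hypothetical helper for illustration only.
    """

    # Hypothetical mapping from Unicode character name prefix to language code
    scripts = {"CYRILLIC": "ru", "GREEK": "el", "HIRAGANA": "ja", "KATAKANA": "ja", "HANGUL": "ko"}

    codes = []
    for text in texts:
        # Fall back to English for Latin script, unknown scripts or empty text
        code = "en"
        for char in text:
            if char.isalpha():
                name = unicodedata.name(char, "")
                code = next((lang for prefix, lang in scripts.items() if name.startswith(prefix)), "en")
                break
        codes.append(code)

    return codes
```

Based on the parameter description, passing this function as `Translation(langdetect=detect_by_script)` would replace the default detector. The only contract is list of strings in, list of language codes out.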

Source code in txtai/pipeline/text/translation.py

    def __init__(self, path=None, quantize=False, gpu=True, batch=64, langdetect=None, findmodels=True):
        """
        Constructs a new language translation pipeline.

        Args:
            path: optional path to model, accepts Hugging Face model hub id or local path,
                  uses default model for task if not provided
            quantize: if model should be quantized, defaults to False
            gpu: True/False if GPU should be enabled, also supports a GPU device id
            batch: batch size used to incrementally process content
            langdetect: set a custom language detection function, method must take a list of strings and return
                        language codes for each, uses default language detector if not provided
            findmodels: True/False if the Hugging Face Hub will be searched for source-target translation models
        """

        # Call parent constructor
        super().__init__(path if path else "facebook/m2m100_418M", quantize, gpu, batch)

        # Language detection
        self.detector = None
        self.langdetect = langdetect
        self.findmodels = findmodels

        # Language models
        self.models = {}
        self.ids = self.modelids()

__call__(texts, target='en', source=None, showmodels=False)

Translates text from source language into target language.

This method accepts texts as a string or a list. If the input is a string, the return type is a string. If the input is a list, the return type is a list.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| texts | | text or list | required |
| target | | target language code, defaults to "en" | 'en' |
| source | | source language code, detects language if not provided | None |

Returns:

| Type | Description |
| --- | --- |
| | translated text: a string when the input is a string, otherwise a list of translated text |

Source code in txtai/pipeline/text/translation.py

    def __call__(self, texts, target="en", source=None, showmodels=False):
        """
        Translates text from source language into target language.

        This method supports texts as a string or a list. If the input is a string,
        the return type is string. If text is a list, the return type is a list.

        Args:
            texts: text|list
            target: target language code, defaults to "en"
            source: source language code, detects language if not provided

        Returns:
            list of translated text
        """

        values = [texts] if not isinstance(texts, list) else texts

        # Detect source languages
        languages = self.detect(values) if not source else [source] * len(values)
        unique = set(languages)

        # Build a dict from language to list of (index, text)
        langdict = {}
        for x, lang in enumerate(languages):
            if lang not in langdict:
                langdict[lang] = []
            langdict[lang].append((x, values[x]))

        results = {}
        for language in unique:
            # Get all indices and text values for a language
            inputs = langdict[language]

            # Translate text in batches
            outputs = []
            for chunk in self.batch([text for _, text in inputs], self.batchsize):
                outputs.extend(self.translate(chunk, language, target, showmodels))

            # Store output value
            for y, (x, _) in enumerate(inputs):
                if showmodels:
                    model, op = outputs[y]
                    results[x] = (op.strip(), language, model)
                else:
                    results[x] = outputs[y].strip()

        # Return results in same order as input
        results = [results[x] for x in sorted(results)]
        return results[0] if isinstance(texts, str) else results
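
The group-by-language flow in this method can be illustrated without loading any models. The following is a minimal sketch, assuming hypothetical stand-ins `fake_detect` and `fake_translate` in place of the pipeline's detection and translation steps. Only the grouping, batching and reordering logic mirrors the method above; `translate_all` and `batch` are also illustrative names.

```python
def fake_detect(texts):
    # Hypothetical detector: any non-ASCII text is treated as Spanish
    return ["en" if text.isascii() else "es" for text in texts]


def fake_translate(chunk, source, target):
    # Hypothetical translator: tags each text with its language pair
    return [f"[{source}->{target}] {text} " for text in chunk]


def batch(items, size):
    # Yield successive chunks of at most `size` items
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_all(texts, target="en", source=None, batchsize=2):
    values = [texts] if not isinstance(texts, list) else texts

    # Detect a language per text unless a source language is given
    languages = fake_detect(values) if not source else [source] * len(values)

    # Group (index, text) pairs by language
    langdict = {}
    for x, lang in enumerate(languages):
        langdict.setdefault(lang, []).append((x, values[x]))

    results = {}
    for language, inputs in langdict.items():
        # Translate each language group in batches
        outputs = []
        for chunk in batch([text for _, text in inputs], batchsize):
            outputs.extend(fake_translate(chunk, language, target))

        # Map each output back to its original input position
        for y, (x, _) in enumerate(inputs):
            results[x] = outputs[y].strip()

    # Return results in input order, unwrapping single string inputs
    ordered = [results[x] for x in sorted(results)]
    return ordered[0] if isinstance(texts, str) else ordered
```

Grouping by language first means each source-target pair is handled together, while the stored indices let the final list match the input order even when languages are interleaved.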