The Entity pipeline applies a token classifier to text and extracts entity/label combinations.


The following shows a simple example using this pipeline.

    from txtai.pipeline import Entity

    # Create and run pipeline
    entity = Entity()
    entity("Canada's last fully intact ice shelf has suddenly collapsed, " \
           "forming a Manhattan-sized iceberg")

See the link below for a more detailed example.

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.


    # Create pipeline using lower case class name
    entity:

    # Run pipeline with workflow
    workflow:
      entity:
        tasks:
          - action: entity

Run with Workflows

    from txtai import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("entity", ["Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"entity", "elements": ["Canadas last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"]}'
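The same request can also be built from Python. The following is a minimal sketch using only the standard library, not a txtai client. The helper names `build_workflow_request` and `run_workflow` are hypothetical, and the URL assumes the server started above is listening on localhost:8000.

```python
import json
import urllib.request


def build_workflow_request(name, elements, url="http://localhost:8000/workflow"):
    """Builds a POST request equivalent to the curl command (hypothetical helper)."""
    body = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(request):
    """Sends the request and decodes the JSON response. Requires a running API server."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_workflow_request(
    "entity",
    ["Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg"],
)

# Uncomment once the API server is running
# print(run_workflow(request))
```

Because the body is serialized with `json.dumps`, the apostrophe in "Canada's" needs no special shell quoting here.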


Python documentation for the pipeline.

__init__(path=None, quantize=False, gpu=True, model=None, **kwargs)

Source code in txtai/pipeline/text/entity.py

    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs):
        super().__init__("token-classification", path, quantize, gpu, model, **kwargs)

__call__(text, labels=None, aggregate='simple', flatten=None, join=False, workers=0)

Applies a token classifier to text and extracts entity/label combinations.

Parameters:

text: text|list, a single string or a list of strings (required)

labels: list of entity type labels to accept, defaults to None which accepts all

aggregate: method to combine multi token entities - options are "simple" (default), "first", "average" or "max"

flatten: flatten output to a list of labels if present. Accepts a boolean or float value to only keep scores greater than that number.

join: joins flattened output into a string if True, ignored if flatten not set

workers: number of concurrent workers to use for processing data, defaults to 0

Returns:

list of (entity, entity type, score) or list of entities depending on flatten parameter

Source code in txtai/pipeline/text/entity.py

    def __call__(self, text, labels=None, aggregate="simple", flatten=None, join=False, workers=0):
        """
        Applies a token classifier to text and extracts entity/label combinations.

        Args:
            text: text|list
            labels: list of entity type labels to accept, defaults to None which accepts all
            aggregate: method to combine multi token entities - options are "simple" (default), "first", "average" or "max"
            flatten: flatten output to a list of labels if present. Accepts a boolean or float value to only keep scores greater than that number.
            join: joins flattened output into a string if True, ignored if flatten not set
            workers: number of concurrent workers to use for processing data, defaults to 0

        Returns:
            list of (entity, entity type, score) or list of entities depending on flatten parameter
        """

        # Run token classification pipeline
        results = self.pipeline(text, aggregation_strategy=aggregate, num_workers=workers)

        # Convert results to a list if necessary
        if isinstance(text, str):
            results = [results]

        # Score threshold when flatten is set
        threshold = 0.0 if isinstance(flatten, bool) else flatten

        # Extract entities if flatten set, otherwise extract (entity, entity type, score) tuples
        outputs = []
        for result in results:
            if flatten:
                output = [r["word"] for r in result if self.accept(r["entity_group"], labels) and r["score"] >= threshold]
                outputs.append(" ".join(output) if join else output)
            else:
                outputs.append([(r["word"], r["entity_group"], float(r["score"])) for r in result if self.accept(r["entity_group"], labels)])

        return outputs[0] if isinstance(text, str) else outputs
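The filtering step above can be explored without loading a model. The sketch below reapplies the same logic to hand-written classifier output. The sample words, labels and scores are made up for illustration, `postprocess` is a hypothetical standalone function, and `accept` is an assumption about what `self.accept` does (accept every entity type when `labels` is None), inferred from the `labels` parameter description.

```python
def accept(etype, labels):
    # Assumed behavior of self.accept: no labels means accept every entity type
    return labels is None or etype in labels


def postprocess(result, labels=None, flatten=None, join=False):
    # Mirrors the per-result loop body shown above for a single text input
    threshold = 0.0 if isinstance(flatten, bool) else flatten
    if flatten:
        output = [r["word"] for r in result if accept(r["entity_group"], labels) and r["score"] >= threshold]
        return " ".join(output) if join else output
    return [(r["word"], r["entity_group"], float(r["score"])) for r in result if accept(r["entity_group"], labels)]


# Made-up classifier output, shaped like the dicts the loop reads
sample = [
    {"word": "Canada", "entity_group": "LOC", "score": 0.99},
    {"word": "Manhattan", "entity_group": "MISC", "score": 0.62},
]

tuples = postprocess(sample)                            # (entity, entity type, score) tuples
only_loc = postprocess(sample, labels=["LOC"])          # restrict to one entity type
flat = postprocess(sample, flatten=True)                # entity text only
confident = postprocess(sample, flatten=0.9)            # entity text with score >= 0.9
joined = postprocess(sample, flatten=True, join=True)   # single string
```

With this sample, `flatten=0.9` keeps only "Canada", and `join=True` returns the single string "Canada Manhattan".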