The Tabular pipeline splits tabular data into rows and columns. It is most useful for creating (id, text, tag) tuples to load into Embedding indexes.


The following shows a simple example using this pipeline.

    from txtai.pipeline import Tabular

    # Create and run pipeline
    tabular = Tabular("id", ["text"])
    tabular("path to csv file")

See the link below for a more detailed example.

Transform tabular data with composable workflows

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.


    # Create pipeline using lower case class name
    tabular:
      idcolumn: id
      textcolumns:
        - text

    # Run pipeline with workflow
    workflow:
      tabular:
        tasks:
          - action: tabular

Run with Workflows

    from txtai import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("tabular", ["path to csv file"]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"tabular", "elements":["path to csv file"]}'
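The same request can be built from Python. The following is a minimal sketch using only the standard library; the helper name `build_workflow_request` and its `base_url` parameter are illustrative and not part of txtai. The endpoint and payload are the ones used in the curl call above.

```python
import json
import urllib.request


def build_workflow_request(name, elements, base_url="http://localhost:8000"):
    """Builds a POST request for /workflow with the same JSON payload as the curl call."""
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/workflow",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_workflow_request("tabular", ["path to csv file"])

# Sending the request requires the API server to be running:
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the payload before the server is started.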


Python documentation for the pipeline.

__init__(idcolumn=None, textcolumns=None, content=False)

Creates a new Tabular pipeline.



Args:

idcolumn: column name to use for row id

textcolumns: list of columns to combine as a text field

content: if True, a dict per row is generated with all fields. If content is a list, a subset of fields is included in the generated rows.


Source code in txtai/pipeline/data/

    def __init__(self, idcolumn=None, textcolumns=None, content=False):
        """
        Creates a new Tabular pipeline.

        Args:
            idcolumn: column name to use for row id
            textcolumns: list of columns to combine as a text field
            content: if True, a dict per row is generated with all fields. If content is a list, a subset of fields
                     is included in the generated rows.
        """

        if not PANDAS:
            raise ImportError('Tabular pipeline is not available - install "pipeline" extra to enable')

        self.idcolumn = idcolumn
        self.textcolumns = textcolumns
        self.content = content
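To illustrate what the three constructor arguments control, here is a rough pure-Python sketch. It is not txtai's implementation: the function `sketch_rows` is hypothetical, and joining text columns with a single space, falling back to the row position as the id, and returning the content dicts as a separate list are all assumptions made for the example.

```python
def sketch_rows(records, idcolumn=None, textcolumns=None, content=False):
    """Hypothetical stand-in showing the role of idcolumn, textcolumns and content."""
    tuples, dicts = [], []

    for index, record in enumerate(records):
        # Assumption: use the row position when no id column is configured
        uid = record[idcolumn] if idcolumn else index

        # Assumption: combine the selected columns with a single space
        columns = textcolumns if textcolumns else [c for c in record if c != idcolumn]
        text = " ".join(str(record[c]) for c in columns)
        tuples.append((uid, text, None))

        if content is True:
            # All fields
            dicts.append(dict(record))
        elif content:
            # Subset of fields named in the content list
            dicts.append({c: record[c] for c in content})

    return tuples, dicts


records = [
    {"id": "doc1", "title": "First", "text": "hello world", "year": 2021},
    {"id": "doc2", "title": "Second", "text": "tabular data", "year": 2022},
]

tuples, dicts = sketch_rows(records, idcolumn="id", textcolumns=["title", "text"], content=["year"])
```

With these inputs, each tuple carries the id from the `id` column and the combined `title` and `text` values, while each dict holds only the `year` field.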


__call__(data)

Splits data into rows and columns.

Args:

data: input data

Returns:

list of (id, text, tag)

Source code in txtai/pipeline/data/

    def __call__(self, data):
        """
        Splits data into rows and columns.

        Args:
            data: input data

        Returns:
            list of (id, text, tag)
        """

        items = [data] if not isinstance(data, list) else data

        # Combine all rows into single return element
        results = []
        dicts = []

        for item in items:
            # File path
            if isinstance(item, str):
                _, extension = os.path.splitext(item)
                extension = extension.replace(".", "").lower()

                if extension == "csv":
                    df = pd.read_csv(item)

                results.append(self.process(df))

            # Dict
            if isinstance(item, dict):
                dicts.append(item)

            # List of dicts
            elif isinstance(item, list):
                df = pd.DataFrame(item)
                results.append(self.process(df))

        if dicts:
            df = pd.DataFrame(dicts)
            results.extend(self.process(df))

        return results[0] if not isinstance(data, list) else results
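The input handling above accepts a file path, a dict, or a list of dicts, and the return shape depends on whether a list was passed in. The sketch below mirrors that dispatch logic without pandas so the shapes are easy to see. It is an approximation, not the library code: `process` is a hypothetical stand-in for `Tabular.process`, the `csv` module replaces `pd.read_csv`, and non-CSV paths are simply skipped.

```python
import csv
import os
import tempfile


def process(records):
    """Hypothetical stand-in for Tabular.process: one (id, text, tag) tuple per record."""
    return [(record["id"], record["text"], None) for record in records]


def dispatch(data):
    """Mirrors the branching of __call__ using plain lists of dicts."""
    items = [data] if not isinstance(data, list) else data
    results, dicts = [], []

    for item in items:
        if isinstance(item, str):
            # File path: only CSV files are read in this sketch
            _, extension = os.path.splitext(item)
            if extension.lower() == ".csv":
                with open(item, newline="", encoding="utf-8") as handle:
                    results.append(process(list(csv.DictReader(handle))))
        elif isinstance(item, dict):
            # Single dicts are collected and processed together at the end
            dicts.append(item)
        elif isinstance(item, list):
            # A list of dicts is processed as one unit
            results.append(process(item))

    if dicts:
        results.extend(process(dicts))

    return results[0] if not isinstance(data, list) else results


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows([["id", "text"], ["1", "first row"], ["2", "second row"]])

    single = dispatch(path)    # rows for one file
    batch = dispatch([path])   # one list of rows per file

flat = dispatch([{"id": 3, "text": "third"}, {"id": 4, "text": "fourth"}])
nested = dispatch([[{"id": 5, "text": "fifth"}]])
```

Note the asymmetry this makes visible: a list of plain dicts comes back as one flat list of tuples, while file paths and nested lists each contribute their own list of rows.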