

Exports a Hugging Face Transformer model to ONNX. Currently, this works best with classification, pooling and question-answering models. Work is ongoing for sequence-to-sequence models (summarization, transcription, translation).


The following shows a simple example using this pipeline.

    from txtai.pipeline import HFOnnx, Labels

    # Model path
    path = "distilbert-base-uncased-finetuned-sst-2-english"

    # Export model to ONNX
    onnx = HFOnnx()
    model = onnx(path, "text-classification", "model.onnx", True)

    # Run inference and validate
    labels = Labels((model, path), dynamic=False)
    labels("I am happy")

See the link below for a more detailed example.

Export and run models with ONNX: Export models with ONNX, run natively in JavaScript, Java and Rust


Python documentation for the pipeline.

__call__(path, task='default', output=None, quantize=False, opset=14)

Exports a Hugging Face Transformer model to ONNX.

Parameters:

  path: path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple
  task: optional model task or category, determines the model type and outputs, defaults to export hidden state
  output: optional output model path, defaults to return byte array if None
  quantize: if model should be quantized (requires onnx to be installed), defaults to False
  opset: onnx opset, defaults to 14

Returns:

  path to model output or model as bytes depending on output parameter
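
The return contract described above (a file path when output is set, raw bytes otherwise) can be illustrated with a small standalone sketch. The helper name write_model and the placeholder payload are hypothetical and not part of txtai; only the "fall back to an in-memory stream" pattern is taken from the documented behavior.

```python
import os
import tempfile
from io import BytesIO
from pathlib import Path


def write_model(data, output=None):
    """
    Hypothetical helper (not part of txtai) that mimics the documented
    return contract: a path when output is given, bytes otherwise.
    """

    # Default to an in-memory stream when no output path is provided
    target = output if output else BytesIO()

    if isinstance(target, BytesIO):
        target.write(data)

        # Reset stream and return the raw bytes
        target.seek(0)
        return target.read()

    # Otherwise write to the given file path and return that path
    Path(target).write_bytes(data)
    return target


# Placeholder payload standing in for a serialized ONNX model
raw = write_model(b"onnx-bytes")
path = write_model(b"onnx-bytes", os.path.join(tempfile.mkdtemp(), "model.onnx"))
```

Here raw holds the bytes directly, which suits passing a model straight to an in-process runtime, while path points at a file on disk that other tools or languages can load.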

Source code in txtai/pipeline/train/hfonnx.py

    def __call__(self, path, task="default", output=None, quantize=False, opset=14):
        """
        Exports a Hugging Face Transformer model to ONNX.

        Args:
            path: path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple
            task: optional model task or category, determines the model type and outputs, defaults to export hidden state
            output: optional output model path, defaults to return byte array if None
            quantize: if model should be quantized (requires onnx to be installed), defaults to False
            opset: onnx opset, defaults to 14

        Returns:
            path to model output or model as bytes depending on output parameter
        """

        inputs, outputs, model = self.parameters(task)

        if isinstance(path, (list, tuple)):
            model, tokenizer = path
            model = model.cpu()
        else:
            model = model(path)
            tokenizer = AutoTokenizer.from_pretrained(path)

        # Generate dummy inputs
        dummy = dict(tokenizer(["test inputs"], return_tensors="pt"))

        # Default to BytesIO if no output file provided
        output = output if output else BytesIO()

        # Export model to ONNX
        export(
            model,
            (dummy,),
            output,
            opset_version=opset,
            do_constant_folding=True,
            input_names=list(inputs.keys()),
            output_names=list(outputs.keys()),
            dynamic_axes=dict(chain(inputs.items(), outputs.items())),
        )

        # Quantize model
        if quantize:
            if not ONNX_RUNTIME:
                raise ImportError("onnxruntime is not available - install pipeline extra to enable")

            output = self.quantization(output)

        if isinstance(output, BytesIO):
            # Reset stream and return bytes
            output.seek(0)
            output = output.read()

        return output
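
The export call above builds input_names, output_names and dynamic_axes from the two mappings returned by self.parameters(task). The sketch below shows how that merge behaves. The mapping contents are assumed for illustration only and may differ from what txtai returns for a given task.

```python
from itertools import chain

# Assumed example mappings (illustrative only): tensor name -> {axis index: axis label}
inputs = {
    "input_ids": {0: "batch", 1: "sequence"},
    "attention_mask": {0: "batch", 1: "sequence"},
}
outputs = {"logits": {0: "batch"}}

# Names are taken from the mapping keys, in insertion order
input_names = list(inputs.keys())
output_names = list(outputs.keys())

# chain() concatenates both item views, dict() merges them into a single mapping
dynamic_axes = dict(chain(inputs.items(), outputs.items()))
```

With these assumed mappings, dynamic_axes has one entry per input and output tensor, so the exported graph is not tied to the batch size or sequence length of the dummy input.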