# HFOnnx
Exports a Hugging Face Transformer model to ONNX. Currently, this works best with classification/pooling/qa models. Work is ongoing for sequence to sequence models (summarization, transcription, translation).
## Example
The following shows a simple example using this pipeline.
```python
from txtai.pipeline import HFOnnx, Labels

# Model path
path = "distilbert-base-uncased-finetuned-sst-2-english"

# Export model to ONNX
onnx = HFOnnx()
model = onnx(path, "text-classification", "model.onnx", True)

# Run inference and validate
labels = Labels((model, path), dynamic=False)
labels("I am happy")
```
See the link below for a more detailed example.
| Notebook | Description |
|:---------|:------------|
| Export and run models with ONNX | Export models with ONNX, run natively in JavaScript, Java and Rust |
## Methods
Python documentation for the pipeline.
### `__call__(self, path, task="default", output=None, quantize=False, opset=12)`

*Special method, called by invoking the pipeline instance directly.*
Exports a Hugging Face Transformer model to ONNX.
Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| path | | path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple | required |
| task | | optional model task or category, determines the model type and outputs, defaults to export hidden state | 'default' |
| output | | optional output model path, defaults to return byte array if None | None |
| quantize | | if model should be quantized (requires onnx to be installed), defaults to False | False |
| opset | | onnx opset, defaults to 12 | 12 |
Returns:

| Type | Description |
|------|-------------|
| | path to model output or model as bytes depending on output parameter |
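When `output` is None, the serialized model is returned as a byte array, which can be passed straight to onnxruntime. A minimal sketch, assuming onnxruntime is installed and that the exported graph only takes `input_ids` and `attention_mask` inputs:

```python
import numpy as np
import onnxruntime

from transformers import AutoTokenizer
from txtai.pipeline import HFOnnx

path = "distilbert-base-uncased-finetuned-sst-2-english"

# No output file provided, so the exported model comes back as bytes
onnx = HFOnnx()
model = onnx(path, "text-classification")

# onnxruntime accepts serialized model bytes directly
session = onnxruntime.InferenceSession(model, providers=["CPUExecutionProvider"])

tokenizer = AutoTokenizer.from_pretrained(path)
tokens = tokenizer(["I am happy"], return_tensors="np")

# Cast to int64 to match the exported graph inputs
inputs = {name: array.astype("int64") for name, array in tokens.items()}
outputs = session.run(None, inputs)

# Index of the highest scoring label
print(np.argmax(outputs[0], axis=1))
```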
Source code in txtai/pipeline/train/hfonnx.py
```python
def __call__(self, path, task="default", output=None, quantize=False, opset=12):
    """
    Exports a Hugging Face Transformer model to ONNX.

    Args:
        path: path to model, accepts Hugging Face model hub id, local path or (model, tokenizer) tuple
        task: optional model task or category, determines the model type and outputs, defaults to export hidden state
        output: optional output model path, defaults to return byte array if None
        quantize: if model should be quantized (requires onnx to be installed), defaults to False
        opset: onnx opset, defaults to 12

    Returns:
        path to model output or model as bytes depending on output parameter
    """

    inputs, outputs, model = self.parameters(task)

    if isinstance(path, (list, tuple)):
        model, tokenizer = path
        model = model.cpu()
    else:
        model = model(path)
        tokenizer = AutoTokenizer.from_pretrained(path)

    # Generate dummy inputs
    dummy = dict(tokenizer(["test inputs"], return_tensors="pt"))

    # Default to BytesIO if no output file provided
    output = output if output else BytesIO()

    # Export model to ONNX
    export(
        model,
        (dummy,),
        output,
        opset_version=opset,
        do_constant_folding=True,
        input_names=list(inputs.keys()),
        output_names=list(outputs.keys()),
        dynamic_axes=dict(chain(inputs.items(), outputs.items())),
    )

    # Quantize model
    if quantize:
        if not ONNX_RUNTIME:
            raise ImportError('onnxruntime is not available - install "pipeline" extra to enable')

        output = self.quantization(output)

    if isinstance(output, BytesIO):
        # Reset stream and return bytes
        output.seek(0)
        output = output.read()

    return output
```