Transcription


The Transcription pipeline converts speech in audio files to text.

Example

The following shows a simple example using this pipeline.

  from txtai.pipeline import Transcription

  # Create and run pipeline
  transcribe = Transcription()
  transcribe("path to wav file")
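
The pipeline also accepts raw audio data with a sample rate. The following is a minimal sketch, assuming a local speech.wav file (hypothetical path) and the soundfile library used to read it.

  import soundfile as sf

  from txtai.pipeline import Transcription

  # Create pipeline
  transcribe = Transcription()

  # Read raw audio data and sample rate ("speech.wav" is a hypothetical example path)
  data, samplerate = sf.read("speech.wav")

  # Transcribe raw audio data - rate is required for raw arrays
  text = transcribe(data, rate=samplerate)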

This pipeline may require additional system dependencies. See this section for more.

See the links below for a more detailed example.

  Notebook                     Description
  Transcribe audio to text     Convert audio files to text
  Speech to Speech RAG ▶️      Full cycle speech to speech workflow with RAG

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

  # Create pipeline using lower case class name
  transcription:

  # Run pipeline with workflow
  workflow:
    transcribe:
      tasks:
        - action: transcription

Run with Workflows

  from txtai import Application

  # Create and run pipeline with workflow
  app = Application("config.yml")
  list(app.workflow("transcribe", ["path to wav file"]))
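
Workflows accept any iterable of elements and yield one result per input, so multiple files can be transcribed in a single call. A minimal sketch, where one.wav and two.wav are hypothetical example paths.

  from txtai import Application

  # Create application backed by config.yml
  app = Application("config.yml")

  # Run workflow over multiple files - one transcription per input
  for text in app.workflow("transcribe", ["one.wav", "two.wav"]):
      print(text)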

Run with API

  CONFIG=config.yml uvicorn "txtai.api:app" &

  curl \
    -X POST "http://localhost:8000/workflow" \
    -H "Content-Type: application/json" \
    -d '{"name":"transcribe", "elements":["path to wav file"]}'
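
The same request can be made from Python with an HTTP client. A minimal sketch using the requests library, assuming the API service started above is listening on localhost:8000.

  import requests

  # POST the workflow name and input elements, mirroring the curl call above
  response = requests.post(
      "http://localhost:8000/workflow",
      json={"name": "transcribe", "elements": ["path to wav file"]},
  )

  # One transcription is returned per input element
  print(response.json())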

Methods

Python documentation for the pipeline.

__init__(path=None, quantize=False, gpu=True, model=None, **kwargs)

Source code in txtai/pipeline/audio/transcription.py

  def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs):
      if not TRANSCRIPTION:
          raise ImportError(
              'Transcription pipeline is not available - install "pipeline" extra to enable. Also check that libsndfile is available.'
          )

      # Call parent constructor
      super().__init__("automatic-speech-recognition", path, quantize, gpu, model, **kwargs)
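
The constructor passes through to the underlying Hugging Face automatic-speech-recognition pipeline, so path can point at any compatible model. A minimal sketch, where openai/whisper-base is an example model id, not a default prescribed by this pipeline.

  from txtai.pipeline import Transcription

  # Load a specific speech recognition model and force CPU execution
  # ("openai/whisper-base" is an example model id)
  transcribe = Transcription(path="openai/whisper-base", gpu=False)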

__call__(audio, rate=None, chunk=10, join=True, **kwargs)

Transcribes audio files or data to text.

This method supports a single audio element or a list of audio. If the input is a single audio element, the return type is a string. If the input is a list, a list of strings is returned.

Parameters:

  audio: audio|list (required)
  rate: sample rate, only required with raw audio data (default: None)
  chunk: process audio in chunk second sized segments (default: 10)
  join: if True, combine each chunk back together into a single text output. When False, chunks are returned as a list of dicts, each having raw associated audio and sample rate in addition to text (default: True)
  kwargs: generate keyword arguments (default: {})

Returns:

  list of transcribed text
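
The chunk and join parameters control segmented output. The following is a minimal sketch of join=False, assuming a hypothetical speech.wav; the exact dict keys are an assumption based on the description above (raw audio, sample rate and text).

  from txtai.pipeline import Transcription

  # Create pipeline
  transcribe = Transcription()

  # Transcribe in 5 second segments and keep segments separate
  # ("speech.wav" is a hypothetical example path)
  segments = transcribe("speech.wav", chunk=5, join=False)

  # Each segment is a dict with raw audio, sample rate and text
  # (key names assumed from the parameter description)
  for segment in segments:
      print(segment["text"])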

Source code in txtai/pipeline/audio/transcription.py

  def __call__(self, audio, rate=None, chunk=10, join=True, **kwargs):
      """
      Transcribes audio files or data to text.

      This method supports a single audio element or a list of audio. If the input is a single audio
      element, the return type is a string. If the input is a list, a list of strings is returned.

      Args:
          audio: audio|list
          rate: sample rate, only required with raw audio data
          chunk: process audio in chunk second sized segments
          join: if True (default), combine each chunk back together into a single text output.
                When False, chunks are returned as a list of dicts, each having raw associated audio and
                sample rate in addition to text
          kwargs: generate keyword arguments

      Returns:
          list of transcribed text
      """

      # Convert single element to list
      values = [audio] if isinstance(audio, (str, tuple, np.ndarray)) else audio

      # Read input audio
      speech = self.read(values, rate)

      # Apply transformation rules and store results
      results = self.batchprocess(speech, chunk, **kwargs) if chunk and not join else self.process(speech, chunk, **kwargs)

      # Return single element if single element passed in
      return results[0] if isinstance(audio, (str, tuple, np.ndarray)) else results