Transcription


The Transcription pipeline converts speech in audio files to text.

Example

The following shows a simple example using this pipeline.

  from txtai.pipeline import Transcription

  # Create and run pipeline
  transcribe = Transcription()
  transcribe("path to wav file")
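
The pipeline also accepts raw audio data with a sample rate. The following is a minimal sketch, assuming a local speech.wav file (hypothetical path) and the soundfile library used to read it.

  import soundfile as sf

  from txtai.pipeline import Transcription

  # Create pipeline
  transcribe = Transcription()

  # Read raw audio data and sample rate ("speech.wav" is a hypothetical example path)
  data, samplerate = sf.read("speech.wav")

  # Transcribe raw audio data - rate is required for raw arrays
  text = transcribe(data, rate=samplerate)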

This pipeline may require additional system dependencies. See this section for more.

See the links below for a more detailed example.

  Notebook                     Description
  Transcribe audio to text     Convert audio files to text
  Speech to Speech RAG ▶️      Full cycle speech to speech workflow with RAG

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

  # Create pipeline using lower case class name
  transcription:

  # Run pipeline with workflow
  workflow:
    transcribe:
      tasks:
        - action: transcription

Run with Workflows

  from txtai import Application

  # Create and run pipeline with workflow
  app = Application("config.yml")
  list(app.workflow("transcribe", ["path to wav file"]))
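
Workflows accept any iterable of elements and yield one result per input, so multiple files can be transcribed in a single call. A minimal sketch, where one.wav and two.wav are hypothetical example paths.

  from txtai import Application

  # Create application backed by config.yml
  app = Application("config.yml")

  # Run workflow over multiple files - one transcription per input
  for text in app.workflow("transcribe", ["one.wav", "two.wav"]):
      print(text)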

Run with API

  CONFIG=config.yml uvicorn "txtai.api:app" &

  curl \
    -X POST "http://localhost:8000/workflow" \
    -H "Content-Type: application/json" \
    -d '{"name":"transcribe", "elements":["path to wav file"]}'
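
The same request can be made from Python with an HTTP client. A minimal sketch using the requests library, assuming the API service started above is listening on localhost:8000.

  import requests

  # POST the workflow name and input elements, mirroring the curl call above
  response = requests.post(
      "http://localhost:8000/workflow",
      json={"name": "transcribe", "elements": ["path to wav file"]},
  )

  # One transcription is returned per input element
  print(response.json())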

Methods

Python documentation for the pipeline.

__init__(path=None, quantize=False, gpu=True, model=None, **kwargs)

Source code in txtai/pipeline/audio/transcription.py

  def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs):
      if not TRANSCRIPTION:
          raise ImportError(
              'Transcription pipeline is not available - install "pipeline" extra to enable. Also check that libsndfile is available.'
          )

      # Call parent constructor
      super().__init__("automatic-speech-recognition", path, quantize, gpu, model, **kwargs)
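
The constructor passes through to the underlying Hugging Face automatic-speech-recognition pipeline, so path can point at any compatible model. A minimal sketch, where openai/whisper-base is an example model id, not a default prescribed by this pipeline.

  from txtai.pipeline import Transcription

  # Load a specific speech recognition model and force CPU execution
  # ("openai/whisper-base" is an example model id)
  transcribe = Transcription(path="openai/whisper-base", gpu=False)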

__call__(audio, rate=None, chunk=10, join=True, **kwargs)

Transcribes audio files or data to text.

This method supports a single audio element or a list of audio. If the input is a single audio element, the return type is a string. If the input is a list, a list of strings is returned.

Parameters:

  audio: audio|list (required)
  rate: sample rate, only required with raw audio data (default: None)
  chunk: process audio in chunk second sized segments (default: 10)
  join: if True, combine each chunk back together into a single text output. When False, chunks are returned as a list of dicts, each having raw associated audio and sample rate in addition to text (default: True)
  kwargs: generate keyword arguments (default: {})

Returns:

  list of transcribed text
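
The chunk and join parameters control segmented output. The following is a minimal sketch of join=False, assuming a hypothetical speech.wav; the exact dict keys are an assumption based on the description above (raw audio, sample rate and text).

  from txtai.pipeline import Transcription

  # Create pipeline
  transcribe = Transcription()

  # Transcribe in 5 second segments and keep segments separate
  # ("speech.wav" is a hypothetical example path)
  segments = transcribe("speech.wav", chunk=5, join=False)

  # Each segment is a dict with raw audio, sample rate and text
  # (key names assumed from the parameter description)
  for segment in segments:
      print(segment["text"])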

Source code in txtai/pipeline/audio/transcription.py

  def __call__(self, audio, rate=None, chunk=10, join=True, **kwargs):
      """
      Transcribes audio files or data to text.

      This method supports a single audio element or a list of audio. If the input is a single audio
      element, the return type is a string. If the input is a list, a list of strings is returned.

      Args:
          audio: audio|list
          rate: sample rate, only required with raw audio data
          chunk: process audio in chunk second sized segments
          join: if True (default), combine each chunk back together into a single text output.
                When False, chunks are returned as a list of dicts, each having raw associated audio and
                sample rate in addition to text
          kwargs: generate keyword arguments

      Returns:
          list of transcribed text
      """

      # Convert single element to list
      values = [audio] if isinstance(audio, (str, tuple, np.ndarray)) else audio

      # Read input audio
      speech = self.read(values, rate)

      # Apply transformation rules and store results
      results = self.batchprocess(speech, chunk, **kwargs) if chunk and not join else self.process(speech, chunk, **kwargs)

      # Return single element if single element passed in
      return results[0] if isinstance(audio, (str, tuple, np.ndarray)) else results