# Transcription
The Transcription pipeline converts speech in audio files to text.
## Example
The following shows a simple example using this pipeline.
```python
from txtai.pipeline import Transcription

# Create and run pipeline
transcribe = Transcription()
transcribe("path to wav file")
```
See the link below for a more detailed example.
| Notebook | Description |
|---|---|
| Transcribe audio to text | Convert audio files to text |
## Configuration-driven example
Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.
config.yml

```yaml
# Create pipeline using lower case class name
transcription:

# Run pipeline with workflow
workflow:
  transcribe:
    tasks:
      - action: transcription
```
### Run with Workflows
```python
from txtai.app import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("transcribe", ["path to wav file"]))
```
### Run with API
```bash
CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"transcribe", "elements":["path to wav file"]}'
```
## Methods
Python documentation for the pipeline.
Source code in `txtai/pipeline/audio/transcription.py`

```python
def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs):
    if not SOUNDFILE:
        raise ImportError("SoundFile library not installed or libsndfile not found")

    # Call parent constructor
    super().__init__("automatic-speech-recognition", path, quantize, gpu, model, **kwargs)
```
Transcribes audio files or data to text.

This method supports a single audio element or a list of audio elements. If the input is a single element, a string is returned. If the input is a list, a list of strings is returned.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| audio | audio\|list | audio file(s) or raw audio data to transcribe | required |
| rate | int | sample rate, only required with raw audio data | None |
| chunk | int | process audio in chunk second sized segments | 10 |
| join | bool | if True (default), combine each chunk back together into a single text output. When False, chunks are returned as a list of dicts, each having raw associated audio and sample rate in addition to text | True |
Returns:

| Type | Description |
|---|---|
| string\|list | list of transcribed text |
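To make the `chunk` parameter concrete, the sketch below shows how raw audio samples can be split into chunk-second segments at a given sample rate. The `segments` helper is illustrative only, not a txtai internal:

```python
# Sketch: splitting raw audio into chunk-second sized segments.
# "signal" stands in for raw audio samples (one value per sample).

def segments(signal, rate, chunk=10):
    """Split raw audio samples into segments of chunk seconds each."""
    size = rate * chunk
    return [signal[i:i + size] for i in range(0, len(signal), size)]

# 25 seconds of silence at 16 kHz -> three segments (10s, 10s, 5s)
audio = [0.0] * (16000 * 25)
parts = segments(audio, rate=16000, chunk=10)
print([len(p) // 16000 for p in parts])  # → [10, 10, 5]
```

Each segment is transcribed independently, which keeps memory usage bounded for long recordings.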
Source code in `txtai/pipeline/audio/transcription.py`

```python
def __call__(self, audio, rate=None, chunk=10, join=True):
    """
    Transcribes audio files or data to text.

    This method supports a single audio element or a list of audio. If the input is a single
    element, a string is returned. If the input is a list, a list of strings is returned.

    Args:
        audio: audio|list
        rate: sample rate, only required with raw audio data
        chunk: process audio in chunk second sized segments
        join: if True (default), combine each chunk back together into a single text output.
              When False, chunks are returned as a list of dicts, each having raw associated
              audio and sample rate in addition to text

    Returns:
        list of transcribed text
    """

    # Convert single element to list
    values = [audio] if not isinstance(audio, list) else audio

    # Read input audio
    speech = self.read(values, rate)

    # Apply transformation rules and store results
    results = self.batchprocess(speech, chunk) if chunk and not join else self.process(speech, chunk)

    # Return single element if single element passed in
    return results[0] if not isinstance(audio, list) else results
```
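The `join` parameter controls the shape of the output. A minimal sketch of the difference, using made-up chunk results rather than real model output:

```python
# Hypothetical per-chunk results matching the join=False shape described
# above: each dict carries the transcribed text plus the raw audio and
# sample rate for that chunk. Values here are illustrative.
chunks = [
    {"text": "hello world", "raw": [0.0] * 160000, "rate": 16000},
    {"text": "from the second chunk", "raw": [0.0] * 80000, "rate": 16000},
]

# join=False keeps the per-chunk dicts; join=True combines the text fields
# into a single string (shown here with a simple space join).
joined = " ".join(c["text"] for c in chunks)
print(joined)  # → hello world from the second chunk
```

Keeping `join=False` is useful when downstream steps need to align text back to the audio that produced it, for example when building subtitles.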