Text To Audio


The Text To Audio pipeline generates audio from text.

Example

The following shows a simple example using this pipeline.

    from txtai.pipeline import TextToAudio

    # Create and run pipeline
    tta = TextToAudio()
    tta("Describe the audio to generate here")

See the link below for a more detailed example.

Notebook          Description
Generative Audio  Storytelling with generative audio workflows

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

    # Create pipeline using lower case class name
    texttoaudio:

    # Run pipeline with workflow
    workflow:
      tta:
        tasks:
          - action: texttoaudio

Run with Workflows

    from txtai import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("tta", ["Describe the audio to generate here"]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"tta", "elements":["Describe the audio to generate here"]}'
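
The same request can also be built from Python. The sketch below mirrors the curl command using only the standard library. The helper names `build_workflow_request` and `run_workflow` are hypothetical and not part of txtai, and decoding the response as JSON is an assumption about the API's reply format.

```python
import json
import urllib.request


def build_workflow_request(name, elements, base_url="http://localhost:8000"):
    """Build (but do not send) the POST request used by the curl example."""
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/workflow",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(name, elements, base_url="http://localhost:8000"):
    """Send the request; assumes the server replies with a JSON body."""
    request = build_workflow_request(name, elements, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server; run_workflow() does.
request = build_workflow_request("tta", ["Describe the audio to generate here"])
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect before an API server is started.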

Methods

Python documentation for the pipeline.

__init__(path=None, quantize=False, gpu=True, model=None, rate=None, **kwargs)

Source code in txtai/pipeline/audio/texttoaudio.py

    def __init__(self, path=None, quantize=False, gpu=True, model=None, rate=None, **kwargs):
        if not SCIPY:
            raise ImportError("TextToAudio pipeline is not available - install pipeline extra to enable.")

        # Call parent constructor
        super().__init__("text-to-audio", path, quantize, gpu, model, **kwargs)

        # Target sample rate, defaults to model sample rate
        self.rate = rate
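
The `rate` argument stores a target sample rate. When it is left as `None`, the comment above indicates the model's own sample rate is kept. How txtai converts between rates is not shown in this section, so the sketch below uses plain linear interpolation purely to illustrate what changing a sample rate involves. The function `resample_linear` is hypothetical, and it applies no anti-aliasing filter when downsampling.

```python
def resample_linear(audio, rate, target=None):
    """Resample a mono sequence of floats from `rate` to `target` Hz.

    Illustration only: linear interpolation, not txtai's implementation.
    Returns (samples, sample rate).
    """
    if target is None or target == rate or len(audio) == 0:
        # No target rate set: keep the source rate unchanged
        return list(audio), rate

    count = max(1, round(len(audio) * target / rate))
    last = len(audio) - 1
    output = []
    for i in range(count):
        # Position of output sample i on the input time axis
        position = i * rate / target
        left = min(int(position), last)
        right = min(left + 1, last)
        fraction = position - left
        output.append(audio[left] * (1 - fraction) + audio[right] * fraction)

    return output, target


upsampled, new_rate = resample_linear([0.0, 1.0, 0.0, -1.0], 4, 8)
unchanged, same_rate = resample_linear([0.0, 1.0], 4)
```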

__call__(text, maxlength=512)

Generates audio from text.

This method supports text as a string or a list. If the input is a string, the return type is a single audio output. If text is a list, the return type is a list.

Parameters:

Name       Type  Description                       Default
text       -     text|list                         required
maxlength  -     maximum audio length to generate  512

Returns:

Type  Description
-     list of (audio, sample rate)
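
Each result pairs the audio with its sample rate, which is the information needed to write a playable file. The sketch below saves one such pair as a 16-bit PCM WAV file with the standard library. It assumes the audio is a mono sequence of floats in the range [-1.0, 1.0], which may not hold for every model. The helper `write_wav` is hypothetical, and a generated sine tone stands in for real pipeline output.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, audio, rate):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM."""
    frames = bytearray()
    for sample in audio:
        # Clip to the valid range, then scale to a signed 16-bit integer
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(int(rate))
        wav.writeframes(bytes(frames))


# Stand-in for a pipeline result: one second of a 440 Hz tone at 16 kHz
rate = 16000
audio = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(rate)]

path = os.path.join(tempfile.mkdtemp(), "output.wav")
write_wav(path, audio, rate)
```

With a real pipeline, the `(audio, rate)` pair returned for a single string would take the place of the synthetic tone.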

Source code in txtai/pipeline/audio/texttoaudio.py

    def __call__(self, text, maxlength=512):
        """
        Generates audio from text.

        This method supports text as a string or a list. If the input is a string,
        the return type is a single audio output. If text is a list, the return type is a list.

        Args:
            text: text|list
            maxlength: maximum audio length to generate

        Returns:
            list of (audio, sample rate)
        """

        # Format inputs
        texts = [text] if isinstance(text, str) else text

        # Run pipeline
        results = [self.convert(x) for x in self.pipeline(texts, forward_params={"max_new_tokens": maxlength})]

        # Extract results
        return results[0] if isinstance(text, str) else results
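
The input handling in the method above can be tried in isolation. The sketch below reproduces only the wrap-and-unwrap logic: a string is wrapped in a list, every item is processed, and a single result is unwrapped again. `run` and `fake_generate` are hypothetical stand-ins, and the fake generator returns silence rather than calling a model.

```python
def run(text, generate):
    """Mirror the input handling: wrap a string, then unwrap the result."""
    texts = [text] if isinstance(text, str) else text
    results = [generate(x) for x in texts]
    return results[0] if isinstance(text, str) else results


def fake_generate(prompt):
    """Hypothetical stand-in returning (audio, sample rate) with silent audio."""
    return ([0.0] * len(prompt), 16000)


single = run("abc", fake_generate)          # one (audio, sample rate) tuple
batch = run(["abc", "de"], fake_generate)   # list of tuples, one per input
```

A caller therefore needs to index into the result only when a list was passed in.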