# Microphone
The Microphone pipeline reads input speech from a microphone device. This pipeline is designed to run on local machines, since it requires read access to an audio input device.
## Example
The following shows a simple example using this pipeline.
```python
from txtai.pipeline import Microphone

# Create and run pipeline
microphone = Microphone()
microphone()
```
This pipeline may require additional system dependencies. See this section for more.
See the link below for a more detailed example.
| Notebook | Description |
|---|---|
| Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG |
## Configuration-driven example
Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.
**config.yml**

```yaml
# Create pipeline using lower case class name
microphone:

# Run pipeline with workflow
workflow:
  microphone:
    tasks:
      - action: microphone
```
### Run with Workflows
```python
from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("microphone", ["1"]))
```
### Run with API
```bash
CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"microphone", "elements":["1"]}'
```
## Methods
Python documentation for the pipeline.
### `__init__(rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8)`
Creates a new Microphone pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rate | int | sample rate to record audio in, defaults to 16000 (16 kHz) | 16000 |
| vadmode | int | aggressiveness of the voice activity detector (1 - 3), defaults to 3, which is the most aggressive filter | 3 |
| vadframe | int | voice activity detector frame size in ms, defaults to 20 | 20 |
| vadthreshold | float | percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6 | 0.6 |
| voicestart | int | starting frequency (Hz) to use for voice filtering, defaults to 300 | 300 |
| voiceend | int | ending frequency (Hz) to use for voice filtering, defaults to 3400 | 3400 |
| active | int | minimum number of active speech chunks required before considering this speech, defaults to 5 | 5 |
| pause | int | number of non-speech chunks to keep before considering speech complete, defaults to 8 | 8 |
Source code in txtai/pipeline/audio/microphone.py
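To build intuition for how vadframe and vadthreshold interact, the sketch below splits a chunk into frames and treats it as speech when the voiced fraction reaches vadthreshold. This is an illustration only, not txtai's implementation; the `isvoice` callback stands in for a real voice activity detector such as webrtcvad.

```python
# Illustrative sketch of the vadthreshold decision, NOT txtai's code.
# `isvoice` is a stand-in for a real per-frame voice activity detector.

def isspeech(frames, isvoice, vadthreshold=0.6):
    """Returns True if the fraction of voiced frames meets vadthreshold."""
    voiced = sum(1 for frame in frames if isvoice(frame))
    return voiced / len(frames) >= vadthreshold

# Toy detector: treat a frame as voice if its peak amplitude exceeds 0.1
detector = lambda frame: max(abs(x) for x in frame) > 0.1

quiet = [[0.01] * 320] * 10                     # 10 near-silent frames
loud = [[0.5] * 320] * 7 + [[0.01] * 320] * 3   # 7 voiced, 3 silent

print(isspeech(quiet, detector))  # False: 0.0 < 0.6
print(isspeech(loud, detector))   # True: 0.7 >= 0.6
```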
### `__call__(device=None)`
Reads audio from an input device.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| device | | optional input device id, otherwise uses system default | None |
Returns:
| Type | Description |
|---|---|
| list | list of (audio, sample rate) |
Source code in txtai/pipeline/audio/microphone.py
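The returned (audio, sample rate) pairs can be consumed like any in-memory audio. A minimal sketch, using a synthetic NumPy array to stand in for real microphone output; the assumption that audio arrives as a NumPy float array is ours, not stated above.

```python
import numpy as np

# Synthetic stand-in for Microphone() output: one second of a 440 Hz tone
# sampled at 16 kHz. A real call would be: results = Microphone()()
rate = 16000
t = np.linspace(0, 1, rate, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440 * t).astype(np.float32)
results = [(audio, rate)]

# Compute the duration of each recorded segment
for audio, rate in results:
    duration = len(audio) / rate
    print(f"{duration:.2f}s of audio at {rate} Hz")  # 1.00s of audio at 16000 Hz
```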