Microphone pipeline

The Microphone pipeline reads input speech from a microphone device. This pipeline is designed to run on local machines since it requires access to read from an input device.

Example

The following shows a simple example using this pipeline.

```python
from txtai.pipeline import Microphone

# Create and run pipeline
microphone = Microphone()
microphone()
```

This pipeline may require additional system dependencies. See the installation instructions for more.
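As a sketch, the typical dependencies are the `pipeline` extra plus the portaudio system library. The exact system package name varies by platform; the example below assumes a Debian/Ubuntu environment.

```shell
# Install txtai with the pipeline extra
pip install txtai[pipeline]

# Install the portaudio system library (Debian/Ubuntu package name assumed)
sudo apt-get install portaudio19-dev
```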

See the link below for a more detailed example.

| Notebook | Description |
|----------|-------------|
| Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG |

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

```yaml
# Create pipeline using lower case class name
microphone:

# Run pipeline with workflow
workflow:
  microphone:
    tasks:
      - action: microphone
```

Run with Workflows

```python
from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("microphone", ["1"]))
```

Run with API

```bash
CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"microphone", "elements":["1"]}'
```

Methods

Python documentation for the pipeline.

__init__(rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8)

Creates a new Microphone pipeline.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| rate | | sample rate to record audio in, defaults to 16000 (16 kHz) | 16000 |
| vadmode | | aggressiveness of the voice activity detector (1 - 3), defaults to 3, which is the most aggressive filter | 3 |
| vadframe | | voice activity detector frame size in ms, defaults to 20 | 20 |
| vadthreshold | | percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6 | 0.6 |
| voicestart | | starting frequency to use for voice filtering, defaults to 300 | 300 |
| voiceend | | ending frequency to use for voice filtering, defaults to 3400 | 3400 |
| active | | minimum number of active speech chunks to require before considering this speech, defaults to 5 | 5 |
| pause | | number of non-speech chunks to keep before considering speech complete, defaults to 8 | 8 |
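To illustrate how `vadthreshold` works, the sketch below shows the underlying idea: a chunk of audio counts as speech when the fraction of voiced frames meets the threshold. The `is_speech` helper and boolean `frames` list are hypothetical stand-ins for per-frame VAD results, not part of the txtai API.

```python
def is_speech(frames, vadthreshold=0.6):
    """Returns True when the fraction of voiced frames meets the threshold."""
    return sum(frames) / len(frames) >= vadthreshold

# 4 of 5 frames voiced -> 0.8 >= 0.6, classified as speech
print(is_speech([True, True, True, False, True]))

# 1 of 5 frames voiced -> 0.2 < 0.6, classified as non-speech
print(is_speech([True, False, False, False, False]))
```

Raising `vadthreshold` makes classification stricter, which can help in noisy environments at the cost of clipping quiet speech.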

Source code in txtai/pipeline/audio/microphone.py

```python
def __init__(self, rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8):
    """
    Creates a new Microphone pipeline.

    Args:
        rate: sample rate to record audio in, defaults to 16000 (16 kHz)
        vadmode: aggressiveness of the voice activity detector (1 - 3), defaults to 3, which is the most aggressive filter
        vadframe: voice activity detector frame size in ms, defaults to 20
        vadthreshold: percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6
        voicestart: starting frequency to use for voice filtering, defaults to 300
        voiceend: ending frequency to use for voice filtering, defaults to 3400
        active: minimum number of active speech chunks to require before considering this speech, defaults to 5
        pause: number of non-speech chunks to keep before considering speech complete, defaults to 8
    """

    if not MICROPHONE:
        raise ImportError(
            (
                'Microphone pipeline is not available - install "pipeline" extra to enable. '
                "Also check that the portaudio system library is available."
            )
        )

    # Sample rate
    self.rate = rate

    # Voice activity detector
    self.vad = webrtcvad.Vad(vadmode)
    self.vadframe = vadframe
    self.vadthreshold = vadthreshold

    # Voice spectrum
    self.voicestart = voicestart
    self.voiceend = voiceend

    # Audio chunks counts
    self.active = active
    self.pause = pause
```
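For reference, the `rate` and `vadframe` parameters together determine the frame size in samples. The arithmetic below is a simple illustration, not a txtai API call.

```python
rate = 16000     # samples per second
vadframe = 20    # frame size in ms

# Samples per VAD frame: 16000 samples/s * 0.020 s = 320 samples
samples_per_frame = int(rate * vadframe / 1000)
print(samples_per_frame)  # 320
```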

__call__(device=None)

Reads audio from an input device.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| device | | optional input device id, otherwise uses system default | None |

Returns:

| Type | Description |
|------|-------------|
| | list of (audio, sample rate) |

Source code in txtai/pipeline/audio/microphone.py

```python
def __call__(self, device=None):
    """
    Reads audio from an input device.

    Args:
        device: optional input device id, otherwise uses system default

    Returns:
        list of (audio, sample rate)
    """

    # Listen for audio
    audio = self.listen(device[0] if isinstance(device, list) else device)

    # Return single element if single element passed in
    return (audio, self.rate) if device is None or not isinstance(device, list) else [(audio, self.rate)]
```
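The return shape mirrors the input shape: passing a device id (or nothing) yields a single `(audio, rate)` tuple, while passing a list of device ids yields a list of tuples. The stand-in function below isolates just that return-shape logic for illustration; it is not the txtai implementation.

```python
def call(device=None, audio=b"<raw audio>", rate=16000):
    """Mirrors the return-shape convention: list in, list out."""
    return (audio, rate) if device is None or not isinstance(device, list) else [(audio, rate)]

print(type(call()).__name__)     # tuple - no device given
print(type(call(2)).__name__)    # tuple - single device id
print(type(call([2])).__name__)  # list  - device list given
```

This convention lets workflow tasks, which pass elements as lists, receive list outputs, while direct callers get a plain tuple.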