Microphone

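The Microphone pipeline reads input speech from a microphone device. It is designed to run on a local machine, since it needs direct access to an input device.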

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import Microphone

# Create and run pipeline
microphone = Microphone()
microphone()

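This pipeline may require additional system dependencies, such as the portaudio system library. See the installation documentation for more information.

See the link below for a more detailed example.

Notebook	Description
Speech to Speech RAG	Full cycle speech to speech workflow with RAG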

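Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.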

config.yml

# Create pipeline using lower case class name
microphone:

# Run pipeline with workflow
workflow:
  microphone:
    tasks:
      - action: microphone

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("microphone", ["1"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"microphone", "elements":["1"]}'
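The same request can be built from Python with the standard library. This is a minimal sketch, assuming the API server started above listens on port 8000 of localhost; `workflow_request` is a hypothetical helper name, not part of txtai.

```python
import json
import urllib.request


def workflow_request(name, elements, base="http://localhost:8000"):
    """Build (but do not send) a POST request for the /workflow endpoint."""
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/workflow",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same body as the curl call: workflow name plus its input elements
request = workflow_request("microphone", ["1"])

# Sending requires the API server to be running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the URL and payload before a server is available.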

Methods

Python documentation for the pipeline.

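`__init__(rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8)`

Creates a new Microphone pipeline.

Parameters

Name	Description	Default
`rate`	Sample rate to record audio in, defaults to 16000 (16 kHz)	`16000`
`vadmode`	Aggressiveness of the voice activity detector (VAD) (1 - 3), defaults to 3, which is the most aggressive filter	`3`
`vadframe`	Voice activity detector (VAD) frame size in ms, defaults to 20	`20`
`vadthreshold`	Percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6	`0.6`
`voicestart`	Starting frequency to use for voice filtering, defaults to 300	`300`
`voiceend`	Ending frequency to use for voice filtering, defaults to 3400	`3400`
`active`	Minimum number of active speech chunks to require before considering this speech, defaults to 5	`5`
`pause`	Number of non-speech chunks to keep before considering speech complete, defaults to 8	`8`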
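
To make the relationship between `rate`, `vadframe` and `vadthreshold` concrete, the arithmetic can be sketched as follows. This is an illustration only, not the library's implementation: `frame_samples` and `is_speech` are hypothetical helper names, and the `>=` comparison is an assumption. Note that WebRTC VAD only accepts frames of 10, 20 or 30 ms at sample rates of 8000, 16000, 32000 or 48000 Hz.

```python
def frame_samples(rate=16000, vadframe=20):
    """Number of audio samples in one VAD frame of `vadframe` milliseconds."""
    return rate * vadframe // 1000


def is_speech(flags, vadthreshold=0.6):
    """True when the fraction of voiced frames reaches the threshold.

    `flags` holds one boolean per VAD frame (True = frame classified as voice).
    """
    if not flags:
        return False
    return sum(flags) / len(flags) >= vadthreshold


# With the defaults, each 20 ms frame at 16 kHz holds 320 samples
print(frame_samples())

# 3 of 5 frames voiced is 60%, which meets the default 0.6 threshold
print(is_speech([True, True, True, False, False]))
```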

Source code in txtai/pipeline/audio/microphone.py

def __init__(self, rate=16000, vadmode=3, vadframe=20, vadthreshold=0.6, voicestart=300, voiceend=3400, active=5, pause=8):
    """
    Creates a new Microphone pipeline.

    Args:
        rate: sample rate to record audio in, defaults to 16000 (16 kHz)
        vadmode: aggressiveness of the voice activity detector (1 - 3), defaults to 3, which is the most aggressive filter
        vadframe: voice activity detector frame size in ms, defaults to 20
        vadthreshold: percentage of frames (0.0 - 1.0) that must be voice to be considered speech, defaults to 0.6
        voicestart: starting frequency to use for voice filtering, defaults to 300
        voiceend: ending frequency to use for voice filtering, defaults to 3400
        active: minimum number of active speech chunks to require before considering this speech, defaults to 5
        pause: number of non-speech chunks to keep before considering speech complete, defaults to 8
    """

    if not MICROPHONE:
        raise ImportError(
            (
                'Microphone pipeline is not available - install "pipeline" extra to enable. '
                "Also check that the portaudio system library is available."
            )
        )

    # Sample rate
    self.rate = rate

    # Voice activity detector
    self.vad = webrtcvad.Vad(vadmode)
    self.vadframe = vadframe
    self.vadthreshold = vadthreshold

    # Voice spectrum
    self.voicestart = voicestart
    self.voiceend = voiceend

    # Audio chunks counts
    self.active = active
    self.pause = pause
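
The `active` and `pause` counters describe when a run of chunks is accepted as an utterance. The sketch below shows one way such counters could segment a stream of per-chunk voice flags. It is a simplified assumption for illustration, not the code in microphone.py, and `first_utterance` is a hypothetical name.

```python
def first_utterance(flags, active=5, pause=8):
    """Return (start, end) chunk indices of the first accepted utterance.

    flags: one boolean per audio chunk (True = voice detected)
    active: minimum number of voiced chunks required to accept the utterance
    pause: number of consecutive unvoiced chunks that ends the utterance
    `end` is exclusive and leaves out the trailing pause chunks.
    """
    speech, silence, start = 0, 0, None
    for index, voiced in enumerate(flags):
        if voiced:
            speech += 1
            silence = 0
            if start is None:
                start = index
        elif start is not None:
            silence += 1
            if silence >= pause:
                if speech >= active:
                    return start, index - pause + 1
                # Too short to count as speech: reset and keep listening
                speech, silence, start = 0, 0, None
    return None


# 3 silent chunks, 6 voiced chunks, then 8 silent chunks
print(first_utterance([False] * 3 + [True] * 6 + [False] * 8))
```

A larger `pause` tolerates longer gaps between words before the recording stops, while a larger `active` rejects short bursts of noise.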

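`__call__(device=None)`

Reads audio from an input device.

Parameters

Name	Type	Description	Default
`device`		Optional input device id, otherwise uses system default	`None`

Returns

Type	Description
	list of (audio, sample rate)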
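
As an illustration of working with the documented return shape, the following sketch computes the length in seconds of each recording. It assumes each audio entry is a one-dimensional sequence of samples. `durations` is a hypothetical helper, and the sample data stands in for real microphone output.

```python
def durations(results):
    """Length in seconds of each (audio, sample rate) pair."""
    return [len(audio) / rate for audio, rate in results]


# Stand-in for pipeline output: 1 second and 0.5 seconds of silence at 16 kHz
results = [([0.0] * 16000, 16000), ([0.0] * 8000, 16000)]
print(durations(results))
```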