
Transcription


The Transcription pipeline converts speech in audio files to text.

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import Transcription

# Create and run pipeline
transcribe = Transcription()
transcribe("path to wav file")

This pipeline may require additional system dependencies. See this section for more information.

See the links below for more detailed examples.

| Notebook | Description |
|:---|:---|
| Transcribe audio to text | Convert audio files to text |
| Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG |

Configuration-driven example

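Pipelines can be run with Python or with configuration. A pipeline is instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.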

config.yml

# Create pipeline using lower case class name
transcription:

# Run pipeline with workflow
workflow:
  transcribe:
    tasks:
      - action: transcription

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("transcribe", ["path to wav file"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"transcribe", "elements":["path to wav file"]}'
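
The same request can be prepared from Python. The following is a minimal sketch using only the standard library, assuming the API server started above is listening on http://localhost:8000. The helper name `workflow_request` is hypothetical and not part of txtai. The request is only built here; sending it requires a running server.

```python
import json
from urllib.request import Request


def workflow_request(name, elements, base="http://localhost:8000"):
    """Builds a POST request for the /workflow endpoint (hypothetical helper)."""

    # Same JSON body as the curl command: workflow name plus input elements
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")

    return Request(
        f"{base}/workflow",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = workflow_request("transcribe", ["path to wav file"])

# To send the request once the server is running:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(json.loads(response.read()))
```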

Methods

Python documentation for the pipeline.

__init__(path=None, quantize=False, gpu=True, model=None, **kwargs)

Source code in txtai/pipeline/audio/transcription.py
def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs):
    if not TRANSCRIPTION:
        raise ImportError(
            'Transcription pipeline is not available - install "pipeline" extra to enable. Also check that libsndfile is available.'
        )

    # Call parent constructor
    super().__init__("automatic-speech-recognition", path, quantize, gpu, model, **kwargs)

__call__(audio, rate=None, chunk=10, join=True, **kwargs)

Transcribes audio files or data to text.

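This method supports a single audio element or a list of audio elements. If the input is a single audio element, a string is returned. If the input is a list, a list of strings is returned.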
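
The single-versus-list behavior can be illustrated with a small stand-alone sketch. It does not use txtai. The `isaudio` check here is a simplified assumption (anything that is not a list counts as a single element), and `recognize` is a hypothetical stand-in for the speech recognition model.

```python
def isaudio(value):
    # Simplified assumption: any non-list input is a single audio element
    return not isinstance(value, list)


def transcribe(audio, recognize):
    # Wrap a single element in a list so both cases share one code path
    values = [audio] if isaudio(audio) else audio

    # Transcribe each element
    results = [recognize(value) for value in values]

    # Unwrap the result when a single element was passed in
    return results[0] if isaudio(audio) else results


# Hypothetical stand-in for a real model
fake = lambda path: f"text for {path}"

single = transcribe("a.wav", fake)
batch = transcribe(["a.wav", "b.wav"], fake)
```

With this stand-in, `single` is a string and `batch` is a list with one string per input.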

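Parameters

| Name | Description | Default |
|:---|:---|:---|
| audio | audio element or list of audio elements | required |
| rate | sample rate, only required with raw audio data | None |
| chunk | process audio in segments of this size, in seconds | 10 |
| join | if True (default), combine each chunk back together into a single text output. When False, chunks are returned as a list of dicts, each having the associated raw audio and sample rate in addition to text | True |
| kwargs | generation keyword arguments | {} |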

Returns

list of transcribed text (a single result when a single audio element is passed in)
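
The effect of `chunk` and `join` can be sketched without txtai. This is an illustration under stated assumptions, not the library's implementation: `segments`, `transcribe_chunks` and `recognize` are hypothetical names, and joining chunk text with a single space is an assumption.

```python
def segments(raw, rate, chunk=10):
    """Splits raw samples into segments of chunk seconds each."""

    # Number of samples in one chunk
    size = int(rate * chunk)
    return [raw[i : i + size] for i in range(0, len(raw), size)]


def transcribe_chunks(raw, rate, recognize, chunk=10, join=True):
    # One dict per chunk, holding text plus the raw audio and sample rate
    parts = [
        {"text": recognize(segment, rate), "raw": segment, "rate": rate}
        for segment in segments(raw, rate, chunk)
    ]

    # join=True: combine chunk text into one string (space separator is an assumption)
    if join:
        return " ".join(part["text"] for part in parts)

    # join=False: return the per-chunk dicts
    return parts


# 25 samples at a sample rate of 1 Hz, i.e. 25 seconds of toy audio
raw, rate = [0.0] * 25, 1

# Hypothetical stand-in for a real model: reports the segment length
fake = lambda segment, rate: f"{len(segment) // rate}s"

joined = transcribe_chunks(raw, rate, fake)
parts = transcribe_chunks(raw, rate, fake, join=False)
```

With the default 10 second chunk, the 25 second input is split into segments of 10, 10 and 5 seconds.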

Source code in txtai/pipeline/audio/transcription.py
def __call__(self, audio, rate=None, chunk=10, join=True, **kwargs):
    """
    Transcribes audio files or data to text.

    This method supports a single audio element or a list of audio. If the input is audio, the return
    type is a string. If the input is a list, a list of strings is returned.

    Args:
        audio: audio|list
        rate: sample rate, only required with raw audio data
        chunk: process audio in chunk second sized segments
        join: if True (default), combine each chunk back together into a single text output.
              When False, chunks are returned as a list of dicts, each having raw associated audio and
              sample rate in addition to text
        kwargs: generate keyword arguments

    Returns:
        list of transcribed text
    """

    # Convert single element to list
    values = [audio] if self.isaudio(audio) else audio

    # Read input audio
    speech = self.read(values, rate)

    # Apply transformation rules and store results
    results = self.batchprocess(speech, chunk, **kwargs) if chunk and not join else self.process(speech, chunk, **kwargs)

    # Return single element if single element passed in
    return results[0] if self.isaudio(audio) else results