Text To Speech

The Text To Speech pipeline generates speech from text.

Example

The following shows a simple example using this pipeline.

import numpy as np

from txtai.pipeline import TextToSpeech

# Create and run pipeline with default model
tts = TextToSpeech()
tts("Say something here")

# Stream audio - incrementally generates snippets of audio
# yield from is only valid inside a function, so wrap the call in a generator
def stream():
  yield from tts(
    "Say something here. And say something else.".split(),
    stream=True
  )

# Generate audio using a speaker id
tts = TextToSpeech("neuml/vctk-vits-onnx")
tts("Say something here", speaker=15)

# Generate audio using speaker embeddings
tts = TextToSpeech("neuml/txtai-speecht5-onnx")
tts("Say something here", speaker=np.array(...))
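The calls above return audio in memory. As a minimal sketch of persisting it, the helper below writes a waveform to a WAV file using only the Python standard library. It assumes the audio is a mono sequence of float samples in the range [-1.0, 1.0] with a known sample rate. `write_wav` is a hypothetical helper, not part of txtai, and a generated sine wave stands in for pipeline output so the snippet runs without a model.

```python
import math
import os
import tempfile
import wave
from array import array


def write_wav(path, audio, rate=22050):
    """Writes mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""

    # Clip each sample to [-1.0, 1.0], then scale to the signed 16-bit range
    pcm = array("h", (int(max(-1.0, min(1.0, float(x))) * 32767) for x in audio))

    with wave.open(path, "wb") as f:
        f.setnchannels(1)     # mono
        f.setsampwidth(2)     # 2 bytes per sample (16-bit)
        f.setframerate(rate)  # samples per second
        f.writeframes(pcm.tobytes())

    # Number of samples written
    return len(pcm)


# Stand-in waveform (assumption): 0.1 seconds of a 440 Hz sine wave at 22050 Hz
samples = [0.5 * math.sin(2 * math.pi * 440 * i / 22050) for i in range(2205)]

path = os.path.join(tempfile.gettempdir(), "speech.wav")
frames = write_wav(path, samples, rate=22050)
```

With a real pipeline, the `samples` list would be replaced by the audio the pipeline returns, and `rate` by the sample rate the pipeline was created with.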

See the links below for more detailed examples.

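Notebook	Description
Text to speech generation	Generate speech from text
Speech to Speech RAG ▶️	Full cycle speech to speech workflow with RAG
Generative Audio	Storytelling with generative audio workflows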

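This pipeline is backed by ONNX models from the Hugging Face Hub. The models used on this page are neuml/ljspeech-jets-onnx (the default), neuml/vctk-vits-onnx and neuml/txtai-speecht5-onnx.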

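Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.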

config.yml

# Create pipeline using lower case class name
texttospeech:

# Run pipeline with workflow
workflow:
  tts:
    tasks:
      - action: texttospeech

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("tts", ["Say something here"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"tts", "elements":["Say something here"]}'
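The same request can be built from Python. The sketch below uses only the standard library and mirrors the curl call: a POST with a JSON body holding the workflow name and its input elements. `workflow_request` and `send` are hypothetical helper names, and the URL assumes the API server is running locally on port 8000.

```python
import json
from urllib.request import Request, urlopen


def workflow_request(name, elements, url="http://localhost:8000/workflow"):
    """Builds a POST request that runs a named workflow over a list of elements."""

    # Same JSON body as the curl example
    payload = json.dumps({"name": name, "elements": elements}).encode("utf-8")

    return Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Sends the request and parses the JSON response. Requires a running API server."""

    with urlopen(request) as response:
        return json.load(response)


# Build the request without sending it
request = workflow_request("tts", ["Say something here"])
```

Calling `send(request)` only works while the server started with uvicorn is up, so the snippet stops at building the request.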

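Methods

Python documentation for the pipeline.

`__init__(path=None, maxtokens=512, rate=22050)`

Creates a new TextToSpeech pipeline.

Parameters

Name	Description	Default
`path`	optional model path	`None`
`maxtokens`	maximum number of tokens model can process, defaults to 512	`512`
`rate`	target sample rate, defaults to 22050	`22050`

Source code in txtai/pipeline/audio/texttospeech.py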

def __init__(self, path=None, maxtokens=512, rate=22050):
    """
    Creates a new TextToSpeech pipeline.

    Args:
        path: optional model path
        maxtokens: maximum number of tokens model can process, defaults to 512
        rate: target sample rate, defaults to 22050
    """

    if not TTS:
        raise ImportError('TextToSpeech pipeline is not available - install "pipeline" extra to enable')

    # Default path
    path = path if path else "neuml/ljspeech-jets-onnx"

    # Target sample rate
    self.rate = rate

    # Load target tts pipeline
    self.pipeline = None
    if self.hasfile(path, "model.onnx") and self.hasfile(path, "config.yaml"):
        self.pipeline = ESPnet(path, maxtokens, self.providers())
    elif self.hasfile(path, "model.onnx") and self.hasfile(path, "voices.json"):
        self.pipeline = Kokoro(path, maxtokens, self.providers())
    else:
        self.pipeline = SpeechT5(path, maxtokens, self.providers())
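The branching in the constructor picks a backend from the files that ship with the model. As a standalone illustration, the sketch below reproduces that decision order over a plain collection of file names. `select_backend` is a hypothetical function that returns the backend name as a string; it does not load any model or call `hasfile`.

```python
def select_backend(files):
    """Returns the backend name implied by the files found at a model path."""

    files = set(files)

    # model.onnx + config.yaml is checked first
    if {"model.onnx", "config.yaml"} <= files:
        return "ESPnet"

    # model.onnx + voices.json is checked second
    if {"model.onnx", "voices.json"} <= files:
        return "Kokoro"

    # Anything else falls through to the last branch
    return "SpeechT5"


espnet = select_backend(["model.onnx", "config.yaml"])
kokoro = select_backend(["model.onnx", "voices.json"])
fallback = select_backend(["model.onnx"])
```

Because the checks run in order, a path holding both `config.yaml` and `voices.json` resolves to the first match.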

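`__call__(text, stream=False, speaker=1, encoding=None, **kwargs)`

Generates speech from text. Text longer than maxtokens will be batched and returned as a single waveform per text input.

This method supports text as a string or a list. If the input is a string, the return type is audio. If text is a list, the return type is a list.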
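The two behaviors described here, batching by maxtokens and matching the return shape to the input shape, can be sketched in isolation. The code below is an illustration under stated assumptions and not txtai's implementation: whitespace splitting stands in for the model tokenizer, and `synthesize` is a hypothetical stand-in that emits one silent sample per token.

```python
def batches(tokens, maxtokens=512):
    """Yields consecutive slices of at most maxtokens tokens."""

    for i in range(0, len(tokens), maxtokens):
        yield tokens[i : i + maxtokens]


def synthesize(tokens):
    """Stand-in for a model call (assumption): one silent sample per token."""

    return [0.0] * len(tokens)


def speak(tokens, maxtokens=512):
    """Synthesizes each batch and joins the results into a single waveform."""

    waveform = []
    for batch in batches(tokens, maxtokens):
        waveform.extend(synthesize(batch))

    return waveform


def run(text, maxtokens=512):
    """Accepts a string or a list of strings and mirrors that shape in the result."""

    texts = [text] if isinstance(text, str) else text
    outputs = [speak(t.split(), maxtokens) for t in texts]

    # A string input returns a single waveform, a list input returns a list
    return outputs[0] if isinstance(text, str) else outputs


single = run("Say something here", maxtokens=2)
multiple = run(["Say something", "here"], maxtokens=1)
```

Here `single` is one waveform built from two batches, while `multiple` is a list holding one waveform per input text.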

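Parameters

Name	Description	Default
`text`	text|list	required
`stream`	stream response if True, defaults to False	`False`
`speaker`	speaker id, defaults to 1	`1`
`encoding`	optional audio encoding format	`None`
`kwargs`	additional keyword args	`{}`

Returns

Type	Description
	list of (audio, sample rate) or list of audio depending on encoding parameter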
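Since a result item can be either an (audio, sample rate) pair or audio alone, calling code may want to accept both shapes. The sketch below shows one way to do that. `split_result` is a hypothetical helper, and it makes no assumption about which encoding value produces which shape.

```python
def split_result(result):
    """Returns (audio, rate), with rate set to None when only audio is present."""

    # An (audio, sample rate) pair is unpacked as-is
    if isinstance(result, tuple) and len(result) == 2:
        return result[0], result[1]

    # Anything else is treated as audio without a sample rate
    return result, None


# Stand-in values (assumptions) for the two result shapes
pair = split_result(([0.0, 0.1], 22050))
audio_only = split_result(b"encoded-audio")
```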