
RAG

The RAG pipeline (also known as the Extractor) joins a prompt, a context data store and a generative model together to extract knowledge.

The data store can be an embeddings database or a similarity instance with associated input text. The generative model can be a prompt-driven large language model (LLM), an extractive question-answering model or a custom pipeline. This is known as retrieval augmented generation (RAG).

Example

The following shows a simple example using this pipeline.

from txtai import Embeddings, RAG

# Input data
data = [
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, " +
  "forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends " +
  "in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
]

# Build embeddings index
embeddings = Embeddings(content=True)
embeddings.index(data)

# Create and run pipeline
rag = RAG(embeddings, "google/flan-t5-base", template="""
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
""")

rag("What was won?")

# Instruction tuned models typically require string prompts to
# follow a specific chat template set by the model
rag = RAG(embeddings, "meta-llama/Meta-Llama-3.1-8B-Instruct", template="""
  <|im_start|>system
  You are a friendly assistant.<|im_end|>
  <|im_start|>user
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
  <|im_start|>assistant
  """
)
rag("What was won?")

# LLM options can be passed as additional arguments
rag = RAG(embeddings, "meta-llama/Meta-Llama-3.1-8B-Instruct", template="""
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
""")

# Set the default role to user and string inputs are converted to chat messages
rag("What was won?", defaultrole="user")

See the Embeddings and LLM pages for additional configuration options.

See the links below for more detailed examples.

| Notebook | Description |
|:---------|:------------|
| Prompt-driven search with LLMs | Embeddings-guided and prompt-driven search with Large Language Models (LLMs) |
| Prompt templates and task chains | Build model prompts and connect tasks together with workflows |
| Build RAG pipelines with txtai | Guide on retrieval augmented generation including how to create citations |
| Integrate LLM frameworks | Integrate llama.cpp, LiteLLM and custom generation frameworks |
| Generate knowledge with Semantic Graphs and RAG | Knowledge exploration and discovery with semantic graphs and RAG |
| Build knowledge graphs with LLMs | Build knowledge graphs with LLM-driven entity extraction |
| Advanced RAG with graph path traversal | Graph path traversal to collect complex sets of data for advanced RAG |
| Advanced RAG with guided generation | Retrieval augmented and guided generation |
| RAG with llama.cpp and external API services | RAG with additional vector and LLM frameworks |
| How RAG with txtai works | Create RAG processes, API services and Docker instances |
| Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG |
| Generative Audio | Storytelling with generative audio workflows |
| Analyzing Hugging Face Posts with Graphs and Agents | Explore a rich dataset with graph analysis and agents |
| Granting autonomy to agents | Agents that iteratively solve problems as they see fit |
| Getting started with LLM APIs | Generate embeddings and run LLMs with OpenAI, Claude, Gemini, Bedrock and more |
| Analyzing LinkedIn Company Posts with Graphs and Agents | Exploring how to improve social media engagement with AI |
| Extractive QA with txtai | Introduction to extractive question-answering with txtai |
| Extractive QA with Elasticsearch | Run extractive question-answering queries with Elasticsearch |
| Extractive QA to build structured data | Build structured datasets using extractive question-answering |
| Parsing the stars with txtai | Explore an astronomical knowledge graph of known stars, planets and galaxies |
| Chunking your data for RAG | Extract, chunk and index content for effective retrieval |

Configuration-driven example

Pipelines can be run with Python or configuration. Pipelines can be instantiated in configuration using the lowercase name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

# Allow documents to be indexed
writable: True

# Content is required for extractor pipeline
embeddings:
  content: True

rag:
  path: google/flan-t5-base
  template: |
    Answer the following question using the provided context.

    Question:
    {question}

    Context:
    {context}

workflow:
  search:
    tasks:
      - action: rag

Run with Workflows

Built-in tasks make it easier to use the Extractor (RAG) pipeline.

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
app.add([
  "US tops 5 million confirmed virus cases",
  "Canada's last fully intact ice shelf has suddenly collapsed, " +
  "forming a Manhattan-sized iceberg",
  "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
  "The National Park Service warns against sacrificing slower friends " +
  "in a bear attack",
  "Maine man wins $1M from $25 lottery ticket",
  "Make huge profits without work, earn up to $100,000 a day"
])
app.index()

list(app.workflow("search", ["What was won?"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name": "search", "elements": ["What was won"]}'

Methods

Python documentation for the pipeline.

__init__(similarity, path, quantize=False, gpu=True, model=None, tokenizer=None, minscore=None, mintokens=None, context=None, task=None, output='default', template=None, separator=' ', system=None, **kwargs)

Builds a new RAG pipeline.

Parameters

| Name | Description | Default |
|:-----|:------------|:--------|
| similarity | similarity instance (embeddings or similarity pipeline) | required |
| path | path to model, supports an LLM, Questions or custom pipeline | required |
| quantize | True if model should be quantized before inference, False otherwise | False |
| gpu | if GPU inference should be used (only works if GPUs are available) | True |
| model | optional existing pipeline model to wrap | None |
| tokenizer | Tokenizer class | None |
| minscore | minimum score to include a context match, defaults to None | None |
| mintokens | minimum number of tokens to include a context match, defaults to None | None |
| context | topn context matches to include, defaults to 3 | None |
| task | model task (language-generation, sequence-sequence or question-answering), defaults to auto-detect | None |
| output | output format, 'default' returns (name, answer), 'flatten' returns answers and 'reference' returns (name, answer, reference) | 'default' |
| template | prompt template, must have parameters for {question} and {context}, defaults to "{question} {context}" | None |
| separator | context separator | ' ' |
| system | system prompt, defaults to None | None |
| kwargs | additional keyword arguments to pass to the pipeline model | {} |
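
For illustration, the sketch below combines a few of these parameters with the embeddings index from the earlier example; the specific values are only examples, not recommended settings.

# Sketch: 'flatten' output returns answers only, context limits retrieval to the top 2 matches
rag = RAG(embeddings, "google/flan-t5-base", output="flatten", context=2, template="""
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
""")

print(rag("What was won?"))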
Source code in txtai/pipeline/llm/rag.py
def __init__(
    self,
    similarity,
    path,
    quantize=False,
    gpu=True,
    model=None,
    tokenizer=None,
    minscore=None,
    mintokens=None,
    context=None,
    task=None,
    output="default",
    template=None,
    separator=" ",
    system=None,
    **kwargs,
):
    """
    Builds a new RAG pipeline.

    Args:
        similarity: similarity instance (embeddings or similarity pipeline)
        path: path to model, supports a LLM, Questions or custom pipeline
        quantize: True if model should be quantized before inference, False otherwise.
        gpu: if gpu inference should be used (only works if GPUs are available)
        model: optional existing pipeline model to wrap
        tokenizer: Tokenizer class
        minscore: minimum score to include context match, defaults to None
        mintokens: minimum number of tokens to include context match, defaults to None
        context: topn context matches to include, defaults to 3
        task: model task (language-generation, sequence-sequence or question-answering), defaults to auto-detect
        output: output format, 'default' returns (name, answer), 'flatten' returns answers and 'reference' returns (name, answer, reference)
        template: prompt template, it must have a parameter for {question} and {context}, defaults to "{question} {context}"
        separator: context separator
        system: system prompt, defaults to None
        kwargs: additional keyword arguments to pass to pipeline model
    """

    # Similarity instance
    self.similarity = similarity

    # Model can be a LLM, Questions or custom pipeline
    self.model = self.load(path, quantize, gpu, model, task, **kwargs)

    # Tokenizer class use default method if not set
    self.tokenizer = tokenizer if tokenizer else Tokenizer() if hasattr(self.similarity, "scoring") and self.similarity.isweighted() else None

    # Minimum score to include context match
    self.minscore = minscore if minscore is not None else 0.0

    # Minimum number of tokens to include context match
    self.mintokens = mintokens if mintokens is not None else 0.0

    # Top n context matches to include for context
    self.context = context if context else 3

    # Output format
    self.output = output

    # Prompt template
    self.template = template if template else "{question} {context}"

    # Context separator
    self.separator = separator

    # System prompt template
    self.system = system

__call__(queue, texts=None, **kwargs)

Finds answers to input questions. This method runs queries to find the top n best matches and uses those as the context. A model is then run against the context for each input question, with the answer returned.

Parameters

| Name | Description | Default |
|:-----|:------------|:--------|
| queue | input question queue (name, query, question, snippet), can be a list of tuples/dicts/strings or a single input element | required |
| texts | optional list of texts for context, otherwise runs an embeddings search | None |
| kwargs | additional keyword arguments to pass to the pipeline model | {} |

Returns

A list of answers matching the input format (tuple or dict), containing fields as specified by the output format.
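
As a sketch of the input formats described above, a dictionary input combined with an explicit list of context texts could look like the following; the field values are only illustrative.

# Sketch: dictionary input with explicit context texts
# (field names follow the queue format described above)
results = rag(
    [{"name": "q1", "query": "lottery", "question": "What was won?"}],
    texts=["Maine man wins $1M from $25 lottery ticket"]
)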

Source code in txtai/pipeline/llm/rag.py
def __call__(self, queue, texts=None, **kwargs):
    """
    Finds answers to input questions. This method runs queries to find the top n best matches and uses that as the context.
    A model is then run against the context for each input question, with the answer returned.

    Args:
        queue: input question queue (name, query, question, snippet), can be list of tuples/dicts/strings or a single input element
        texts: optional list of text for context, otherwise runs embeddings search
        kwargs: additional keyword arguments to pass to pipeline model

    Returns:
        list of answers matching input format (tuple or dict) containing fields as specified by output format
    """

    # Save original queue format
    inputs = queue

    # Convert queue to list, if necessary
    queue = queue if isinstance(queue, list) else [queue]

    # Convert dictionary inputs to tuples
    if queue and isinstance(queue[0], dict):
        # Convert dict to tuple
        queue = [tuple(row.get(x) for x in ["name", "query", "question", "snippet"]) for row in queue]

    if queue and isinstance(queue[0], str):
        # Convert string questions to tuple
        queue = [(None, row, row, None) for row in queue]

    # Rank texts by similarity for each query
    results = self.query([query for _, query, _, _ in queue], texts)

    # Build question-context pairs
    names, queries, questions, contexts, topns, snippets = [], [], [], [], [], []
    for x, (name, query, question, snippet) in enumerate(queue):
        # Get top n best matching segments
        topn = sorted(results[x], key=lambda y: y[2], reverse=True)[: self.context]

        # Generate context using ordering from texts, if available, otherwise order by score
        context = self.separator.join(text for _, text, _ in (sorted(topn, key=lambda y: y[0]) if texts else topn))

        names.append(name)
        queries.append(query)
        questions.append(question)
        contexts.append(context)
        topns.append(topn)
        snippets.append(snippet)

    # Run pipeline and return answers
    answers = self.answers(questions, contexts, **kwargs)

    # Apply output formatting to answers and return
    return self.apply(inputs, names, queries, answers, topns, snippets) if isinstance(answers, list) else answers