Tabular

The Tabular pipeline splits tabular data into rows and columns. It is most useful for creating (id, text, tag) tuples to load into an embeddings index.
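
To make the (id, text, tag) shape concrete, the sketch below builds such tuples from CSV text using only the Python standard library. It is an illustration, not txtai's implementation: the helper name `rows_to_tuples`, the sample data, the space-joined text and the `None` tag are all assumptions.

```python
import csv
import io

def rows_to_tuples(source, idcolumn, textcolumns):
    """Hypothetical helper: turn CSV rows into (id, text, tag) tuples."""
    reader = csv.DictReader(source)
    # Join the selected columns into one text field; the tag is left as None (assumption)
    return [(row[idcolumn], " ".join(row[c] for c in textcolumns), None) for row in reader]

# Sample data made up for illustration
sample = io.StringIO("id,text\n1,first row\n2,second row\n")
tuples = rows_to_tuples(sample, "id", ["text"])
print(tuples)
```

The arguments mirror the `idcolumn` and `textcolumns` parameters of the pipeline, so the call reads much like `Tabular("id", ["text"])`.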

Example

The following shows a simple example using this pipeline.

from txtai.pipeline import Tabular

# Create and run pipeline
tabular = Tabular("id", ["text"])
tabular("path to csv file")

See the link below for a more detailed example.

Notebook	Description
Transform tabular data with composable workflows	Transform, index and search tabular data

Configuration-driven example

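Pipelines can be run with Python or with configuration. A pipeline is instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines can be run with workflows or the API.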

config.yml

# Create pipeline using lower case class name
tabular:
    idcolumn: id
    textcolumns:
      - text

# Run pipeline with workflow
workflow:
  tabular:
    tasks:
      - action: tabular

Run with Workflows

from txtai import Application

# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("tabular", ["path to csv file"]))

Run with API

CONFIG=config.yml uvicorn "txtai.api:app" &

curl \
  -X POST "http://localhost:8000/workflow" \
  -H "Content-Type: application/json" \
  -d '{"name":"tabular", "elements":["path to csv file"]}'
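
The same request can be prepared from Python. This sketch uses only the standard library and reuses the URL and payload from the curl command; it builds the request without sending it, since sending assumes the API server is running on localhost:8000.

```python
import json
import urllib.request

# Payload and URL copied from the curl example
payload = {"name": "tabular", "elements": ["path to csv file"]}
request = urllib.request.Request(
    "http://localhost:8000/workflow",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the API server is running
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```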

Methods

Python documentation for the pipeline.

`__init__(idcolumn=None, textcolumns=None, content=False)`

Creates a new Tabular pipeline.

Parameters

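Name	Description	Default
`idcolumn`	column name to use for row id	`None`
`textcolumns`	list of columns to combine as a text field	`None`
`content`	if True, a dict per row is generated with all fields. If content is a list, a subset of fields is included in the generated rows.	`False`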

Source code in txtai/pipeline/data/tabular.py

def __init__(self, idcolumn=None, textcolumns=None, content=False):
    """
    Creates a new Tabular pipeline.

    Args:
        idcolumn: column name to use for row id
        textcolumns: list of columns to combine as a text field
        content: if True, a dict per row is generated with all fields. If content is a list, a subset of fields
                 is included in the generated rows.
    """

    if not PANDAS:
        raise ImportError('Tabular pipeline is not available - install "pipeline" extra to enable')

    self.idcolumn = idcolumn
    self.textcolumns = textcolumns
    self.content = content
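
The `content` parameter documented in the docstring can be read as a field selector. The following sketch illustrates that reading with a plain dict per row. `select_content` is a hypothetical helper, not part of txtai, and how txtai attaches these dicts to its output is not covered here.

```python
def select_content(row, content):
    """Hypothetical helper mirroring the documented `content` semantics."""
    if isinstance(content, list):
        # A list keeps only the named fields
        return {field: row[field] for field in content if field in row}
    if content:
        # True keeps every field
        return dict(row)
    # False (the default) generates no dict
    return None

# Sample row made up for illustration
row = {"id": 1, "text": "first row", "source": "demo"}
all_fields = select_content(row, True)
subset = select_content(row, ["text"])
nothing = select_content(row, False)
```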

`__call__(data)`

Splits data into rows and columns.

Parameters

Name	Type	Description	Default
`data`		input data	required

Returns

Type	Description
	list of (id, text, tag)