Storage Task
The Storage Task expands a local directory or cloud storage bucket into a list of URLs to process.
Example
The following shows a simple example using this task as part of a workflow.
from txtai.workflow import StorageTask, Workflow

workflow = Workflow([StorageTask()])

# Calling a workflow returns a generator; list() consumes it to run the task
list(workflow(["s3://path/to/bucket", "local://local/directory"]))
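To make "expands a local directory into a list of URLs" concrete, here is a minimal standard-library sketch of the idea. It is an illustration only, not the task's actual implementation: the helper name `expand_local` and the way the `local://` prefix is stripped are assumptions made for this example, and cloud buckets are not covered.

```python
import os
import tempfile


def expand_local(url):
    """Hypothetical helper: list every file under a local:// URL."""
    # Assumption: everything after the scheme is a filesystem path
    path = url[len("local://"):]
    found = []
    # Walk the directory tree and collect the full path of each file
    for root, _, files in os.walk(path):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)


# Build a throwaway directory tree to expand
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "sub"))
for relative in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(base, relative), "w") as handle:
        handle.write("sample")

# One directory in, one path per file out
urls = expand_local("local://" + base)
print(urls)
```

A single input element becomes several output elements, which is why this task is typically placed first in a workflow, ahead of tasks that process individual files.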
Configuration-driven example
This task can also be created with workflow configuration.
workflow:
  tasks:
    - task: storage
Methods
Python documentation for the task.
__init__(action=None, select=None, unpack=True, column=None, merge='hstack', initialize=None, finalize=None, concurrency=None, onetomany=True, **kwargs)
Creates a new task. A task defines two things: the type of data it accepts and the action to execute for each data element. An action is a callable function or a list of callable functions.
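The sketch below illustrates, in plain Python, what accepting "a callable function or list of callable functions" can look like, together with a column-wise merge of the kind the `hstack` merge mode suggests. The helper `run_actions` and the sample actions `upper` and `length` are hypothetical names for this example, and the assumption that each action receives the whole batch is ours; the library's internals may differ.

```python
def run_actions(actions, elements):
    """Hypothetical helper: apply each action to the batch, then merge column-wise."""
    # A single callable is treated as a one-item list of actions
    if callable(actions):
        actions = [actions]

    # Assumption: every action takes the full batch and returns one output per element
    outputs = [action(elements) for action in actions]

    # With one action there is nothing to merge
    if len(outputs) == 1:
        return outputs[0]

    # Column-wise merge: one tuple per input element, one column per action
    return list(zip(*outputs))


def upper(batch):
    return [text.upper() for text in batch]


def length(batch):
    return [len(text) for text in batch]


single = run_actions(upper, ["a", "bc"])
merged = run_actions([upper, length], ["a", "bc"])
print(single)
print(merged)
```

With a single action the outputs pass through unchanged; with two actions each input element yields a tuple holding one result per action.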
Parameters:
Name | Description | Default |
---|---|---|
action | action(s) to execute on each data element | None |
select | filter(s) used to select data to process | None |
unpack | if data elements should be unpacked or unwrapped from (id, data, tag) tuples | True |
column | column index to select if element is a tuple, defaults to all | None |
merge | merge mode for joining multi-action outputs, defaults to hstack | 'hstack' |
initialize | action to execute before processing | None |
finalize | action to execute after processing | None |
concurrency | sets the concurrency method when an execute instance is available; valid values: "thread" for thread-based concurrency, "process" for process-based concurrency | None |
onetomany | if one-to-many data transformations should be enabled, defaults to True | True |
kwargs | additional keyword arguments | {} |
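The `onetomany` parameter matters for this task in particular, since one directory or bucket expands into many URLs. The following plain-Python sketch shows one way such a one-to-many transformation could be flattened while keeping each output tied to its input id. The helper `flatten_one_to_many` and the sample ids are hypothetical, and this is an assumption about the general idea rather than a copy of the library's logic.

```python
def flatten_one_to_many(ids, outputs):
    """Hypothetical helper: expand list outputs into one (id, value) row each."""
    rows = []
    for uid, output in zip(ids, outputs):
        if isinstance(output, list):
            # One input produced many outputs: emit a row per output
            rows.extend((uid, item) for item in output)
        else:
            # One input produced one output: pass it through
            rows.append((uid, output))
    return rows


rows = flatten_one_to_many(
    ["dir1", "dir2"],
    [["dir1/a.txt", "dir1/b.txt"], "dir2/c.txt"],
)
print(rows)
```

Here the first input contributes two rows and the second contributes one, so two inputs become three outputs.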
Source code in txtai/workflow/task/base.py