Plugin Specification

Pipcook uses plugins to carry out each task in the machine learning lifecycle, which keeps the core framework simple, stable, and efficient.

The plugin specification defined by Pipcook also allows anyone to develop plugins, which keeps Pipcook extensible. In principle, plugins can implement any kind of machine learning task.

Plugin Package

Pipcook plugins are distributed as NPM packages. On top of the standard NPM package.json, Pipcook defines additional fields for plugins:

{
  "name": "my-own-pipcook-plugin",
  "version": "1.0.0",
  "description": "my own pipcook plugin",
  "dependencies": {
    "@pipcook/pipcook-core": "^0.5.0"
  },
  "pipcook": {
    "category": "dataCollect",
    "datatype": "image"
  },
  "conda": {
    "python": "3.7",
    "dependencies": {
      "tensorflow": "2.2.0"
    }
  }
}

As the package.json example above shows, a plugin package must meet a few requirements:

- The plugin package must be written in TypeScript and compiled to JavaScript before publishing.
- @pipcook/pipcook-core must be listed in dependencies; it contains the types needed to create a plugin handler.
- A root field pipcook is required:
  - pipcook.category describes the category the plugin belongs to. The available categories are listed under Plugin Category below.
  - pipcook.datatype describes the type of data to be processed. Currently supported: common, image, and text.
- An optional field conda configures Python-related dependencies:
  - conda.python specifies the Python version and must be 3.7.
  - conda.dependencies lists the Python dependencies that are installed when the plugin is initialized. It supports the following kinds of version string:
    - x.y.z, a specific version on PyPI.
    - *, the latest version on PyPI.
    - git+https://github.com/foobar/project@master, installs from a GitHub repository, following the pip-install(1) syntax.
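The requirements above can be checked mechanically. The following TypeScript sketch validates the Pipcook-specific fields of a parsed package.json. The names PluginManifest, classifyVersion, and validateManifest are hypothetical and not part of Pipcook. The category list is copied from the Plugin Category section, the git check is a simplification of what pip accepts, and treating a missing conda.python as acceptable is an assumption.

```typescript
// Hypothetical helper types; not part of @pipcook/pipcook-core.
interface PluginManifest {
  name?: string;
  dependencies?: Record<string, string>;
  pipcook?: { category?: string; datatype?: string };
  conda?: { python?: string; dependencies?: Record<string, string> };
}

// Categories and data types as listed in this specification.
const CATEGORIES = [
  "dataCollect", "dataAccess", "dataProcess",
  "modelLoad", "modelDefine", "modelTrain", "modelEvaluate",
];
const DATATYPES = ["common", "image", "text"];

type VersionKind = "exact" | "latest" | "git" | "unknown";

// Classify a conda.dependencies version string into the three documented forms.
function classifyVersion(version: string): VersionKind {
  if (version === "*") return "latest";
  if (/^\d+\.\d+\.\d+$/.test(version)) return "exact";
  if (/^git\+https:\/\//.test(version)) return "git"; // simplified check
  return "unknown";
}

// Return a list of problems; an empty list means the manifest passed.
function validateManifest(manifest: PluginManifest): string[] {
  const errors: string[] = [];
  const deps = manifest.dependencies || {};
  if (!("@pipcook/pipcook-core" in deps)) {
    errors.push("dependencies must include @pipcook/pipcook-core");
  }
  const pipcook = manifest.pipcook;
  if (!pipcook) {
    errors.push("missing root field: pipcook");
  } else {
    if (CATEGORIES.indexOf(pipcook.category || "") === -1) {
      errors.push("unknown pipcook.category: " + pipcook.category);
    }
    if (DATATYPES.indexOf(pipcook.datatype || "") === -1) {
      errors.push("unknown pipcook.datatype: " + pipcook.datatype);
    }
  }
  const conda = manifest.conda; // the conda field is optional
  if (conda) {
    // Assumption: an absent conda.python is tolerated; a present one must be 3.7.
    if (conda.python !== undefined && conda.python !== "3.7") {
      errors.push("conda.python must be 3.7");
    }
    const condaDeps = conda.dependencies || {};
    for (const name of Object.keys(condaDeps)) {
      if (classifyVersion(condaDeps[name]) === "unknown") {
        errors.push("unsupported version string for " + name + ": " + condaDeps[name]);
      }
    }
  }
  return errors;
}

// The example manifest from this section passes every check.
const example: PluginManifest = {
  name: "my-own-pipcook-plugin",
  dependencies: { "@pipcook/pipcook-core": "^0.5.0" },
  pipcook: { category: "dataCollect", datatype: "image" },
  conda: { python: "3.7", dependencies: { tensorflow: "2.2.0" } },
};
console.log(validateManifest(example));
```

Returning a list of messages rather than throwing on the first problem lets a plugin author see every violation in one pass.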

Plugin Category

We have defined the following plugin categories for the machine learning lifecycle.

- dataCollect(args: ArgsType): Promise<void> downloads data from the data source and stores it as the corresponding unified dataset.
- dataAccess(args: ArgsType): Promise<UniDataset> loads the dataset and makes it compatible with the model used later in the pipeline.
- dataProcess(sample: Sample, md: Metadata, args: ArgsType): Promise<void> processes the data row by row.
- modelLoad(data: UniDataset, args: ArgsType): Promise<UniModel> loads the model into the pipeline.
- modelDefine(data: UniDataset, args: ModelDefineArgsType): Promise<UniModel> defines the model.
- modelTrain(data: UniDataset, model: UniModel, args: ModelTrainArgsType): Promise<UniModel> outputs the trained model and saves it to the configured location.
- modelEvaluate(data: UniDataset, model: UniModel): Promise<EvaluateResult> calls the corresponding evaluators to measure how the trained model performs.
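To make the signatures above concrete, here is a minimal sketch of a dataProcess handler. The Sample, Metadata, and ArgsType interfaces below are simplified stand-ins declared locally so the example runs on its own; a real plugin would import the corresponding types from @pipcook/pipcook-core, whose actual shapes may differ. The scale argument, the numeric-array sample layout, and the names scaleData and scaleValues are assumptions made for illustration.

```typescript
// Simplified stand-ins for the types a real plugin would import from
// @pipcook/pipcook-core. Their fields are assumptions for illustration.
interface Sample {
  data: number[];
  label: string;
}
interface Metadata {
  feature: { names: string[] };
}
interface ArgsType {
  [key: string]: unknown;
}

// Shape of a dataProcess handler as given in the category list:
// it receives one sample at a time and resolves with no value.
type DataProcessHandler = (sample: Sample, md: Metadata, args: ArgsType) => Promise<void>;

// Pure helper: divide every value by `scale`.
function scaleData(data: number[], scale: number): number[] {
  if (scale === 0) {
    throw new RangeError("scale must be non-zero");
  }
  return data.map((value) => value / scale);
}

// Hypothetical plugin handler: rescales the sample in place,
// reading an optional numeric `scale` argument (default 255).
const scaleValues: DataProcessHandler = async (sample, _md, args) => {
  const raw = args.scale;
  const scale = typeof raw === "number" ? raw : 255;
  sample.data = scaleData(sample.data, scale);
};

// Usage: process a single sample and print the rescaled values.
async function demo(): Promise<Sample> {
  const sample: Sample = { data: [0, 51, 255], label: "cat" };
  const md: Metadata = { feature: { names: ["pixel"] } };
  await scaleValues(sample, md, { scale: 255 });
  return sample;
}

demo().then((sample) => console.log(sample.data));
```

Because the handler resolves to void, its effect is visible only through the mutated sample, which matches the Promise<void> return type listed for dataProcess.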

Developing

See the contributing documentation to learn how to develop a new plugin.