Introduction to Pipeline
In Pipcook, a Pipeline represents the training process of a model. So what kind of pipeline is needed to train a model? Developers can describe a modeling pipeline in JSON, from sample collection and model definition through training to model evaluation:
{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-csv-data-collect",
      "params": {
        "url": "http://foobar"
      }
    },
    "dataAccess": {
      "package": "@pipcook/plugins-csv-data-access",
      "params": {
        "labelColumn": "output"
      }
    },
    "modelDefine": {
      "package": "@pipcook/plugins-bayesian-model-define"
    },
    "modelTrain": {
      "package": "@pipcook/plugins-bayesian-model-train"
    },
    "modelEvaluate": {
      "package": "@pipcook/plugins-bayesian-model-evaluate"
    }
  }
}
As shown above, a pipeline is composed of different plugins, and each plugin can take a params field that passes it the given parameters. The pipeline interpreter then performs the corresponding operation(s) according to each plugin's type and parameters.
See Introduction to Plugin for more details about plugin.
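To make the shape of the configuration concrete, here is a minimal TypeScript sketch of the structure used in the JSON above. The type names (PluginConfig, PipelineConfig), the PLUGIN_ORDER constant, and the describePipeline helper are hypothetical illustrations, not part of Pipcook's API. The step order is an assumption taken from the order in which the plugins appear in the example.

```typescript
// Hypothetical types mirroring the example JSON; these are NOT Pipcook's official typings.
interface PluginConfig {
  package: string;                   // npm package name of the plugin
  params?: Record<string, unknown>;  // optional parameters passed to the plugin
}

type PluginType =
  | 'dataCollect'
  | 'dataAccess'
  | 'modelDefine'
  | 'modelTrain'
  | 'modelEvaluate';

interface PipelineConfig {
  plugins: Partial<Record<PluginType, PluginConfig>>;
}

// Assumed order, taken from the order of the keys in the example above.
const PLUGIN_ORDER: PluginType[] = [
  'dataCollect',
  'dataAccess',
  'modelDefine',
  'modelTrain',
  'modelEvaluate',
];

// Lists the configured steps and rejects entries without a package name.
function describePipeline(config: PipelineConfig): string[] {
  const steps: string[] = [];
  for (const type of PLUGIN_ORDER) {
    const plugin = config.plugins[type];
    if (plugin === undefined) {
      continue; // this plugin type is not configured
    }
    if (typeof plugin.package !== 'string' || plugin.package.length === 0) {
      throw new Error(`plugin "${type}" is missing a "package" name`);
    }
    const paramKeys = Object.keys(plugin.params || {});
    steps.push(`${type}: ${plugin.package} (params: ${paramKeys.join(', ') || 'none'})`);
  }
  return steps;
}

// A shortened version of the example pipeline, parsed from JSON text.
const example: PipelineConfig = JSON.parse(`{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-csv-data-collect",
      "params": { "url": "http://foobar" }
    },
    "modelDefine": {
      "package": "@pipcook/plugins-bayesian-model-define"
    }
  }
}`);

console.log(describePipeline(example).join('\n'));
```

A check like this can catch a misspelled or missing package field before the pipeline is handed to Pipcook. It does not verify that the packages exist or that the params are ones the plugin accepts.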
Once such a pipeline is defined, we can run it through Pipcook.
Preparation
Follow the Pipcook Tools Initialization guide to get Pipcook ready.
Run Pipeline
Save the pipeline JSON above to a file anywhere on disk, and run:
$ pipcook run /path/to/your/pipeline-config.json
After training, the model is written to an output directory under the current working directory:
📂output
┣ 📂logs
┣ 📂model
┣ 📜package.json
┣ 📜metadata.json
┗ 📜index.js
To get started with your trained model, follow the steps below:
$ npm install
This installs the dependencies, which include the plugins and Python packages. Pipcook also provides a way to use the tuna mirror when downloading Python and its packages:
$ BOA_TUNA=1 npm install
Once the output is initialized, just import it as follows:
import * as predict from './output';
predict('your input data');