版本:v1.3
机器学习插件
机器学习插件分为模型训练和模型服务两个插件,使用如下命令安装插件:
vela addon enable model-training
vela addon enable model-serving
模型训练插件中含有 model-training
和 jupyter-notebook
两个组件定义,你可以使用前者来训练模型,后者用于启动一个 jupyter。
$ vela show model-training
# Properties
+------------------+----------------------------------------------------------------------------------+-------------------------------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+------------------+----------------------------------------------------------------------------------+-------------------------------+----------+---------+
| env | Define arguments by using environment variables | [[]env](#env) | false | |
| labels | Specify the labels in the workload | map[string]string | false | |
| annotations | Specify the annotations in the workload | map[string]string | false | |
| framework | The training framework to use | string | true | |
| image | Which image would you like to use for your service | string | true | |
| imagePullPolicy | Specify image pull policy for your service | string | false | |
| cpu | Number of CPU units for the service, like `0.5` (0.5 CPU core), `1` (1 CPU core) | string | false | |
| memory | Specifies the attributes of the memory resource required for the container. | string | false | |
| gpu | Specifies the attributes of the gpu resource required for the container. | string | false | |
| storage | | [[]storage](#storage) | false | |
| imagePullSecrets | Specify image pull secrets for your service | []string | false | |
| distribution | If you want to train the model in distributed mode, specify here | [distribution](#distribution) | false | |
| restartPolicy | | string | true | Never |
+------------------+----------------------------------------------------------------------------------+-------------------------------+----------+---------+
## distribution
+-----------+---------------------------------------------------------+------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-----------+---------------------------------------------------------+------+----------+---------+
| ps | The number of PS replicas, suits for tensorflow model | int | false | |
| master | The number of Master replicas, suits for pytorch model | int | false | |
| scheduler | The number of Scheduler replicas, suits for mxnet model | int | false | |
| server | The number of Server replicas, suits for mxnet model | int | false | |
| launcher | The number of Launcher replicas, suits for mpi model | int | false | |
| worker | The number of Worker replicas | int | false | |
+-----------+---------------------------------------------------------+------+----------+---------+
## storage
+------------------+--------------------------------------------------------------------------+---------------------------------+----------+------------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+------------------+--------------------------------------------------------------------------+---------------------------------+----------+------------+
| name | | string | false | |
| resources | | [resources](#resources) | false | |
| pvcRef | If you want to use a existed PVC, specify the PVC name and moutPath here | [pvcRef](#pvcRef) | false | |
| mountPath | | string | false | |
| accessModes | | [...] | true | |
| volumeMode | | string | true | Filesystem |
| storageClassName | | string | false | |
| dataSourceRef | | [dataSourceRef](#dataSourceRef) | false | |
| dataSource | | [dataSource](#dataSource) | false | |
| selector | | [selector](#selector) | false | |
+------------------+--------------------------------------------------------------------------+---------------------------------+----------+------------+
$ vela show jupyter-notebook
# Properties
+-------------+------------------------------------------------------------------------------------------------------+-----------------------+----------+-----------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-------------+------------------------------------------------------------------------------------------------------+-----------------------+----------+-----------+
| cpu | Number of CPU units for the service, like `0.5` (0.5 CPU core), `1` (1 CPU core) | string | false | |
| memory | Specifies the attributes of the memory resource required for the container. | string | false | |
| gpu | Specifies the attributes of the gpu resource required for the container. | string | false | |
| storage | | [[]storage](#storage) | false | |
| serviceType | Specify what kind of Service you want. options: "ClusterIP","NodePort","LoadBalancer","ExternalName" | string | true | ClusterIP |
+-------------+------------------------------------------------------------------------------------------------------+-----------------------+----------+-----------+
## storage
+-----------+-------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-----------+-------------+--------+----------+---------+
| name | | string | true | |
| mountPath | | string | true | |
+-----------+-------------+--------+----------+---------+
模型服务插件中含有一个 model-serving 组件定义,用于启动模型服务。
$ vela show model-serving
# Properties
+---------------+-----------------------------------------------------------------------+---------------------------------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+---------------+-----------------------------------------------------------------------+---------------------------------+----------+---------+
| timeout | If you model serving need long time to return, please set the timeout | string | false | |
| customRouting | Specify the custom routing of the serving | [customRouting](#customRouting) | false | |
| protocol | Protocol of model serving, default to seldon | string | false | |
| predictors | The predictors of the serving | [[]predictors](#predictors) | true | |
+---------------+-----------------------------------------------------------------------+---------------------------------+----------+---------+
## predictors
+------------+------------------------------------------------------------------------------------+---------------------------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+------------+------------------------------------------------------------------------------------+---------------------------+----------+---------+
| name | Name of the predictor | string | true | |
| replicas | Replica of the predictor | int | false | |
| traffic | If you want to split the traffic to different serving, please set the traffic here | int | false | |
| graph | The graph of the predictor | [graph](#graph) | true | |
| resources | The resources of the serving | [resources](#resources) | false | |
| autoscaler | The autoscaler of the serving | [autoscaler](#autoscaler) | false | |
+------------+------------------------------------------------------------------------------------+---------------------------+----------+---------+
### autoscaler
+-------------+--------------------------------------+-----------------------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-------------+--------------------------------------+-----------------------+----------+---------+
| minReplicas | The min replicas of this auto scaler | int | true | |
| maxReplicas | The max replicas of this auto scaler | int | true | |
| metrics | The metrics of this auto scaler | [[]metrics](#metrics) | true | |
+-------------+--------------------------------------+-----------------------+----------+---------+
#### metrics
+--------------------------+----------------------------------------------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+--------------------------+----------------------------------------------------+--------+----------+---------+
| type | The type of this auto scaler | string | true | |
| targetAverageUtilization | The target average utilization of this auto scaler | int | true | |
+--------------------------+----------------------------------------------------+--------+----------+---------+
### resources
+--------+----------------------------------------------------------------------------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+--------+----------------------------------------------------------------------------------+--------+----------+---------+
| cpu | Number of CPU units for the service, like `0.5` (0.5 CPU core), `1` (1 CPU core) | string | false | |
| memory | Specifies the attributes of the memory resource required for the container. | string | false | |
| gpu | Specifies the attributes of the gpu resource required for the container. | string | false | |
+--------+----------------------------------------------------------------------------------+--------+----------+---------+
### graph
+----------------+-------------------------------------------------------------------------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+----------------+-------------------------------------------------------------------------------+--------+----------+---------+
| name | The name of the graph | string | true | |
| implementation | The implementation of the serving | string | true | |
| modelUri | The model uri, you can use `pvc://pvc-name/path` or `s3://s3-name/path`, etc. | string | true | |
+----------------+-------------------------------------------------------------------------------+--------+----------+---------+
## customRouting
+-------------+-----------------------------------------------------------------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-------------+-----------------------------------------------------------------------+--------+----------+---------+
| header | Request with specified header will be routed to the specified service | string | true | |
| serviceName | The service name that will be routed to | string | true | |
+-------------+-----------------------------------------------------------------------+--------+----------+---------+
Last updated on 2022年11月1日 by Tianxin Dong