Machine Learning

The machine learning addon is split into two addons: model-training and model-serving. Run the following commands to install them:

```shell
vela addon enable model-training
vela addon enable model-serving
```

The model-training addon provides two component definitions: model-training and jupyter-notebook.

```shell
$ vela show model-training
# Properties
+------------------+----------------------------------------------------------------------------------+-------------------------------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+------------------+----------------------------------------------------------------------------------+-------------------------------+----------+---------+
| env | Define arguments by using environment variables | [[]env](#env) | false | |
| labels | Specify the labels in the workload | map[string]string | false | |
| annotations | Specify the annotations in the workload | map[string]string | false | |
| framework | The training framework to use | string | true | |
| image | Which image would you like to use for your service | string | true | |
| imagePullPolicy | Specify image pull policy for your service | string | false | |
| cpu | Number of CPU units for the service, like `0.5` (0.5 CPU core), `1` (1 CPU core) | string | false | |
| memory | Specifies the attributes of the memory resource required for the container. | string | false | |
| gpu | Specifies the attributes of the gpu resource required for the container. | string | false | |
| storage | | [[]storage](#storage) | false | |
| imagePullSecrets | Specify image pull secrets for your service | []string | false | |
| distribution | If you want to train the model in distributed mode, specify here | [distribution](#distribution) | false | |
| restartPolicy | | string | true | Never |
+------------------+----------------------------------------------------------------------------------+-------------------------------+----------+---------+
## distribution
+-----------+---------------------------------------------------------+------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-----------+---------------------------------------------------------+------+----------+---------+
| ps | The number of PS replicas, suits for tensorflow model | int | false | |
| master | The number of Master replicas, suits for pytorch model | int | false | |
| scheduler | The number of Scheduler replicas, suits for mxnet model | int | false | |
| server | The number of Server replicas, suits for mxnet model | int | false | |
| launcher | The number of Launcher replicas, suits for mpi model | int | false | |
| worker | The number of Worker replicas | int | false | |
+-----------+---------------------------------------------------------+------+----------+---------+
## storage
+------------------+--------------------------------------------------------------------------+---------------------------------+----------+------------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+------------------+--------------------------------------------------------------------------+---------------------------------+----------+------------+
| name | | string | false | |
| resources | | [resources](#resources) | false | |
| pvcRef | If you want to use a existed PVC, specify the PVC name and moutPath here | [pvcRef](#pvcRef) | false | |
| mountPath | | string | false | |
| accessModes | | [...] | true | |
| volumeMode | | string | true | Filesystem |
| storageClassName | | string | false | |
| dataSourceRef | | [dataSourceRef](#dataSourceRef) | false | |
| dataSource | | [dataSource](#dataSource) | false | |
| selector | | [selector](#selector) | false | |
+------------------+--------------------------------------------------------------------------+---------------------------------+----------+------------+
```
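To make the property list concrete, here is a minimal sketch of an Application that trains a model with the model-training component. It assumes a TensorFlow-style job; the application name, image, replica counts, and storage class are illustrative placeholders rather than values shipped with the addon:

```yaml
# Sketch only: image, names, and sizes below are assumed placeholders.
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: mnist-training
spec:
  components:
    - name: training
      type: model-training
      properties:
        image: my-registry/tf-mnist-train:v1   # your training image (assumed)
        framework: tensorflow                  # required: the training framework the job uses
        restartPolicy: Never
        distribution:          # optional: run the job in distributed mode
          ps: 1                # parameter-server replicas (tensorflow)
          worker: 2
        storage:               # optional: persist the trained model
          - name: model-output
            mountPath: /model
            accessModes: ["ReadWriteOnce"]
            storageClassName: standard   # assumed storage class
```

The jupyter-notebook component definition exposes a smaller set of properties, mostly around resources and how the notebook Service is exposed: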
```shell
$ vela show jupyter-notebook
# Properties
+-------------+------------------------------------------------------------------------------------------------------+-----------------------+----------+-----------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-------------+------------------------------------------------------------------------------------------------------+-----------------------+----------+-----------+
| cpu | Number of CPU units for the service, like `0.5` (0.5 CPU core), `1` (1 CPU core) | string | false | |
| memory | Specifies the attributes of the memory resource required for the container. | string | false | |
| gpu | Specifies the attributes of the gpu resource required for the container. | string | false | |
| storage | | [[]storage](#storage) | false | |
| serviceType | Specify what kind of Service you want. options: "ClusterIP","NodePort","LoadBalancer","ExternalName" | string | true | ClusterIP |
+-------------+------------------------------------------------------------------------------------------------------+-----------------------+----------+-----------+
## storage
+-----------+-------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-----------+-------------+--------+----------+---------+
| name | | string | true | |
| mountPath | | string | true | |
+-----------+-------------+--------+----------+---------+
```
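A notebook can then be declared as another component, for example to inspect the training output on a shared volume. This is a hedged sketch: the volume name and mount path are assumptions, and `storage.name` is taken to reference an existing volume or PVC:

```yaml
# Sketch only: names and sizes are assumed placeholders.
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: notebook-demo
spec:
  components:
    - name: notebook
      type: jupyter-notebook
      properties:
        cpu: "1"
        memory: 2Gi
        serviceType: NodePort     # expose the notebook outside the cluster
        storage:
          - name: model-output    # assumed to be an existing volume/PVC name
            mountPath: /home/jovyan/model
```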

The model-serving addon provides one component definition, model-serving, which serves trained models.

```shell
$ vela show model-serving
# Properties
+---------------+-----------------------------------------------------------------------+---------------------------------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+---------------+-----------------------------------------------------------------------+---------------------------------+----------+---------+
| timeout | If you model serving need long time to return, please set the timeout | string | false | |
| customRouting | Specify the custom routing of the serving | [customRouting](#customRouting) | false | |
| protocol | Protocol of model serving, default to seldon | string | false | |
| predictors | The predictors of the serving | [[]predictors](#predictors) | true | |
+---------------+-----------------------------------------------------------------------+---------------------------------+----------+---------+
## predictors
+------------+------------------------------------------------------------------------------------+---------------------------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+------------+------------------------------------------------------------------------------------+---------------------------+----------+---------+
| name | Name of the predictor | string | true | |
| replicas | Replica of the predictor | int | false | |
| traffic | If you want to split the traffic to different serving, please set the traffic here | int | false | |
| graph | The graph of the predictor | [graph](#graph) | true | |
| resources | The resources of the serving | [resources](#resources) | false | |
| autoscaler | The autoscaler of the serving | [autoscaler](#autoscaler) | false | |
+------------+------------------------------------------------------------------------------------+---------------------------+----------+---------+
### autoscaler
+-------------+--------------------------------------+-----------------------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-------------+--------------------------------------+-----------------------+----------+---------+
| minReplicas | The min replicas of this auto scaler | int | true | |
| maxReplicas | The max replicas of this auto scaler | int | true | |
| metrics | The metrics of this auto scaler | [[]metrics](#metrics) | true | |
+-------------+--------------------------------------+-----------------------+----------+---------+
#### metrics
+--------------------------+----------------------------------------------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+--------------------------+----------------------------------------------------+--------+----------+---------+
| type | The type of this auto scaler | string | true | |
| targetAverageUtilization | The target average utilization of this auto scaler | int | true | |
+--------------------------+----------------------------------------------------+--------+----------+---------+
### resources
+--------+----------------------------------------------------------------------------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+--------+----------------------------------------------------------------------------------+--------+----------+---------+
| cpu | Number of CPU units for the service, like `0.5` (0.5 CPU core), `1` (1 CPU core) | string | false | |
| memory | Specifies the attributes of the memory resource required for the container. | string | false | |
| gpu | Specifies the attributes of the gpu resource required for the container. | string | false | |
+--------+----------------------------------------------------------------------------------+--------+----------+---------+
### graph
+----------------+-------------------------------------------------------------------------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+----------------+-------------------------------------------------------------------------------+--------+----------+---------+
| name | The name of the graph | string | true | |
| implementation | The implementation of the serving | string | true | |
| modelUri | The model uri, you can use `pvc://pvc-name/path` or `s3://s3-name/path`, etc. | string | true | |
+----------------+-------------------------------------------------------------------------------+--------+----------+---------+
## customRouting
+-------------+-----------------------------------------------------------------------+--------+----------+---------+
| NAME | DESCRIPTION | TYPE | REQUIRED | DEFAULT |
+-------------+-----------------------------------------------------------------------+--------+----------+---------+
| header | Request with specified header will be routed to the specified service | string | true | |
| serviceName | The service name that will be routed to | string | true | |
+-------------+-----------------------------------------------------------------------+--------+----------+---------+
```
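Finally, here is a hedged sketch of serving a trained model with the model-serving component. The `implementation` value, model URIs, and metric type are assumptions (pick the server implementation that matches your model format); the two predictors illustrate the `traffic` split and a per-predictor `autoscaler`:

```yaml
# Sketch only: implementation, URIs, and metric values are assumed placeholders.
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: mnist-serving
spec:
  components:
    - name: serving
      type: model-serving
      properties:
        protocol: seldon               # the default protocol per the table above
        predictors:
          - name: main
            replicas: 1
            traffic: 80                # 80% of requests go to this predictor
            graph:
              name: model
              implementation: tensorflow           # assumed server implementation
              modelUri: pvc://model-output/model   # model stored on the training PVC (assumed)
            autoscaler:
              minReplicas: 1
              maxReplicas: 3
              metrics:
                - type: cpu            # assumed metric type
                  targetAverageUtilization: 80
          - name: canary
            replicas: 1
            traffic: 20                # remaining 20% goes to the canary predictor
            graph:
              name: model
              implementation: tensorflow
              modelUri: pvc://model-output/canary
```

`customRouting` can additionally pin requests that carry a specific header to a named Service, which is useful for exercising a canary predictor directly.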
