Submitter

A submitter is a pluggable module in SQLFlow that submits an ML job to a third-party computation service.

Workflow

When a user types in an extended SQL statement, SQLFlow first parses and semantically verifies the statement. It then either runs the ML job locally or submits it to a third-party computation service.

[Figure 1: Submitter]

In the latter case, SQLFlow produces a job description and hands it over to the submitter. For a training statement, SQLFlow produces a TrainDescription; for a prediction statement, it produces a PredictDescription. The descriptions are defined as follows:

    // ColumnType describes one column of the input table.
    type ColumnType struct {
        Name             string // e.g. sepal_length
        DatabaseTypeName string // e.g. FLOAT
    }

    // SELECT *
    // FROM iris.train
    // TRAIN DNNClassifier
    // WITH
    //   n_classes = 3,
    //   hidden_units = [10, 20]
    // COLUMN sepal_length, sepal_width, petal_length, petal_width
    // LABEL class
    // INTO sqlflow_models.my_dnn_model;
    type TrainDescription struct {
        StandardSelect string            // e.g. SELECT * FROM iris.train
        Estimator      string            // e.g. DNNClassifier
        Attrs          map[string]string // e.g. "n_classes": "3", "hidden_units": "[10, 20]"
        X              []ColumnType      // e.g. "sepal_length": "FLOAT", ...
        Y              ColumnType        // e.g. "class": "INT"
        ModelName      string            // e.g. my_dnn_model
    }

    // SELECT *
    // FROM iris.test
    // PREDICT iris.predict.class
    // USING sqlflow_models.my_dnn_model;
    type PredictDescription struct {
        StandardSelect string // e.g. SELECT * FROM iris.test
        TableName      string // e.g. iris.predict
        ModelName      string // e.g. my_dnn_model
    }
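
As an illustration of how the training statement in the comments maps onto a description, the following sketch fills in a TrainDescription by hand. The helper exampleTrainDescription is hypothetical and not part of SQLFlow, and the FLOAT type for the columns other than sepal_length is an assumption, since the source only spells out the first column.

```go
package main

import "fmt"

// ColumnType and TrainDescription repeat the definitions from the text
// so that this sketch compiles on its own.
type ColumnType struct {
	Name             string
	DatabaseTypeName string
}

type TrainDescription struct {
	StandardSelect string
	Estimator      string
	Attrs          map[string]string
	X              []ColumnType
	Y              ColumnType
	ModelName      string
}

// exampleTrainDescription is a hypothetical helper that builds, by hand,
// the description SQLFlow would derive from the example TRAIN statement.
func exampleTrainDescription() TrainDescription {
	return TrainDescription{
		StandardSelect: "SELECT * FROM iris.train",
		Estimator:      "DNNClassifier",
		Attrs: map[string]string{
			"n_classes":    "3",
			"hidden_units": "[10, 20]",
		},
		// Assumption: all four feature columns are FLOAT.
		X: []ColumnType{
			{Name: "sepal_length", DatabaseTypeName: "FLOAT"},
			{Name: "sepal_width", DatabaseTypeName: "FLOAT"},
			{Name: "petal_length", DatabaseTypeName: "FLOAT"},
			{Name: "petal_width", DatabaseTypeName: "FLOAT"},
		},
		Y:         ColumnType{Name: "class", DatabaseTypeName: "INT"},
		ModelName: "my_dnn_model",
	}
}

func main() {
	desc := exampleTrainDescription()
	fmt.Printf("%+v\n", desc)
}
```

Note that attribute values stay as strings in Attrs, so a submitter would be responsible for interpreting values such as "[10, 20]".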

Submitter Interface

The submitter interface should provide two functions, Train and Predict. It can be defined as follows:

    type Submitter interface {
        // Train executes an ML training job and streams the job's response through writer.
        // A typical Train function should include
        // - Loading the training data
        // - Initializing the model
        // - model.train
        // - Saving the trained model to persistent storage
        Train(desc TrainDescription, writer PipeWriter) error

        // Predict executes an ML prediction job and streams the job's response through writer.
        // A typical Predict function should include
        // - Loading the model from persistent storage
        // - Loading the prediction data
        // - model.predict
        // - Writing the prediction result to a table
        Predict(desc PredictDescription, writer PipeWriter) error
    }
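
To make the contract concrete, here is a minimal sketch of a submitter that satisfies the interface above without talking to any computation service: it only streams the steps it would perform. Everything beyond the Submitter interface is an assumption for illustration. In particular, PipeWriter is not defined in this document, so the sketch uses a hypothetical one-method stand-in, and the description types are trimmed to the fields the sketch needs. The names dryRunSubmitter, sliceWriter, and writeAll are likewise hypothetical.

```go
package main

import "fmt"

// Trimmed copies of the description types, enough for this sketch.
type TrainDescription struct {
	StandardSelect string
	Estimator      string
	ModelName      string
}

type PredictDescription struct {
	StandardSelect string
	TableName      string
	ModelName      string
}

// PipeWriter is a hypothetical stand-in; the real type is not shown here.
type PipeWriter interface {
	Write(item interface{}) error
}

type Submitter interface {
	Train(desc TrainDescription, writer PipeWriter) error
	Predict(desc PredictDescription, writer PipeWriter) error
}

// dryRunSubmitter reports the steps of a job instead of executing them.
type dryRunSubmitter struct{}

func (dryRunSubmitter) Train(desc TrainDescription, w PipeWriter) error {
	return writeAll(w, []string{
		"load training data: " + desc.StandardSelect,
		"initialize model: " + desc.Estimator,
		"train model",
		"save model as: " + desc.ModelName,
	})
}

func (dryRunSubmitter) Predict(desc PredictDescription, w PipeWriter) error {
	return writeAll(w, []string{
		"load model: " + desc.ModelName,
		"load prediction data: " + desc.StandardSelect,
		"predict",
		"write result to table: " + desc.TableName,
	})
}

// writeAll streams each step through the writer, stopping at the first error.
func writeAll(w PipeWriter, steps []string) error {
	for _, s := range steps {
		if err := w.Write(s); err != nil {
			return err
		}
	}
	return nil
}

// sliceWriter collects written items in memory.
type sliceWriter struct{ items []interface{} }

func (s *sliceWriter) Write(item interface{}) error {
	s.items = append(s.items, item)
	return nil
}

func main() {
	var s Submitter = dryRunSubmitter{}
	w := &sliceWriter{}
	desc := TrainDescription{
		StandardSelect: "SELECT * FROM iris.train",
		Estimator:      "DNNClassifier",
		ModelName:      "my_dnn_model",
	}
	if err := s.Train(desc, w); err != nil {
		fmt.Println("train failed:", err)
		return
	}
	for _, item := range w.items {
		fmt.Println(item)
	}
}
```

A real submitter would replace each reported step with a call to the target computation service, while still returning errors to the caller and streaming progress through the writer.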

Register a submitter

A new submitter can be registered as follows:

    import (
        ".../my_submitter"
        ".../sqlflow/sql"
    )

    func main() {
        // ...
        sql.Register(my_submitter.NewSubmitter())
        // ...
        for {
            // recv stands for whatever receives the next SQL statement.
            // The variable is not named sql, to avoid shadowing the package.
            stmt := recv()
            sql.Run(stmt)
        }
    }

Here, sql.Register puts the my_submitter instance into a package-level registry. During sql.Run, SQLFlow checks whether a submitter is registered. If one is, sql.Run calls either submitter.Train or submitter.Predict, depending on the statement.
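
The registry behavior described above can be sketched as follows. This is not SQLFlow's implementation: the dispatch function, its return values, the mutex, and the recordingSubmitter type are all assumptions made for illustration, and the description types are reduced to a single field. The sketch dispatches on an already-built job description, whereas sql.Run would first parse the statement to obtain one.

```go
package main

import (
	"fmt"
	"sync"
)

// Reduced description types; only what the sketch needs.
type TrainDescription struct{ ModelName string }
type PredictDescription struct{ ModelName string }

// PipeWriter is a hypothetical stand-in for the real writer type.
type PipeWriter interface {
	Write(item interface{}) error
}

type Submitter interface {
	Train(desc TrainDescription, writer PipeWriter) error
	Predict(desc PredictDescription, writer PipeWriter) error
}

// Package-level registry holding at most one submitter (assumption).
var (
	mu         sync.RWMutex
	registered Submitter
)

// Register stores s in the package-level registry.
func Register(s Submitter) {
	mu.Lock()
	defer mu.Unlock()
	registered = s
}

// dispatch hands desc to the registered submitter, if any. It reports
// false when nothing was submitted, so a caller could run the job locally.
func dispatch(desc interface{}, w PipeWriter) (bool, error) {
	mu.RLock()
	s := registered
	mu.RUnlock()
	if s == nil {
		return false, nil
	}
	switch d := desc.(type) {
	case TrainDescription:
		return true, s.Train(d, w)
	case PredictDescription:
		return true, s.Predict(d, w)
	default:
		return false, fmt.Errorf("unsupported job description %T", desc)
	}
}

// recordingSubmitter records which method was called, for demonstration.
type recordingSubmitter struct{ calls []string }

func (r *recordingSubmitter) Train(desc TrainDescription, _ PipeWriter) error {
	r.calls = append(r.calls, "train "+desc.ModelName)
	return nil
}

func (r *recordingSubmitter) Predict(desc PredictDescription, _ PipeWriter) error {
	r.calls = append(r.calls, "predict "+desc.ModelName)
	return nil
}

func main() {
	job := TrainDescription{ModelName: "my_dnn_model"}

	submitted, _ := dispatch(job, nil)
	fmt.Println("submitted before Register:", submitted)

	rec := &recordingSubmitter{}
	Register(rec)
	submitted, err := dispatch(job, nil)
	fmt.Println("submitted after Register:", submitted, "error:", err)
	fmt.Println("calls:", rec.calls)
}
```

Returning a flag rather than an error when no submitter is registered keeps the local-execution path a normal outcome instead of a failure.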