Show Train Statement in SQLFlow

Background

Like using other SQL engines, people may want to access the table metadata (say, the CREATE TABLE statement) using a SQL query. Usually they will type a statement like:

  1. SHOW CREATE TABLE `test`;

which will produce:

  1. mysql> SHOW CREATE TABLE `test`;
  2. +-------+------------------------------------------------------------------+
  3. | Table | Create Table
  4. +-------+------------------------------------------------------------------+
  5. | test | CREATE TABLE `test` (
  6. `sepal_length` float DEFAULT NULL,
  7. `sepal_width` float DEFAULT NULL,
  8. `petal_length` float DEFAULT NULL,
  9. `petal_width` float DEFAULT NULL,
  10. `class` int DEFAULT NULL
  11. ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
  12. +-------+-----------------------------------------------------------------+

It worth for SQLFlow to do the same thing. On SQLFlow platform, user may create a lot of models with a minor set of parameters different with each other. After a while, the user may hardly tell what the differences are if he hadn’t record the information else where (like in model name). We would like to provide a formal way to solve this problem. Here we introduce the SHOW TRAIN feature.

Syntax

  1. SHOW TRAIN table_where_model_stored ;

where

  • table_where_model_stored must be a table which stores the trained model in MySQL or Hive. It should be a qualified table name like {db_name}.{table_name} if you are not in a DB selected context.

This query will show the original train sql statement like:

  1. +-------+------------------------------------------------------------------+
  2. | TABLE | TRAIN STATEMENT
  3. +-------+------------------------------------------------------------------+
  4. | test | SELECT * FROM train to TRAIN xgboost.gbtree with learning_rate=0.4,
  5. objective=multi:softmax, num_class=3 LABEL class INTO test;
  6. +-------+------------------------------------------------------------------+

In the future, we may support to show more metadata other than the original training sql statement.

Implementation

  • Extend the SQLFlow parser with our SHOW TRAIN statement. First, we need to add a key word SHOW to our extended syntax. In addition, SHOW TRAIN is not like our train/predict/explain statements which all share a SELECT ... TO ... format in which there is a standard SELECT ... part at the front and an extended TO ... part at the end. With this definition, our extending statement has no standard part. So, we have to modify the parse process slightly. The pseudo code is like below:

    1. func Parse(program string) ([]*SQLFlowStmt, error) {
    2. all := make([]*SQLFlowStmt)
    3. for program != "" {
    4. standard, err := tryExtractStandardPart(program)
    5. if err == nil {
    6. program = eatConsumedPrefix(program)
    7. if program == "" {
    8. // standard sql only
    9. all = append(all, standard)
    10. } else {
    11. // extract and verify the extended 'TO ...' part
    12. extended, err: = tryParseExtendedSyntax(program)
    13. if err != nil {
    14. return nil, err
    15. }
    16. if isValidToExtended(merged)
    17. merged := merge(standard, extended)
    18. all = append(all, merged)
    19. } else {
    20. return nil, err
    21. }
    22. }
    23. } else if extended, e := tryParseExtendedSyntax(program); e == nil {
    24. // Maybe it is a pure extended stmt with no standard part (say, `SHOW TRAIN...`), so try to use the extended parser
    25. // In addition, the program should not contains a 'TO...' only, although our extended parser accepts this form
    26. if isValidPureExtended(extended) {
    27. all = append(all, extended)
    28. } else {
    29. return nil, err
    30. }
    31. } else {
    32. return nil, err
    33. }
    34. program = eatConsumedPrefix(program)
    35. }
    36. return all, nil
    37. }
  • Branch to execute a ShowTrain cmd in executor, this cmd will first read the model on a sqlfs and then extract the original train statement. We can not get the train statement from models saved on OSS because they have not been saved with the models yet. This can be implemented next time.

  • Format and route the cmd’s output to the given output buffer

Discussion

I noticed model zoo has created a table to store model meta in MySQL, shall we just use this mechanism to get the meta or extract from the saved model?

  • No, this table will be deleted in the future