SQLFlow Playground Server
SQLFlow Playground Server exposes a REST API service that enables users to share the resources in one playground cluster. Users can take advantage of SQLFlow by installing a small plugin on her/his Jupyter Notebook.
This service is used to extend the SQLFlow Playground’s capability, especially when we need to manage the resource in the k8s cluster. We suppose the playground as a pure backend service (without Jupyter/JupyterHub) which provides machine learning capability for some frontend. It clearly is not for those who just want to connect to the SQLFlow server in the playground through our built-in Jupyter Notebook. Currently, this service is used to run our tutorials on Aliyun DSW for Developer which behaves as a frontend of our playground.
The Architecture
SQLFlow Playground Server is a side-car service of our playground cluster. Now, it is designed as an HTTP server which does user authorization, creates DB resource, and so on. This server uses kubectl
to manipulate the resource in the playground(a k8s cluster). It’s in someway the gateway of the playground. As described in the below diagram, the interaction of the three subjects could be: Clients ask the playground server for some resource. The server authorizes the client and create the resource on the playground. The client connects to the SQLFlow server in the playground and does train/predict tasks using the created resource.
----------------run task--------------------------->
| |
Clients <--> Playground Server <--> Playground[SQLFlow Server, MySQL Server...]
Supported API
Request URL path is composed by the prefix /api/
and the API name, like:
https://playground.sqlflow.tech/api/heart_beat
This service always uses HTTPS
and only accepts authorized clients by checking their certification file. So there is no dedicated API for user authentication.
Currently supported API are: | name | method | params | description | | - | - | - | - | | create_db | POST | {“user_id”: “id”} | create a DB for given user, json param | | heart_beat| GET | user_id=id | report a heart beat of given client |
How to Use
For Service Maintainer
The maintainer should provide the playground cluster, and bootup a SQLFlow Playground Server
. The server should have privillege to access the kubectl
command of the cluster. To install the server, maintainer can use below command:
mkdir $HOME/workspace
cd $HOME/workspace
pip install sqlflow_playground
mkdir key_store
gen_cert.sh server
sqlflow_playground --port=50052 \
--ca_crt=key_store/ca/ca.crt \
--server_key=key_store/server/server.key \
--server_crt=key_store/server/server.crt
In the above commands, we first installed the sqlflow playground package which carries the main cluster operation logic. Then, we use the key tool to generate a server certification file (Of course, it’s not necessary if you have your own certification files) which enables us to provide HTTPS
service. Finally, we start the REST API
service at port 50052.
Our playground service uses bi-directional validation. So, the maintainer needs to generate a certification file for a trusted user. Use below command and send the generated .crt
and .key
file together with the ca.crt
to the user.
gen_cert.sh some_client
For The User
To use this service, the user should get authorized from the playground’s maintainer. In detail, user should get ca.crt
, client.key
and the client.crt
file from the maintainer and keep them in some very-safe place. Also, the user should ask the maintainer for the sqlflow server address and the sqlflow playground server address. Then, the user will install Jupyter Notebook and the SQLFlow plugin package and do some configuration. Finally, the user can experience SQLFlow in his Jupyter Notebook.
pip3 install notebook sqlflow==0.15.0
cat >$HOME/.sqlflow_playground.env <<EOF
SQLFLOW_SERVER="{sqlflow server address}"
SQLFLOW_PLAYGROUND_USER_ID_ENV=SQLFLOW_USER_ID
SQLFLOW_USER_ID="{your name}"
SQLFLOW_PLAYGROUND_SERVRE="{sqlflow playground server address}"
SQLFLOW_PLAYGROUND_SERVER_CA="{path to your ca.crt file}"
SQLFLOW_PLAYGROUND_CLIENT_KEY="{path to your client.key file}"
SQLFLOW_PLAYGROUND_CLIENT_CERT="{path to your client.crt file}"
EOF
export SQLFLOW_JUPYTER_ENV_PATH="$HOME/.sqlflow_playground.env"
# start the notebook and try use %%sqlflow magic command
jupyter notebook
Implementation
We use tornado as the web framework which provides a very good request dispatching mechanism. By the way, this framework is also adopted by Jupyter Notebook. The request processing is split into two steps:
Register a request handler
tornado.web.Application([(r"/", MainHandler)])
Implement the handler as a class, the method name
get
imply it acceptsGET
requests.class MainHandler(RequestHandler):
def get(self):
self.write("hello SQLFlow!")
In addition, We add a k8s manipulate class, which can create resource in the cluster. It’s now implemented in a brutal way (use kubectl). We may refine it by using k8s’s API.