Experiments
Experiments are run on a Kubernetes cluster. Commands must be run from a machine with access to the cluster that can run kubectl and kn.
Everything must be cleared away between runs to make sure results don't bleed across experiments.
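As a quick sanity check that both tools can reach the cluster (the faasm namespace is used by the deployments below), you can run, for example:
- kubectl cluster-info
- kn -n faasm service list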
Client Machine Set-up
To run throughput/latency experiments you'll need to set up the client machine with (on the machine itself):
- cd ansible
- ansible-playbook load_client.yml
Billing Estimates
To get resource measurements from the hosts running experiments we first need an inventory file at ansible/inventory/billing.yml, something like:
- [all]
- myhost1
- myhost2
- ...
Then we can run the set-up with:
- cd ansible
- ansible-playbook -i inventory/billing.yml billing_setup.yml
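Before running the playbook it can be worth checking that the hosts in the inventory are reachable, for example with Ansible's ping module:
- cd ansible
- ansible -i inventory/billing.yml all -m ping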
Data
Data should be generated and uploaded ahead of time.
SGD
For details of the SGD experiment data see the sgd.md notes.
Matrices
The matrix experiment data needs to be generated in bulk locally, uploaded to S3, then downloaded on the client machine (or copied directly with scp). You must have the native tooling and pyfaasm installed to generate it up front (but this doesn't need to be done if it's already in S3):
- # Generate it
- inv libs.native
- inv matrix-data.generate-all
- # Direct SCP from local machine
- export HOST=<your_host>
- export HOST_USER=<user_on_your_host>
- scp -r ~/faasm/data/matrix $HOST_USER@$HOST:/home/$HOST_USER/faasm/data
- # Upload (note - >4GB)
- inv data.matrix-upload-s3
- # Download
- inv data.matrix-download-s3
Tensorflow
Tensorflow data consists of the model and images. These need to be uploaded to your Faasm instance:
- inv data.tf-upload data.tf-state
SGD Experiment
- # -- Prepare --
- # Upload data (one off)
- inv data.reuters-state
- # -- Build/ upload --
- inv knative.build-native sgd reuters_svm
- inv upload sgd reuters_svm
- # -- Deploy --
- # Vary number of workers on each run
- export N_WORKERS=10
- # Native containers
- inv knative.deploy-native sgd reuters_svm $N_WORKERS
- # Wasm
- inv knative.deploy $N_WORKERS
- # -- Wait --
- watch kn -n faasm service list
- watch kubectl -n faasm get pods
- # -- Run experiment --
- # Native SGD
- inv experiments.sgd --native $N_WORKERS 60000
- # Wasm SGD
- inv experiments.sgd $N_WORKERS 60000
- # -- Clean up --
- # Native SGD
- inv knative.delete-native sgd reuters_svm
- # Wasm
- inv knative.delete-worker --hard
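As noted at the top, it's worth confirming that nothing is left over before the next run, for example by re-checking the faasm namespace:
- kn -n faasm service list
- kubectl -n faasm get pods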
Matrices Experiment
- # -- Build/ Upload --
- inv upload python mat_mul --py
- # Number of workers kept the same throughout
- export N_WORKERS=<number of workers>
- # -- Deploy --
- # Native
- inv knative.deploy-native-python $N_WORKERS
- # Wasm
- inv knative.deploy $N_WORKERS
- # -- Run experiment --
- # Native
- inv experiments.matrix-multi $N_WORKERS --native
- # Wasm
- inv experiments.matrix-multi $N_WORKERS
Tensorflow Experiment
You need to set the following environment variables for these experiments (through the Knative config):
- COLD_START_DELAY_MS=800
- TF_CODEGEN=on
- SGD_CODEGEN=off
- PYTHON_CODEGEN=off
- PYTHON_PRELOAD=off
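If you want to adjust these on an already-deployed service rather than in the config, kn can update environment variables directly. This is only a sketch: it assumes the worker service is named faasm-worker, so adjust the name to match your deployment:
- # faasm-worker is an assumed service name - check with: kn -n faasm service list
- kn -n faasm service update faasm-worker --env COLD_START_DELAY_MS=800 --env TF_CODEGEN=on --env SGD_CODEGEN=off --env PYTHON_CODEGEN=off --env PYTHON_PRELOAD=off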
Preamble:
- # -- Build/ upload --
- inv knative.build-native tf image
- inv upload tf image
- # -- Upload data (one-off)
- inv data.tf-upload data.tf-state
Latency:
- # -- Deploy both (note small number of workers) --
- inv knative.deploy-native tf image 1
- inv knative.deploy 1
- # -- Run experiment --
- inv experiments.tf-lat
Throughput:
- # -- Deploy --
- # Native
- inv knative.deploy-native tf image 30
- # Wasm
- inv knative.deploy 18
- # -- Run experiment --
- # Native
- inv experiments.tf-tpt --native
- # Wasm
- inv experiments.tf-tpt
Results
Once you've done several runs, you need to pull the results to your local machine and process them:
- # SGD
- inv experiments.sgd-pull-results <user> <host>
- # Matrices
- inv experiments.matrix-pull-results <user> <host>
- # Inference latency
- inv experiments.tf-lat-pull-results <user> <host>
- # Inference throughput
- inv experiments.tf-tpt-pull-results <user> <host>