Metadata

Tracking and managing metadata of machine learning workflows in Kubeflow

The goal of the Metadata project is to help Kubeflow users understand and manage their machine learning (ML) workflows by tracking and managing the metadata that the workflows produce.

In this context, metadata means information about executions (runs), models, datasets, and other artifacts. Artifacts are the files and objects that form the inputs and outputs of the components in your ML workflow.

Alpha version

This is an alpha release of the Metadata API. The next version of Kubeflow may introduce breaking changes. The development team is interested in any feedback you have while using the Metadata component, and in particular your feedback on any gaps in the functionality that the component offers.

Installing the Metadata component

Kubeflow v0.6.1 and later versions install the Metadata component by default. You can skip this section if you are running Kubeflow v0.6.1 or later.

If you want to install the latest version of the Metadata component or to install the component as an application in your Kubernetes cluster, follow these steps:

  1. Download the Kubeflow manifests repository:

     git clone https://github.com/kubeflow/manifests

  2. Run the following commands to deploy the services of the Metadata component:

     cd manifests/metadata/base
     kustomize build . | kubectl apply -n kubeflow -f -

Using the Metadata SDK to record metadata

The Metadata project publishes a Python library (SDK) that you can use to log (record) your metadata.

Run the following command to install the Metadata SDK:

  pip install kfmd

Try the Metadata SDK in a sample Jupyter notebook

You can find an example of how to use the Metadata SDK in this demo notebook.

To run the notebook in your Kubeflow cluster:

  • Follow the guide to setting up your Jupyter notebooks in Kubeflow.
  • Go to the demo notebook on GitHub.
  • Download the notebook code by opening the Raw view of the file, then right-clicking on the content and saving the file locally as demo.ipynb.
  • Go back to your Jupyter notebook server in the Kubeflow UI. (If you’ve moved away from the notebooks section in Kubeflow, click Notebook Servers in the left-hand navigation panel to get back there.)
  • In the Jupyter notebook UI, click Upload and follow the prompts to upload the demo.ipynb notebook.
  • Click the notebook name (demo.ipynb) to open the notebook in your Kubeflow cluster.
  • Run the steps in the notebook to install and use the Metadata SDK.

When you have finished running through the steps in the demo.ipynb notebook, you can view the resulting metadata on the Kubeflow UI:

  • Click Artifact Store in the left-hand navigation panel on the Kubeflow UI.

  • On the Artifacts screen you should see the following items:

    • A model metadata item with the name MNIST.
    • A metrics metadata item with the name MNIST-evaluation.
    • A dataset metadata item with the name mytable-dump.

You can click the name of each item to view the details. See the section below about the Metadata UI for more details.

Learn more about the Metadata SDK

The Metadata SDK includes the following predefined types that you can use to describe your ML workflows:

  • data_set.json to capture metadata for a dataset that forms the input into or the output of a component in your workflow.
  • execution.json to capture metadata for an execution (run) of your ML workflow.
  • metrics.json to capture metadata for the metrics used to evaluate an ML model.
  • model.json to capture metadata for an ML model that your workflow produces.
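To make the shape of these types concrete, the sketch below builds plain Python dictionaries that mirror the kind of information each predefined type captures, and how an execution ties the artifacts together. The field names here are illustrative stand-ins chosen for this example, not the SDK's exact schema; consult the JSON type definitions listed above for the authoritative fields.

```python
import json

# Illustrative records mirroring the four predefined metadata types.
# Field names are examples only; see data_set.json, execution.json,
# metrics.json, and model.json for the authoritative schemas.
data_set = {
    "name": "mytable-dump",
    "uri": "file://path/to/dataset",  # where the input data lives
    "version": "v1.0.0",
}

model = {
    "name": "MNIST",
    "uri": "gcs://my-bucket/mnist",   # where the trained model is stored
    "version": "v0.0.1",
}

metrics = {
    "name": "MNIST-evaluation",
    "values": {"accuracy": 0.95},     # evaluation results for the model
}

# An execution (run) links the artifacts it consumed and produced.
execution = {
    "name": "train-run-1",
    "inputs": [data_set["name"]],
    "outputs": [model["name"], metrics["name"]],
}

print(json.dumps(execution, indent=2))
```

The point of the structure is the lineage it records: given the metrics item MNIST-evaluation, you can walk back through the execution to the model and the dataset that produced it.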

Tracking artifacts on the Metadata UI

You can view a list of logged artifacts and the details of each individual artifact in the Artifact Store on the Kubeflow UI.

  • Go to Kubeflow in your browser. (If you haven’t yet opened the Kubeflow UI, find out how to access the Kubeflow UIs.)
  • Click Artifact Store in the left-hand navigation panel:

Metadata UI

  • The Artifacts screen opens and displays a list of items for all the metadata events that your workflows have logged. You can click the name of each item to view the details.

The following examples show the items that appear when you run the demo.ipynb notebook described above:

A list of metadata items

  • Example of model metadata with the name “MNIST”:

Model metadata for an example MNIST model

  • Example of metrics metadata with the name “MNIST-evaluation”:

Metrics metadata for an evaluation of an MNIST model

  • Example of dataset metadata with the name “mytable-dump”:

Dataset metadata

Backend and REST API

The Kubeflow metadata backend uses ML Metadata (MLMD) to manage the metadata and relationships.

The backend exposes a REST API.

You can add your own metadata types so that you can log metadata for custom artifacts. To add a custom type, send a REST API request to the artifact_types endpoint.

For example, the following request registers an artifact type with the name myorg/mytype/v1 and three properties:

  • f1 (string)
  • f2 (integer)
  • f3 (double)
  curl -X POST http://localhost:8080/api/v1alpha1/artifact_types \
    --header "Content-Type: application/json" -d \
    '{"name":"myorg/mytype/v1","properties":{"f1":"STRING", "f2":"INT", "f3": "DOUBLE"}}'
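If you prefer to build the registration payload programmatically rather than inlining JSON in a shell command, the sketch below constructs the same payload as the curl example using only the Python standard library. The actual POST is shown in a comment rather than executed, since it needs a running metadata service at the endpoint above.

```python
import json

# Build the same artifact-type registration payload as the curl example.
# Property values name the metadata value types: STRING, INT, or DOUBLE.
artifact_type = {
    "name": "myorg/mytype/v1",
    "properties": {
        "f1": "STRING",
        "f2": "INT",
        "f3": "DOUBLE",
    },
}

payload = json.dumps(artifact_type)
print(payload)

# To send it (requires a reachable metadata service):
#
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8080/api/v1alpha1/artifact_types",
#       data=payload.encode("utf-8"),
#       headers={"Content-Type": "application/json"},
#       method="POST")
#   urllib.request.urlopen(req)
```

Building the payload as a dictionary first makes it easy to validate or extend the property list before registering the type.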

Next steps

Run the xgboost-synthetic notebook to build, train, and deploy an XGBoost model using Kubeflow Fairing and Kubeflow Pipelines with synthetic data. Examine the metadata output after running through the steps in the notebook.