What is MLeap?
MLeap is a common serialization format and execution engine for machine learning pipelines. It supports Spark, Scikit-learn and TensorFlow for training pipelines and exporting them to an MLeap Bundle. Serialized pipelines (bundles) can be deserialized back into Spark for batch-mode scoring, or into the MLeap runtime to power real-time API services.
Why MLeap?
Many companies that use Spark and Scikit-learn have a difficult time deploying their research ML and data pipelines to production API services. Even with TensorFlow, setting up these services can be difficult if a company does not wish to use Python in its API stack or does not use Google ML Cloud. MLeap provides simple interfaces to execute entire ML pipelines, from feature transformers to classifiers, regressions, clustering algorithms, and neural networks.
Portable Models
Your models are your models. Platforms like Microsoft Azure and Google ML can lock you into their service packages. With MLeap Bundles, you can take your models with you wherever you go.
Spark, Scikit-learn and TensorFlow: One Runtime
Mixing and matching ML technologies becomes a simple task. Instead of requiring an entire team of developers to make research pipelines production-ready, simply export to an MLeap Bundle and run your pipeline wherever it is needed.
Other benefits of a unified runtime:
- Train different pieces of your pipeline using Spark, Scikit-learn or TensorFlow, then export them to one MLeap Bundle file and deploy it anywhere
- If you’re using Scikit-learn for R&D, but Spark comes out with a better algorithm, you can export your Scikit-learn ML pipeline to Spark, train the new model in Spark and then deploy to production using the MLeap runtime
Common Serialization
In addition to providing a useful execution engine, MLeap Bundles provide a common serialization format for a large set of ML feature extractors and algorithms that can be exported and imported across Spark, Scikit-learn, TensorFlow and MLeap. This means you can easily convert pipelines between these technologies depending on where you need to execute a pipeline.
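To make the idea of a common serialization format concrete, here is a minimal toy sketch in plain Python. It is not the MLeap Bundle format or the MLeap API: the file name `pipeline.json`, the `op` names, and the functions `save_bundle`, `load_bundle` and `transform` are all hypothetical, chosen only for illustration. It shows the general pattern of writing fitted pipeline parameters to a portable archive and executing them later without the library that trained them.

```python
import json
import zipfile

# Hypothetical toy format for illustration only; NOT the real MLeap Bundle layout.

def save_bundle(path, stages):
    """Write the fitted pipeline stages as JSON inside a zip archive."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("pipeline.json", json.dumps({"stages": stages}))

def load_bundle(path):
    """Read the stages back; no training library is needed to do so."""
    with zipfile.ZipFile(path) as zf:
        return json.loads(zf.read("pipeline.json"))["stages"]

def transform(stages, features):
    """Run each stage in order on one row of numeric features."""
    value = list(features)
    for stage in stages:
        if stage["op"] == "standard_scaler":
            # Scale each feature with the stored mean and standard deviation.
            value = [(x - m) / s
                     for x, m, s in zip(value, stage["mean"], stage["std"])]
        elif stage["op"] == "linear_regression":
            # Dot product of coefficients and features, plus the intercept.
            value = sum(c * x for c, x in zip(stage["coefficients"], value))
            value += stage["intercept"]
        else:
            raise ValueError(f"unknown op: {stage['op']}")
    return value

# A two-stage pipeline described purely by its fitted parameters.
stages = [
    {"op": "standard_scaler", "mean": [1.0, 2.0], "std": [2.0, 4.0]},
    {"op": "linear_regression", "coefficients": [3.0, -1.0], "intercept": 0.5},
]
```

Calling `save_bundle("toy_bundle.zip", stages)` in one process and `transform(load_bundle("toy_bundle.zip"), [3.0, 6.0])` in another gives the same prediction in both places, which is the property a shared format is meant to guarantee. A real bundle also has to cover schemas, many more operators, and versioning, which this sketch leaves out.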
Seamless Integrations
For the most part, we do not modify any internal code or require custom implementations of transformers in Spark or Scikit-learn. For TensorFlow, we use as many built-in ops as we can and implement custom ops for MLeap when they do not exist. This means that the code changes needed to get your existing pipelines up and running with MLeap are minimal. For many use cases, no changes will be required and you can simply export to an MLeap Bundle or deploy to a Combust API server to start getting immediate use of your pipeline.
Open Source
MLeap is entirely open source. Our source code is available at https://github.com/combust/mleap. We also automate our tests and deploys with Travis CI.