MLeap Serialization

Serializing and deserializing in MLeap is straightforward. You can serialize an MLeap Bundle either to a directory on the file system or to a zip archive for later distribution.

Create a Simple MLeap Pipeline

    import ml.combust.bundle.BundleFile
    import ml.combust.bundle.serializer.SerializationFormat
    import ml.combust.mleap.core.feature.{OneHotEncoderModel, StringIndexerModel, VectorAssemblerModel}
    import ml.combust.mleap.core.regression.LinearRegressionModel
    // NodeShape, ScalarShape, TensorShape and PipelineModel are used below but were
    // missing from the import list; their package locations here are assumed from
    // the MLeap source layout and may differ between MLeap versions.
    import ml.combust.mleap.core.types.{NodeShape, ScalarShape, TensorShape}
    import ml.combust.mleap.runtime.transformer.{Pipeline, PipelineModel}
    import ml.combust.mleap.runtime.transformer.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
    import ml.combust.mleap.runtime.transformer.regression.LinearRegression
    import org.apache.spark.ml.linalg.Vectors
    import ml.combust.mleap.runtime.MleapSupport._
    import resource._

    // Create a sample pipeline that we will serialize
    // and then deserialize using various formats

    // Map each string label to a numeric index
    val stringIndexer = StringIndexer(
      shape = NodeShape.scalar(inputCol = "a_string", outputCol = "a_string_index"),
      model = StringIndexerModel(Seq("Hello, MLeap!", "Another row")))

    // Turn the index into a one-hot vector of size 2
    val oneHotEncoder = OneHotEncoder(
      shape = NodeShape.vector(1, 2, inputCol = "a_string_index", outputCol = "a_string_oh"),
      model = OneHotEncoderModel(2, dropLast = false))

    // Combine the one-hot vector and a scalar double into one feature vector
    val featureAssembler = VectorAssembler(
      shape = NodeShape().withInput("input0", "a_string_oh").
        withInput("input1", "a_double").withStandardOutput("features"),
      model = VectorAssemblerModel(Seq(TensorShape(2), ScalarShape())))

    // Linear regression over the 3 assembled features
    val linearRegression = LinearRegression(
      shape = NodeShape.regression(3),
      model = LinearRegressionModel(Vectors.dense(2.0, 3.0, 6.0), 23.5))

    // Chain all of the stages into a single pipeline
    val pipeline = Pipeline(
      shape = NodeShape(),
      model = PipelineModel(Seq(stringIndexer, oneHotEncoder, featureAssembler, linearRegression)))

Serialize to a Zip File

To serialize to a zip file, make sure the URI begins with jar:file and ends with .zip.

For example: jar:file:/tmp/mleap-bundle.zip
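The jar: scheme is the one registered by the JDK's built-in zip file system provider, which is likely where this URI convention comes from. The sketch below uses only the JDK and the Scala standard library, not MLeap, to show how such a URI opens (and creates) a zip archive. The object name ZipUriSketch and the entry name hello.txt are hypothetical and chosen only for illustration.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets
import java.nio.file.{FileSystems, Files}
import scala.collection.JavaConverters._

object ZipUriSketch {
  // Opens (creating if needed) the zip archive behind a jar:file:...zip URI,
  // writes one small entry, and returns the names of the root-level entries.
  def writeAndList(zipUri: URI): List[String] = {
    // "create" -> "true" asks the provider to create the archive if it is missing.
    val env = Map("create" -> "true").asJava
    val fs = FileSystems.newFileSystem(zipUri, env)
    try {
      // Hypothetical entry, used only to have something to list.
      Files.write(fs.getPath("/hello.txt"), "hello".getBytes(StandardCharsets.UTF_8))
      val stream = Files.list(fs.getPath("/"))
      try stream.iterator().asScala.map(_.getFileName.toString).toList
      finally stream.close()
    } finally {
      // Closing the file system writes the archive to disk.
      fs.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val zipFile = Files.createTempFile("zip-uri-sketch", ".zip")
    // Remove the empty placeholder so the provider creates a valid archive itself.
    Files.delete(zipFile)
    val uri = URI.create("jar:" + zipFile.toUri)
    println(writeAndList(uri))
  }
}
```

Because the archive is only written when the file system is closed, the resource has to be released properly; this is presumably why the MLeap examples wrap BundleFile in managed(...).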

JSON Format

    for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) {
      pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)
    }

Protobuf Format

    for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-protobuf.zip"))) {
      pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)
    }

Serialize to a Directory

To serialize to a directory, make sure the URI begins with file.

For example: file:/tmp/mleap-bundle-dir
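The two URI conventions (jar:file plus a .zip suffix for archives, file for directories) can be checked before a bundle is written. The helper below is a minimal sketch using only the Scala standard library; BundleUriSketch, BundleTarget, and classify are hypothetical names and are not part of MLeap.

```scala
object BundleUriSketch {
  // Hypothetical type describing where a bundle URI points.
  sealed trait BundleTarget
  case object ZipArchive extends BundleTarget
  case object Directory extends BundleTarget

  // Classifies a URI string according to the conventions described in the text:
  //   jar:file:... ending in .zip  -> zip archive
  //   file:...                     -> directory
  // Anything else is reported as unsupported.
  def classify(uri: String): Either[String, BundleTarget] = {
    if (uri.startsWith("jar:file:") && uri.endsWith(".zip")) Right(ZipArchive)
    else if (uri.startsWith("file:")) Right(Directory)
    else Left(s"Unsupported bundle URI: $uri")
  }

  def main(args: Array[String]): Unit = {
    println(classify("jar:file:/tmp/mleap-bundle.zip"))
    println(classify("file:/tmp/mleap-bundle-dir"))
    println(classify("/tmp/mleap-bundle-dir"))
  }
}
```

Returning an Either keeps a malformed URI from reaching the serializer and makes the reason visible to the caller.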

JSON Format

    for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) {
      pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)
    }

Protobuf Format

    for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-protobuf-dir"))) {
      pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)
    }

Deserialization

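Deserializing is just as simple as serializing. You do not need to know the serialization format of the MLeap Bundle in advance; the only thing you need is the path to the bundle.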

Deserialize a Zip Bundle

    // Deserialize a zip bundle
    // Use Scala ARM to make sure resources are managed properly
    val zipBundle = (for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) yield {
      bundle.loadMleapBundle().get
    }).opt.get

Deserialize a Directory Bundle

    // Deserialize a directory bundle
    // Use Scala ARM to make sure resources are managed properly
    val dirBundle = (for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) yield {
      bundle.loadMleapBundle().get
    }).opt.get