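FTRL Online Prediction

Algorithm overview

This operator consumes the stream of models produced by FTRL training, so that its model is updated in real time, and uses the most recent model to predict on the incoming data stream.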
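
The idea of predicting with a model that is replaced while the job runs can be sketched in plain Python. This is a minimal illustration only, not Alink's implementation: the `OnlinePredictor` class, the `run` helper, and the `("model", ...)` / `("data", ...)` event format are all hypothetical names invented for this sketch, and a binary logistic regression model is assumed.

```python
import math


class OnlinePredictor:
    """Hypothetical toy predictor whose model can be swapped while it runs."""

    def __init__(self, weights, intercept=0.0):
        self.weights = list(weights)
        self.intercept = intercept

    def update_model(self, weights, intercept):
        # A new model snapshot arrived on the model stream: swap it in.
        self.weights = list(weights)
        self.intercept = intercept

    def predict_proba(self, features):
        # Logistic regression: sigmoid of the linear score.
        score = self.intercept + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-score))


def run(events, predictor):
    """Process interleaved model updates and data records in arrival order."""
    outputs = []
    for kind, payload in events:
        if kind == "model":
            weights, intercept = payload
            predictor.update_model(weights, intercept)
        else:
            outputs.append(predictor.predict_proba(payload))
    return outputs


# The same record is scored before and after a model update arrives.
events = [
    ("data", [1.0, 2.0]),
    ("model", ([1.0, 0.5], 0.0)),
    ("data", [1.0, 2.0]),
]
probs = run(events, OnlinePredictor(weights=[0.0, 0.0]))
```

The first record is scored with the initial all-zero weights and the second with the updated weights, which is the behavior the operator provides on real streams: each record is predicted with whichever model is current when it arrives.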

Parameters

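| Name | Display name | Description | Type | Required? | Default |
| --- | --- | --- | --- | --- | --- |
| vectorCol | Vector column name | Name of the column that holds the vector; defaults to null | String | | null |
| reservedCols | Reserved column names | Columns reserved by the algorithm | String[] | | null |
| predictionCol | Prediction column name | Name of the prediction result column | String | | |
| predictionDetailCol | Prediction detail column name | Name of the prediction detail column | String | | |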
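
How the column parameters shape an output row can be sketched as follows. This is an illustration under stated assumptions, not Alink code: `build_output_row` is a hypothetical helper, and the rule that every input column is kept when `reserved_cols` is `None` is an assumption rather than something this page states.

```python
def build_output_row(input_row, prediction, detail,
                     prediction_col, prediction_detail_col=None,
                     reserved_cols=None):
    """Hypothetical helper: assemble one output row from one input row."""
    # Assumption: with no reserved columns given, all input columns are kept.
    kept = list(input_row) if reserved_cols is None else reserved_cols
    out = {name: input_row[name] for name in kept}
    # The predicted label goes into the prediction column.
    out[prediction_col] = prediction
    # The detail column is only emitted when a name was configured for it.
    if prediction_detail_col is not None:
        out[prediction_detail_col] = detail
    return out


row = {"f0": 2, "f1": 1, "label": 1}
out = build_output_row(
    row, prediction=1, detail='{"1":"0.9","2":"0.1"}',  # made-up detail value
    prediction_col="pred",
    prediction_detail_col="details",
    reserved_cols=["label"],
)
```

With `reserved_cols=["label"]`, the row contains `label`, `pred`, and `details`, in that order.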

Script example

Script

    from pyalink.alink import *
    import numpy as np
    import pandas as pd

    # Run on a local environment with parallelism 1.
    useLocalEnv(1)

    # Toy data set: two integer features (f0, f1) and a label (1 or 2).
    data = np.array([
        [2, 1, 1],
        [3, 2, 1],
        [4, 3, 2],
        [2, 4, 1],
        [2, 2, 1],
        [4, 3, 2],
        [1, 2, 1],
        [5, 3, 2]])
    df = pd.DataFrame({"f0": data[:, 0],
                       "f1": data[:, 1],
                       "label": data[:, 2]})

    # The same data serves as the batch training set and as the stream.
    schema = 'f0 int, f1 int, label int'
    batchData = dataframeToOperator(df, schemaStr=schema, op_type='batch')
    streamData = dataframeToOperator(df, schemaStr=schema, op_type='stream')

    # Train an initial logistic regression model in batch mode.
    model = LogisticRegressionTrainBatchOp() \
        .setFeatureCols(["f0", "f1"]) \
        .setLabelCol("label") \
        .setMaxIter(5) \
        .linkFrom(batchData)

    # Continue training online with FTRL; the output is a stream of models.
    models = FtrlTrainStreamOp(model) \
        .setFeatureCols(["f0", "f1"]) \
        .setLabelCol("label") \
        .setTimeInterval(1) \
        .setAlpha(0.1) \
        .setBeta(0.1) \
        .setL1(0.1) \
        .setL2(0.1) \
        .setVectorSize(2) \
        .setWithIntercept(True) \
        .linkFrom(streamData)

    # Predict on the data stream, starting from the initial model and
    # taking model updates from the model stream.
    FtrlPredictStreamOp(model) \
        .setPredictionCol("pred") \
        .setReservedCols(["label"]) \
        .setPredictionDetailCol("details") \
        .linkFrom(models, streamData) \
        .print()

    StreamOperator.execute()

Results

    label  pred  details
    1      1     {"1":"0.9999917437501057","2":"8.2562498943117...
    1      1     {"1":"0.965917838185468","2":"0.03408216181453...
    2      2     {"1":"0.00658782416074899","2":"0.993412175839...
    1      1     {"1":"0.9810760570397847","2":"0.0189239429602...
    1      1     {"1":"0.9998904582473768","2":"1.0954175262323...
    2      2     {"1":"0.00658782416074899","2":"0.993412175839...
    1      1     {"1":"0.9999996598523875","2":"3.4014761252088...
    2      2     {"1":"2.0589409516880153E-5","2":"0.9999794105...
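
Each `details` value above is a JSON object that maps a label to its probability, with the probabilities encoded as strings. A minimal sketch of reading such a value with the standard library follows. The `top_label` helper is a hypothetical name, and the sample string is made up for illustration, because the values printed above are truncated.

```python
import json


def top_label(details_json):
    """Return the most probable label and its probability from a details string."""
    # Probabilities arrive as strings, so convert them to floats first.
    probs = {label: float(p) for label, p in json.loads(details_json).items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


# Made-up sample in the same shape as the details column.
sample = '{"1":"0.75","2":"0.25"}'
label, prob = top_label(sample)
```

Note that the labels come back as JSON keys, so they are strings (`"1"`, `"2"`) even though the `label` column in the input is an integer.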
