Description

Naive Bayes Predictor.

This operator predicts with a multinomial Naive Bayes (multinomial NB) model, a probabilistic learning method. Feature values in the training table must be nonnegative.

Details of the algorithm: https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
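To make the model concrete, here is a minimal pure-Python sketch of multinomial Naive Bayes along the lines of the chapter linked above. It is not Alink's implementation: the names `train_multinomial_nb` and `predict` are hypothetical, and add-one (Laplace) smoothing with `alpha=1.0` is an assumption. The sketch treats feature values as counts, which is why they must be nonnegative.

```python
import math
from collections import defaultdict


def train_multinomial_nb(rows, labels, alpha=1.0):
    """Fit log-priors and per-feature log-likelihoods (hypothetical helper)."""
    if any(v < 0 for row in rows for v in row):
        raise ValueError("feature values must be nonnegative")
    n_features = len(rows[0])
    class_rows = defaultdict(int)                           # rows seen per class
    feature_sums = defaultdict(lambda: [0.0] * n_features)  # per-class feature totals
    for row, label in zip(rows, labels):
        class_rows[label] += 1
        for j, v in enumerate(row):
            feature_sums[label][j] += v
    model = {}
    for label, sums in feature_sums.items():
        # Assumption: add-alpha smoothing over the feature totals.
        total = sum(sums) + alpha * n_features
        log_prior = math.log(class_rows[label] / len(rows))
        log_likelihood = [math.log((s + alpha) / total) for s in sums]
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior score."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(v * ll for v, ll in zip(row, log_likelihood))
    return max(model, key=score)


# Same rows as the script example in this document.
rows = [
    [1.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
]
labels = [1, 1, 1, 0, 0, 0, 0, 1, 0]

model = train_multinomial_nb(rows, labels)
predictions = [predict(model, row) for row in rows]
print(predictions)
```

Scores are summed in log space to avoid underflow when many features are multiplied together.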

Parameters

Name | Description | Type | Required? | Default Value
--- | --- | --- | --- | ---
vectorCol | Name of a vector column | String | | null
predictionCol | Column name of prediction. | String | |
predictionDetailCol | Column name of prediction result, it will include detailed info. | String | |
reservedCols | Names of the columns to be retained in the output table | String[] | | null
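As an illustration of how these parameters shape each output row, here is a small pure-Python sketch. It does not call Alink. `build_output_row` is a hypothetical helper, and two behaviors are assumptions rather than facts stated in the table: that a null `reservedCols` keeps every input column, and that the detail column holds a JSON string of per-class probabilities. The probability values are placeholders, not real model output.

```python
import json


def build_output_row(row, prediction, probabilities, prediction_col,
                     prediction_detail_col=None, reserved_cols=None):
    """Assemble one output row from an input row (hypothetical helper)."""
    # Assumption: reserved_cols=None keeps all input columns.
    kept = list(row.keys()) if reserved_cols is None else reserved_cols
    out = {name: row[name] for name in kept}
    out[prediction_col] = prediction
    if prediction_detail_col is not None:
        # Assumption: the detail column is a JSON map of class -> probability.
        out[prediction_detail_col] = json.dumps(probabilities, sort_keys=True)
    return out


row = {"f0": 1.0, "f1": 1.0, "f2": 0.0, "f3": 1.0, "label": 1}
probabilities = {"0": 0.15, "1": 0.85}  # placeholder values for illustration only

# Only predictionCol set: all input columns plus the prediction.
default_row = build_output_row(row, 1, probabilities, prediction_col="pred")

# All three output parameters set: only "label" is retained.
detailed_row = build_output_row(row, 1, probabilities,
                                prediction_col="pred",
                                prediction_detail_col="pred_detail",
                                reserved_cols=["label"])
print(default_row)
print(detailed_row)
```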

Script Example

Script

  import numpy as np
  import pandas as pd
  from pyalink.alink import *

  # sample data: four feature columns followed by the label
  data = np.array([
      [1.0, 1.0, 0.0, 1.0, 1],
      [1.0, 0.0, 1.0, 1.0, 1],
      [1.0, 0.0, 1.0, 1.0, 1],
      [0.0, 1.0, 1.0, 0.0, 0],
      [0.0, 1.0, 1.0, 0.0, 0],
      [0.0, 1.0, 1.0, 0.0, 0],
      [0.0, 1.0, 1.0, 0.0, 0],
      [1.0, 1.0, 1.0, 1.0, 1],
      [0.0, 1.0, 1.0, 0.0, 0]])
  df = pd.DataFrame({"f0": data[:, 0],
                     "f1": data[:, 1],
                     "f2": data[:, 2],
                     "f3": data[:, 3],
                     "label": data[:, 4]})
  df["label"] = df["label"].astype('int')

  # load data as a batch source (for training) and a stream source (for prediction)
  batchData = dataframeToOperator(df, schemaStr='f0 double, f1 double, f2 double, f3 double, label int', op_type='batch')
  streamData = dataframeToOperator(df, schemaStr='f0 double, f1 double, f2 double, f3 double, label int', op_type='stream')

  # train the model on the batch data
  colnames = ["f0", "f1", "f2", "f3"]
  ns = NaiveBayesTrainBatchOp().setFeatureCols(colnames).setLabelCol("label")
  model = batchData.link(ns)

  # predict on the stream with the trained model
  predictor = NaiveBayesPredictStreamOp(model).setPredictionCol("pred")
  predictor.linkFrom(streamData).print()
  # stream operators only run once execution is triggered
  StreamOperator.execute()

Result

f0 f1 f2 f3 label pred
1.0 1.0 0.0 1.0 1 1
1.0 0.0 1.0 1.0 1 1
1.0 0.0 1.0 1.0 1 1
0.0 1.0 1.0 0.0 0 0
0.0 1.0 1.0 0.0 0 0
0.0 1.0 1.0 0.0 0 0
0.0 1.0 1.0 0.0 0 0
1.0 1.0 1.0 1.0 1 1
0.0 1.0 1.0 0.0 0 0