Naive Bayes Text Classification Stream Prediction
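Features
- Naive Bayes text classification is a multi-class classification algorithm.
- The component accepts input vectors in both sparse and dense formats.
- The component supports training with sample weights.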
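To make the two vector formats concrete, here is a minimal sketch of a parser for both. The layouts are inferred from the sample data on this page (sparse as `$size$index:value ...`, dense as space-separated values), and `parse_vector` is a hypothetical helper, not part of the Alink API.

```python
def parse_vector(text):
    """Parse a vector string into a dense list of floats.

    Sparse form (assumed): "$<size>$<index>:<value> <index>:<value> ..."
    Dense form (assumed):  "<value> <value> ..."
    """
    text = text.strip()
    if text.startswith("$"):
        # "$31$0:1.0 1:1.0" splits into "", the size "31", and the body.
        _, size, body = text.split("$", 2)
        dense = [0.0] * int(size)
        for pair in body.split():
            index, value = pair.split(":")
            dense[int(index)] = float(value)
        return dense
    return [float(v) for v in text.split()]


sparse = parse_vector("$31$0:1.0 1:1.0 2:0.0 30:1.0")
dense = parse_vector("1.0 1.0 0.0 1.0")
print(len(sparse), sparse[0], sparse[2], sparse[30])
print(dense)
```

The sparse string expands to 31 entries with only the listed indices set, while the dense string keeps just the four values it lists.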
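Parameters
| Name | Display Name | Description | Type | Required? | Default |
| --- | --- | --- | --- | --- | --- |
| vectorCol | Vector column name | Name of the column that holds the vector | String | ✓ | |
| predictionCol | Prediction result column name | Name of the column that receives the prediction result | String | ✓ | |
| predictionDetailCol | Prediction detail column name | Name of the column that receives the prediction details | String | | |
| reservedCols | Reserved column names | Input columns to keep in the output | String[] | | null |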
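The table does not say what the prediction detail column contains. Assuming it holds a per-class probability map serialized as a string (an assumption, not confirmed by this page), the sketch below shows how per-class log scores could be normalized into such a map. `prediction_detail` is a hypothetical helper, not an Alink function.

```python
import json
import math


def prediction_detail(log_scores):
    """Turn per-class log scores into a JSON string of normalized probabilities."""
    peak = max(log_scores.values())
    # Subtract the largest score before exponentiating to avoid underflow.
    exps = {label: math.exp(s - peak) for label, s in log_scores.items()}
    total = sum(exps.values())
    return json.dumps({label: e / total for label, e in exps.items()})


# Hypothetical unnormalized scores for two classes.
detail = prediction_detail({"0": math.log(0.001), "1": math.log(0.003)})
print(detail)
```

The class with the highest probability in the map is the value that would appear in the prediction column.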
Script Example
Script
from pyalink.alink import *
import numpy as np
import pandas as pd

# Each row holds a sparse vector (sv), the same vector in dense form (dv), and a label.
data = np.array([
["$31$0:1.0 1:1.0 2:1.0 30:1.0","1.0 1.0 1.0 1.0", '1'],
["$31$0:1.0 1:1.0 2:0.0 30:1.0","1.0 1.0 0.0 1.0", '1'],
["$31$0:1.0 1:0.0 2:1.0 30:1.0","1.0 0.0 1.0 1.0", '1'],
["$31$0:1.0 1:0.0 2:1.0 30:1.0","1.0 0.0 1.0 1.0", '1'],
["$31$0:0.0 1:1.0 2:1.0 30:0.0","0.0 1.0 1.0 0.0", '0'],
["$31$0:0.0 1:1.0 2:1.0 30:0.0","0.0 1.0 1.0 0.0", '0'],
["$31$0:0.0 1:1.0 2:1.0 30:0.0","0.0 1.0 1.0 0.0", '0']])
df = pd.DataFrame({"sv": data[:, 0], "dv": data[:, 1], "label": data[:, 2]})
# Build a batch source for training and a stream source for prediction from the same rows.
batchData = dataframeToOperator(df, schemaStr='sv string, dv string, label string', op_type='batch')
streamData = dataframeToOperator(df, schemaStr='sv string, dv string, label string', op_type='stream')
# Train the model on the sparse vector column.
ns = NaiveBayesTextTrainBatchOp().setVectorCol("sv").setLabelCol("label")
model = batchData.link(ns)
# Predict on the stream, keeping sv and label next to the prediction.
predictor = NaiveBayesTextPredictStreamOp(model).setVectorCol("sv").setReservedCols(["sv", "label"]).setPredictionCol("pred")
predictor.linkFrom(streamData).print()
StreamOperator.execute()
Result
| sv | label | pred |
| --- | --- | --- |
| "$31$0:1.0 1:1.0 2:1.0 30:1.0" | 1 | 1 |
| "$31$0:1.0 1:1.0 2:0.0 30:1.0" | 1 | 1 |
| "$31$0:1.0 1:0.0 2:1.0 30:1.0" | 1 | 1 |
| "$31$0:1.0 1:0.0 2:1.0 30:1.0" | 1 | 1 |
| "$31$0:0.0 1:1.0 2:1.0 30:0.0" | 0 | 0 |
| "$31$0:0.0 1:1.0 2:1.0 30:0.0" | 0 | 0 |
| "$31$0:0.0 1:1.0 2:1.0 30:0.0" | 0 | 0 |
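The predictions in this table can be reproduced with a small multinomial naive Bayes written in plain Python. This is a sketch under stated assumptions, not the component's actual implementation: it assumes a multinomial model with additive smoothing of 1.0, works on the four dense values from the `dv` column, and takes an optional `weights` list to illustrate weighted training. The names `train` and `predict` are hypothetical.

```python
import math
from collections import defaultdict


def train(rows, labels, weights=None, smoothing=1.0):
    """Fit multinomial naive Bayes; weights scale each sample's contribution."""
    weights = weights or [1.0] * len(rows)
    dim = len(rows[0])
    class_weight = defaultdict(float)
    feature_sum = defaultdict(lambda: [0.0] * dim)
    for row, label, w in zip(rows, labels, weights):
        class_weight[label] += w
        for j, value in enumerate(row):
            feature_sum[label][j] += w * value
    total_weight = sum(class_weight.values())
    model = {}
    for label, sums in feature_sum.items():
        # Additive smoothing keeps unseen features from getting zero probability.
        denom = sum(sums) + smoothing * dim
        log_prior = math.log(class_weight[label] / total_weight)
        log_prob = [math.log((s + smoothing) / denom) for s in sums]
        model[label] = (log_prior, log_prob)
    return model


def predict(model, row):
    """Return the label with the highest log posterior score."""
    def score(label):
        log_prior, log_prob = model[label]
        return log_prior + sum(v * lp for v, lp in zip(row, log_prob))
    return max(model, key=score)


# Dense vectors and labels copied from the example data.
rows = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1], [1, 0, 1, 1],
        [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
labels = ["1", "1", "1", "1", "0", "0", "0"]
model = train(rows, labels)
predictions = [predict(model, row) for row in rows]
print(predictions)
```

On this data the sketch assigns every row its own training label, matching the `pred` column. Passing a `weights` list makes heavier samples count proportionally more in both the class priors and the feature totals.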