Description
Naive Bayes Classifier.
Both the multinomial and the Bernoulli Naive Bayes models are supported; each is a probabilistic learning method. Feature values in the training table must be nonnegative.
Details of the algorithm: https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
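To make the multinomial model concrete, here is a minimal pure-Python sketch of training and prediction with additive smoothing, along the lines of the textbook description linked above. It is an illustration only, not this library's implementation; the function names `train_multinomial_nb` and `predict_multinomial_nb` are hypothetical, and the data is the same toy table used in the script example.

```python
import math
from collections import defaultdict


def train_multinomial_nb(rows, labels, smoothing=1.0):
    """Estimate log priors and smoothed log likelihoods per class (hypothetical helper)."""
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(rows, labels):
        if any(v < 0 for v in row):
            raise ValueError("feature values must be nonnegative")
        class_counts[label] += 1
        sums = feature_sums[label]
        for i, v in enumerate(row):
            sums[i] += v
    model = {}
    for label, count in class_counts.items():
        sums = feature_sums[label]
        # Additive smoothing: add `smoothing` to every per-feature total.
        denom = sum(sums) + smoothing * n_features
        log_prior = math.log(count / len(rows))
        log_likelihood = [math.log((s + smoothing) / denom) for s in sums]
        model[label] = (log_prior, log_likelihood)
    return model


def predict_multinomial_nb(model, row):
    """Return the class with the highest posterior log score."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, log_likelihood) in model.items():
        score = log_prior + sum(v * ll for v, ll in zip(row, log_likelihood))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


rows = [
    [1.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 1.0], [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 1.0, 0.0],
]
labels = [1, 1, 1, 0, 0, 0, 0, 1, 0]

model = train_multinomial_nb(rows, labels, smoothing=1.0)
predictions = [predict_multinomial_nb(model, r) for r in rows]
print(predictions)
```

Scores are accumulated in log space so that products of many small probabilities do not underflow. A smoothing value of 0 would make `math.log` fail for any feature that never occurs in a class, which is why the sketch keeps a positive default.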
Parameters
Name | Description | Type | Required? | Default Value |
---|---|---|---|---|
modelType | Model type: "Multinomial" or "Bernoulli" | String | | "Multinomial" |
featureCols | Names of the feature columns used for training in the input table | String[] | | null |
labelCol | Name of the label column in the input table | String | ✓ | |
weightCol | Name of the column holding sample weights | String | | null |
vectorCol | Name of a vector column | String | | null |
smoothing | Smoothing factor | Double | | 1.0 |
predictionCol | Name of the prediction column | String | ✓ | |
predictionDetailCol | Name of the column holding the detailed prediction result | String | | |
reservedCols | Names of the columns to be retained in the output table | String[] | | null |
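The `modelType` and `smoothing` parameters are easier to read next to a concrete likelihood. The following is a minimal pure-Python sketch of the Bernoulli variant, assuming each feature is treated as present when its value is greater than zero. The helper names are hypothetical and the code is not taken from this library; it only illustrates how a smoothing factor enters the estimate.

```python
import math


def train_bernoulli_nb(rows, labels, smoothing=1.0):
    """Estimate per-class presence probabilities with additive smoothing (hypothetical helper)."""
    n_features = len(rows[0])
    model = {}
    for label in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == label]
        n_c = len(class_rows)
        log_prior = math.log(n_c / len(rows))
        probs = []
        for i in range(n_features):
            present = sum(1 for r in class_rows if r[i] > 0)
            # Two outcomes per feature (present or absent), hence 2 * smoothing.
            probs.append((present + smoothing) / (n_c + 2.0 * smoothing))
        model[label] = (log_prior, probs)
    return model


def predict_bernoulli_nb(model, row):
    """Score each class using both present and absent features."""
    scores = {}
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for v, p in zip(row, probs):
            score += math.log(p) if v > 0 else math.log(1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


bern_rows = [
    [1.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 1.0], [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 1.0, 0.0],
]
bern_labels = [1, 1, 1, 0, 0, 0, 0, 1, 0]

bern_model = train_bernoulli_nb(bern_rows, bern_labels, smoothing=1.0)
bern_predictions = [predict_bernoulli_nb(bern_model, r) for r in bern_rows]
print(bern_predictions)
```

Unlike the multinomial model, the Bernoulli model also penalizes a class for features that are absent from the input, which is the main practical difference between the two model types.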
Script Example
Script
# imports assumed for a PyAlink environment
import numpy as np
import pandas as pd
from pyalink.alink import *

# load data: four feature columns followed by the label
data = np.array([
    [1.0, 1.0, 0.0, 1.0, 1],
    [1.0, 0.0, 1.0, 1.0, 1],
    [1.0, 0.0, 1.0, 1.0, 1],
    [0.0, 1.0, 1.0, 0.0, 0],
    [0.0, 1.0, 1.0, 0.0, 0],
    [0.0, 1.0, 1.0, 0.0, 0],
    [0.0, 1.0, 1.0, 0.0, 0],
    [1.0, 1.0, 1.0, 1.0, 1],
    [0.0, 1.0, 1.0, 0.0, 0]])
df = pd.DataFrame({"f0": data[:, 0],
                   "f1": data[:, 1],
                   "f2": data[:, 2],
                   "f3": data[:, 3],
                   "label": data[:, 4]})
# np.array stores every value as a float, so cast the label back to int
df["label"] = df["label"].astype('int')
batchData = dataframeToOperator(df, schemaStr='f0 double, f1 double, f2 double, f3 double, label int', op_type='batch')

# train the model, then predict on the training data
colnames = ["f0", "f1", "f2", "f3"]
ns = NaiveBayes().setFeatureCols(colnames).setLabelCol("label").setPredictionCol("pred")
model = ns.fit(batchData)
model.transform(batchData).print()
Result
f0 | f1 | f2 | f3 | label | pred |
---|---|---|---|---|---|
1.0 | 1.0 | 0.0 | 1.0 | 1 | 1 |
1.0 | 0.0 | 1.0 | 1.0 | 1 | 1 |
1.0 | 0.0 | 1.0 | 1.0 | 1 | 1 |
0.0 | 1.0 | 1.0 | 0.0 | 0 | 0 |
0.0 | 1.0 | 1.0 | 0.0 | 0 | 0 |
0.0 | 1.0 | 1.0 | 0.0 | 0 | 0 |
0.0 | 1.0 | 1.0 | 0.0 | 0 | 0 |
1.0 | 1.0 | 1.0 | 1.0 | 1 | 1 |
0.0 | 1.0 | 1.0 | 0.0 | 0 | 0 |