Association rule - PrefixSpan(batch) - 《Alink v1.0.1 Document》

Description
Parameters
Script Example
- Code
- Results

Description

PrefixSpan algorithm is used to mine frequent sequential patterns. The PrefixSpan algorithm is described in J. Pei, et al., Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach

Parameters

Name	Description	Type	Required？	Default Value
itemsCol	Column name of transaction items	String	✓
minSupportCount	Minimum support count	Integer		-1
minSupportPercent	Minimum support percent	Double		0.02
minConfidence	Minimum confidence	Double		0.05
maxPatternLength	Maximum frequent pattern length	Integer		10

Script Example

Code

data = np.array([
    ["a;a,b,c;a,c;d;c,f"],
    ["a,d;c;b,c;a,e"],
    ["e,f;a,b;d,f;c;b"],
    ["e;g;a,f;c;b;c"],
])
df_data = pd.DataFrame({
    "sequence": data[:, 0],
})
data = dataframeToOperator(df_data, schemaStr='sequence string', op_type='batch')
prefixSpan = PrefixSpanBatchOp() \
    .setItemsCol("sequence") \
    .setMinSupportCount(3)
prefixSpan.linkFrom(data)
prefixSpan.print()
prefixSpan.getSideOutput(0).print()

输入说明：一个sequence由多个element组成，element之间用分号分隔；一个element由多个item组成，item间用逗号分隔。

Results

Output

    itemset  supportcount  itemcount
0        a             4          1
1      a;c             4          2
2    a;c;c             3          3
3    a;c;b             3          3
4      a;b             4          2
5        b             4          1
6      b;c             3          2
7        c             4          1
8      c;c             3          2
9      c;b             3          2
10       d             3          1
11     d;c             3          2
12       e             3          1
13       f             3          1

Output

     rule  chain_length  support  confidence  transaction_count
0    a=>c             2     1.00        1.00                  4
1  a;c=>c             3     0.75        0.75                  3
2  a;c=>b             3     0.75        0.75                  3
3    a=>b             2     1.00        1.00                  4
4    b=>c             2     0.75        0.75                  3
5    c=>c             2     0.75        0.75                  3
6    c=>b             2     0.75        0.75                  3
7    d=>c             2     0.75        1.00                  3