A parallel PrefixSpan algorithm to mine frequent sequential patterns. spark.findFrequentSequentialPatterns
returns a complete set of frequent sequential patterns. For more details, see PrefixSpan.
Usage
spark.findFrequentSequentialPatterns(data, ...)
# S4 method for class 'SparkDataFrame'
spark.findFrequentSequentialPatterns(
data,
minSupport = 0.1,
maxPatternLength = 10L,
maxLocalProjDBSize = 32000000L,
sequenceCol = "sequence"
)
Arguments
data: A SparkDataFrame.
...: Additional argument(s) passed to the method.
minSupport: The minimal support level, expressed as a fraction of the number of input sequences (see the sketch after this argument list).
maxPatternLength: The maximal length of a mined sequential pattern.
maxLocalProjDBSize: The maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing.
sequenceCol: The name of the sequence column in the dataset.
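minSupport is relative to the size of the input: a pattern is kept only if it occurs in roughly minSupport times the number of input sequences. The snippet below is a minimal plain-R sketch of that relationship (not part of the SparkR API; the exact rounding is an implementation detail of Spark's PrefixSpan):

# Hypothetical illustration: with 4 input sequences and minSupport = 0.5,
# a pattern must appear in about 2 of them to be reported.
n_sequences <- 4                                 # assumed number of rows in the input
min_support <- 0.5
min_count <- ceiling(min_support * n_sequences)  # approximate count threshold; 2 here
min_count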
Value
A complete set of frequent sequential patterns in the input sequences of itemsets. The returned SparkDataFrame contains columns of sequence and corresponding frequency. Its schema is sequence: ArrayType(ArrayType(T)), freq: integer, where T is the item type.
Note
spark.findFrequentSequentialPatterns(SparkDataFrame) since 3.0.0
Examples
if (FALSE) { # \dontrun{
df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
                           list(list(list(1L), list(3L, 2L), list(1L, 2L))),
                           list(list(list(1L, 2L), list(5L))),
                           list(list(list(6L)))),
                      schema = c("sequence"))
frequency <- spark.findFrequentSequentialPatterns(df, minSupport = 0.5, maxPatternLength = 5L,
                                                  maxLocalProjDBSize = 32000000L)
showDF(frequency)
} # }
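As a follow-up to the example above, the snippet below is a minimal sketch, not part of the documented example; it assumes an active Spark session and the frequency SparkDataFrame produced above, and uses standard SparkR operations (arrange, desc, collect, printSchema) to inspect the mined patterns.

# Order patterns by descending frequency and bring them into a local data.frame.
sorted <- arrange(frequency, desc(frequency$freq))
head(collect(sorted))

# Confirm the schema described in the Value section: sequence and freq columns.
printSchema(frequency)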