nikolett0203/RulesTools: RulesTools is an R package designed to streamline association rule mining workflows with functions for data preprocessing, analysis, and visualization.

RulesTools: Tools for Preparing, Analyzing, and Visualizing Association Rules

RulesTools is an R package designed to streamline association rule mining workflows. It provides functions for preparing datasets, analyzing generated rules, and visualizing results using heatmaps and Euler diagrams.

dtize_col Function: Discretize a Numeric Column

The dtize_col function discretizes a numeric vector into categories based on specified cutoff points. Cutoffs can be given as numeric values or computed from the data with a predefined rule ("mean" or "median"). The function can also impute missing values and extend the outer bounds to -Inf and Inf. This is useful for transforming continuous data into categorical intervals for association rule mining.

dtize_col returns a vector the same length as column, in which each value is replaced by the category of the interval it falls into.

  1. Validation: Ensures inputs are valid, including logical parameters, cutoff points, and labels.
  2. Cutoff Handling: Uses specified cutoffs or calculates cutoffs based on the mean or median.
  3. Interval Assignment: Categorizes values based on the cutoffs and labels.
  4. Missing Value Imputation: Optionally fills NA values with the mean or median before discretization.

library(RulesTools)
data(BrookTrout)

# Example with a user-specified numeric cutoff
discrete_conc <- dtize_col(
  BrookTrout$eDNAConc,
  cutoff = 13.3,
  labels = c("low", "high"),
  infinity = TRUE
)

# Example with median as cutoff
discrete_pH <- dtize_col(BrookTrout$pH, cutoff = "median")

# Example with missing value imputation
filled_col <- dtize_col(
  c(1, 2, NA, 4, 5),
  cutoff = "mean",
  include_right = FALSE,
  na_fill = "mean"
)
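The steps above can be illustrated with base R's cut(). The sketch below is not the RulesTools implementation. sketch_dtize is a hypothetical helper whose argument names only mirror the dtize_col parameters used in the examples. The sketch assumes intervals are built with cut(), supports only "mean" and "median" as named cutoffs, and skips the validation step.

```r
# sketch_dtize() is a hypothetical helper, NOT part of RulesTools.
# It mirrors the steps described for dtize_col using base R only.
sketch_dtize <- function(x, cutoff = "median", labels = c("low", "high"),
                         include_right = TRUE, infinity = TRUE,
                         na_fill = "none") {
  # Imputation (applied first): optionally replace NA values with the mean or median
  if (na_fill %in% c("mean", "median")) {
    x[is.na(x)] <- if (na_fill == "mean") mean(x, na.rm = TRUE) else median(x, na.rm = TRUE)
  }
  # Cutoff handling: turn a named rule ("mean" or "median") into a numeric cutoff
  if (is.character(cutoff)) {
    cutoff <- if (cutoff == "mean") mean(x, na.rm = TRUE) else median(x, na.rm = TRUE)
  }
  # Outer bounds: -Inf/Inf when infinity = TRUE, otherwise the data range
  lower <- if (infinity) -Inf else min(x, na.rm = TRUE)
  upper <- if (infinity) Inf else max(x, na.rm = TRUE)
  # Interval assignment: right = TRUE gives (a, b], right = FALSE gives [a, b)
  cut(x, breaks = c(lower, cutoff, upper), labels = labels,
      right = include_right, include.lowest = TRUE)
}

# Mirrors the imputation example above: the NA becomes the mean (3), and with
# left-closed intervals the value 3 falls into the upper interval
sketch_dtize(c(1, 2, NA, 4, 5), cutoff = "mean", include_right = FALSE, na_fill = "mean")
# returns a factor: low low high high high (levels: low, high)
```

The include_right setting matters only for values that land exactly on a cutoff. Here the imputed value 3 moves from "low" to "high" when the intervals are left-closed.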
dtize_df Function: Discretize Dataframe Columns

The dtize_df function discretizes the numeric columns of a dataframe based on a specified cutoff. It also handles missing values using several imputation methods, making it useful for preparing data for association rule mining.

dtize_df returns a dataframe in which the numeric columns are discretized and missing values are handled according to the specified imputation method.

  1. Validation: Checks that the input is a valid dataframe.
  2. Missing Value Imputation: Handles missing values using the specified na_fill method, including predictive mean matching (pmm) via the mice package.
  3. Column Discretization: Discretizes each numeric column based on the specified cutoff and labels.
  4. Non-Numeric Handling: Non-numeric columns are converted to factors.

library(RulesTools)
data(BrookTrout)

# Example with median as cutoff
med_df <- dtize_df(
  BrookTrout, 
  cutoff = "median", 
  labels = c("below median", "above median")
)

# Example with mean as cutoff and left-closed intervals
mean_df <- dtize_df(
  BrookTrout, 
  cutoff = "mean", 
  include_right = FALSE
)

# Example with missing value imputation using predictive mean matching (pmm)
air <- dtize_df(
  airquality, 
  cutoff = "mean", 
  na_fill = "pmm", 
  m = 10, 
  maxit = 10, 
  seed = 42
)
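The column-wise processing described above can be sketched in base R as follows. This is an illustration, not the package's code. sketch_dtize_df is a hypothetical helper, and it uses simple mean imputation and a mean cutoff in place of the package's full set of options (such as pmm via mice).

```r
# sketch_dtize_df() is a hypothetical helper, NOT part of RulesTools.
# Mean imputation stands in for the package's imputation options (e.g. pmm).
sketch_dtize_df <- function(df, labels = c("low", "high")) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) {
      col[is.na(col)] <- mean(col, na.rm = TRUE)   # impute missing values
      cut(col, breaks = c(-Inf, mean(col), Inf),   # split at the column mean
          labels = labels, right = TRUE)
    } else {
      factor(col)                                  # non-numeric columns -> factors
    }
  })
  df
}

toy <- data.frame(temp = c(10, NA, 20, 30), site = c("A", "B", "A", "B"))
sketch_dtize_df(toy)
# temp becomes low, low, low, high (the imputed NA equals the mean, 20)
```

Assigning to df[] rather than df keeps the data frame's class and column names while each column is replaced.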
compare_rules Function: Compare and Find Intersections of Association Rule Sets

The compare_rules function helps you compare multiple sets of association rules, identify their intersections, and optionally save the results to a CSV file. This function is particularly useful for exploring how rule sets generated under different parameters overlap or differ.

compare_rules returns a list containing the intersections of the provided rule sets.

  1. Input Rule Sets: Pass multiple named rule sets to the function.
  2. Validation: Ensures that inputs are valid rule sets and that parameters are correctly specified.
  3. Intersection Calculation: Finds intersections between all combinations of the rule sets.
  4. Output: Displays the results in the console and/or saves them to a CSV file.

library(RulesTools)
library(arules)
data(BrookTrout)

# Discretize the BrookTrout dataset
discrete_bt <- dtize_df(BrookTrout, cutoff = "mean")

# Generate the first set of rules with a confidence threshold of 0.5
rules1 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules")
)

# Generate the second set of rules with a higher confidence threshold of 0.6
rules2 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.6, target = "rules")
)

# Compare the two sets of rules and display the intersections
compare_rules(
  r1 = rules1, 
  r2 = rules2, 
  display = TRUE, 
  filename = "intersections.csv"
)

# The intersections are saved in 'intersections.csv'
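The intersection step can be illustrated on plain character vectors of rule labels. A set of arules rules can be converted to such a vector with labels(rules). This is a sketch of the idea, not how compare_rules is implemented. intersect_all and the toy rule labels are hypothetical.

```r
# intersect_all() is a hypothetical helper, NOT part of RulesTools.
# Each rule set is a named character vector of rule labels.
intersect_all <- function(sets) {
  stopifnot(is.list(sets), length(sets) >= 2, !is.null(names(sets)))
  result <- list()
  for (k in 2:length(sets)) {
    # Every combination of k set names, e.g. c("r1", "r2")
    for (combo in combn(names(sets), k, simplify = FALSE)) {
      # Reduce() intersects the chosen sets one after another
      result[[paste(combo, collapse = " & ")]] <- Reduce(intersect, sets[combo])
    }
  }
  result
}

sets <- list(
  r1 = c("{A} => {B}", "{A,C} => {B}", "{C} => {D}"),
  r2 = c("{A} => {B}", "{C} => {D}"),
  r3 = c("{A} => {B}", "{E} => {F}")
)
str(intersect_all(sets))
```

With three sets this produces four intersections: three pairs and the triple. In general, n sets have 2^n - n - 1 combinations of two or more sets.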
rule_euler Function: Create an Euler Diagram for Association Rules

The rule_euler function generates an Euler diagram for two to four sets of association rules. It shows the relationships and overlaps between rule sets, with customizable colors, transparency, and labels.

rule_euler returns a plot object containing the Euler diagram.

  1. Validation: Checks that the input is a valid list of 2 to 4 rules objects.
  2. Customization: Allows setting custom colors, transparency, and labels for the diagram.
  3. Plot Generation: Uses the eulerr package to generate and display the Euler diagram.

library(RulesTools)
library(arules)
data(BrookTrout)

# Discretize the BrookTrout dataset
discrete_bt <- dtize_df(BrookTrout, cutoff = "median")

# Generate the first set of rules with a confidence threshold of 0.5
rules1 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules")
)

# Generate the second set of rules with a higher confidence threshold of 0.6
rules2 <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.6, target = "rules")
)

# Create an Euler diagram to visualize the intersections between the rule sets
rule_euler(
  rules = list(conf0.5 = rules1, conf0.6 = rules2),
  title = "Euler Diagram of BrookTrout Rule Sets",
  fill_color = c("#7832ff", "lightgreen"),
  stroke_color = "darkblue"
)
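An Euler diagram is sized by the number of elements in each disjoint region, meaning the rules that belong to exactly one combination of sets. The base R sketch below computes those counts from rule labels. The result has the shape of a named vector of disjoint counts (names such as "A&B") that eulerr::euler() accepts, but rule_euler may prepare its input differently. region_counts and the toy labels are hypothetical.

```r
# region_counts() is a hypothetical helper, NOT part of RulesTools.
region_counts <- function(sets) {
  all_rules <- unique(unlist(sets))
  # Membership matrix: one row per rule, one logical column per set
  membership <- do.call(cbind, lapply(sets, function(s) all_rules %in% s))
  # Name each rule's region by the exact combination of sets containing it
  regions <- apply(membership, 1, function(row) {
    paste(colnames(membership)[row], collapse = "&")
  })
  counts <- table(regions)
  setNames(as.integer(counts), names(counts))
}

# Raising only the confidence threshold yields a subset, so one set is nested
sets <- list(
  conf0.5 = c("{A} => {B}", "{C} => {B}", "{D} => {B}"),
  conf0.6 = c("{A} => {B}")
)
region_counts(sets)
# conf0.5: 2, conf0.5&conf0.6: 1
```

Because each rule is counted in exactly one region, the counts always sum to the number of distinct rules across all sets.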
rule_heatmap Function: Create a Heatmap for Association Rules

The rule_heatmap function generates a heatmap visualization of association rules, showing the relationships between antecedents and consequents based on a specified metric. This visualization helps identify patterns and strengths of associations in the rule set.

rule_heatmap returns a ggplot object representing the heatmap of the association rules.

  1. Validation: Ensures the input is a valid rules object and parameters are correctly specified.
  2. Data Preparation: Extracts antecedents, consequents, and the specified metric from the rule set.
  3. Optional Zero Inclusion: Fills missing combinations with zeros if include_zero = TRUE.
  4. Plot Generation: Uses ggplot2 to create a heatmap with a gradient color scale based on the chosen metric.

library(RulesTools)
library(arules)
library(tidyr)  # re-exports the %>% pipe used below
data(BrookTrout)

# Discretize the BrookTrout dataset
discrete_bt <- dtize_df(BrookTrout, cutoff = "median")

# Generate rules with a confidence threshold of 0.5
rules <- apriori(
  discrete_bt,
  parameter = list(supp = 0.01, conf = 0.5, target = "rules"),
  appearance = list(rhs = "eDNAConc=high")
)

# Remove redundant rules, keep only statistically significant ones, and sort
rules <- rules %>%
  subset(!is.redundant(., measure = "confidence")) %>%
  subset(is.significant(., alpha = 0.05)) %>%
  sort(by = c("confidence", "lift", "support"))

# Create a heatmap using confidence as the metric
rule_heatmap(
  rules,
  metric = "confidence",
  graph_title = "Confidence Heatmap"
)

# Create a heatmap using lift as the metric with custom colors
rule_heatmap(
  rules,
  metric = "lift",
  graph_title = "Lift Heatmap",
  low_color = "#D4A221",
  high_color = "darkgreen"
)
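The data preparation and optional zero-filling steps can be sketched in base R on a small table with one row per rule. With arules, a comparable table could be built from labels(lhs(rules)), labels(rhs(rules)), and quality(rules). This is not the rule_heatmap implementation. heatmap_data and the toy items are hypothetical, and a base R merge() stands in for whatever the package uses internally.

```r
# heatmap_data() is a hypothetical helper, NOT part of RulesTools.
heatmap_data <- function(rules_df, metric = "confidence", include_zero = FALSE) {
  data <- rules_df[, c("lhs", "rhs", metric)]
  if (include_zero) {
    # Every antecedent/consequent pairing, whether or not a rule exists for it
    grid <- expand.grid(lhs = unique(data$lhs), rhs = unique(data$rhs),
                        stringsAsFactors = FALSE)
    # Left join: pairings without a rule get NA, which is then set to zero
    data <- merge(grid, data, all.x = TRUE)
    data[[metric]][is.na(data[[metric]])] <- 0
  }
  data
}

rules_df <- data.frame(
  lhs = c("{A}", "{B}", "{A,B}"),
  rhs = c("{X}", "{X}", "{Y}"),
  confidence = c(0.9, 0.6, 0.75),
  stringsAsFactors = FALSE
)
heatmap_data(rules_df, include_zero = TRUE)  # 6 rows, 3 of them filled with 0
```

Without zero-filling, pairings that have no rule simply have no row, so they appear as empty cells in the heatmap. With zero-filling they are drawn at the low end of the color scale.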

The BrookTrout dataset included in the RulesTools package provides environmental metadata to explore factors influencing high eDNA concentrations in aquatic samples. This dataset is derived from a study conducted in Hanlon Creek (Guelph, ON, Canada) in September 2019.

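The dataset includes environmental and biological variables, such as the eDNAConc and pH columns used in the examples above. The commands below load the dataset and summarize every column: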

# Load the package and the dataset
library(RulesTools)
data(BrookTrout)

# View the first few rows
head(BrookTrout)

# Summary statistics
summary(BrookTrout)
