Provides tools for detecting XOR-like patterns in variable pairs. Includes visualizations for pattern exploration.
Overview

Traditional feature selection methods often miss complex non-linear relationships where variables interact to produce class differences. The detectXOR package specifically targets XOR patterns - relationships where class discrimination only emerges through variable interactions, not individual variables alone.

- XOR pattern detection - Statistical identification using χ² and Wilcoxon tests
- Correlation analysis - Class-wise Kendall τ coefficients
- Visualization - Spaghetti plots and decision boundary visualizations
- Parallel processing - Multi-core acceleration for large datasets
- Robust statistics - Winsorization and scaling options for outlier handling
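To make the idea of an XOR pattern concrete, the following base-R sketch builds a small synthetic grid and applies the kinds of tests named in the list above. It is an illustration under stated assumptions, not the package's implementation: the grid data, the split at zero, and the names `grid`, `tile`, and `tau_by_class` are invented for this example.

```r
# Synthetic XOR grid (illustrative data, not the package's XOR_data)
vals <- c(-4:-1, 1:4)
grid <- expand.grid(x = vals, y = vals)
# Class 1 when the signs of x and y differ, class 0 otherwise
grid$class <- as.integer((grid$x > 0) != (grid$y > 0))

# Univariate view: each variable has the same marginal distribution
# in both classes, so a rank test on x or y alone finds no difference
p_x <- wilcox.test(x ~ class, data = grid, exact = FALSE)$p.value
p_y <- wilcox.test(y ~ class, data = grid, exact = FALSE)$p.value

# Bivariate view: split the plane into four tiles at zero (an assumed
# split point) and test whether class depends on the tile
grid$tile <- interaction(grid$x > 0, grid$y > 0)
p_tile <- chisq.test(table(grid$tile, grid$class))$p.value

# Class-wise Kendall tau: the two classes show opposite-sign association
tau_by_class <- sapply(split(grid, grid$class),
                       function(d) cor(d$x, d$y, method = "kendall"))

print(c(p_x = p_x, p_y = p_y, p_tile = p_tile))
print(tau_by_class)
```

On this grid the single-variable tests cannot separate the classes, while the tile-based test and the sign flip in the class-wise correlations both expose the interaction.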
Install the development version from GitHub:
# Install devtools if needed
if (!requireNamespace("devtools", quietly = TRUE)) { install.packages("devtools") }
# Install detectXOR
devtools::install_github("JornLotsch/detectXOR")
Dependencies
The package requires R ≥ 3.5.0 and depends on:
- dplyr, tibble (data manipulation)
- ggplot2, ggh4x, scales (visualization)
- future, future.apply, pbmcapply, parallel (parallel processing)
- reshape2, glue (data processing and string manipulation)
- DescTools (statistical tools)
- Base R packages: stats, utils, methods, grDevices

Optional packages (suggested):
- testthat, knitr, rmarkdown (development and documentation)
- doParallel, foreach (additional parallel processing options)
library(detectXOR)
# Load example data
data(XOR_data)
# Detect XOR patterns with default settings
results <- detectXOR(XOR_data, class_col = "class")
# View summary
print(results$results_df)
Usage with custom parameters
# Detection with custom thresholds and parallel processing
results <- detect_xor(
  data = XOR_data,
  class_col = "class",
  p_threshold = 0.01,
  tau_threshold = 0.4,
  max_cores = 4,
  extreme_handling = "winsorize",
  scale_data = TRUE
)
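The `extreme_handling = "winsorize"` and `scale_data = TRUE` options in the call above refer to two standard preprocessing steps. The sketch below shows what they mean on a toy vector using base R only. The helper `winsorize()` and the data are hypothetical, written for this example; how the package performs these steps internally is not shown here.

```r
# Toy vector with one extreme value (made-up numbers for illustration)
x <- c(1:19, 1000)

# Winsorize: clip values outside the given lower and upper percentiles
# (hypothetical helper, not a function exported by detectXOR)
winsorize <- function(v, limits = c(0.05, 0.95)) {
  q <- quantile(v, probs = limits, na.rm = TRUE, names = FALSE)
  pmin(pmax(v, q[1]), q[2])
}

x_w <- winsorize(x)

# Standardize to mean 0 and standard deviation 1
x_scaled <- as.numeric(scale(x_w))

print(range(x))    # original range, dominated by the outlier
print(range(x_w))  # clipped range
print(round(c(mean = mean(x_scaled), sd = sd(x_scaled)), 3))
```

Clipping keeps every observation but limits the pull of extreme values, which is why it is often preferred over removing rows when the sample is small.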
Function parameters

### detectXOR() - Main detection function

| Parameter | Type | Default | Description |
|---|---|---|---|
| data | data.frame | required | Input dataset with variables and class column |
| class_col | character | "class" | Name of the class/target variable column |
| check_tau | logical | TRUE | Compute class-wise Kendall τ correlations |
| compute_axes_parallel_significance | logical | TRUE | Perform group-wise Wilcoxon tests |
| p_threshold | numeric | 0.05 | Significance threshold for statistical tests |
| tau_threshold | numeric | 0.3 | Minimum absolute τ for "strong" correlation |
| abs_diff_threshold | numeric | 20 | Minimum absolute difference for practical significance |
| split_method | character | "quantile" | Tile splitting method: "quantile" or "range" |
| max_cores | integer | NULL | Maximum cores for parallel processing (auto-detect if NULL) |
| extreme_handling | character | "winsorize" | Outlier handling: "winsorize", "remove", or "none" |
| winsor_limits | numeric vector | c(0.05, 0.95) | Winsorization percentiles |
| scale_data | logical | TRUE | Standardize variables before analysis |
| use_complete | logical | TRUE | Use only complete cases (remove NA values) |

Output structure
The detectXOR() function returns a list with two components:

### results_df - Summary data frame

| Column | Description |
|---|---|
| var1, var2 | Variable pair names |
| xor_shape_detected | Logical: XOR pattern identified |
| chi_sq_p_value | χ² test p-value for tile independence |
| tau_class_0, tau_class_1 | Class-wise Kendall τ coefficients |
| tau_difference | Absolute difference between class τ values |
| wilcox_p_x, wilcox_p_y | Wilcoxon test p-values for each axis |
| significant_wilcox | Logical: significant group differences detected |

### pair_list - Detailed results

Contains comprehensive analysis for each variable pair including:
- Tile pattern analysis results
- Statistical test outputs
- Processed data subsets
- Intermediate calculations
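Once detection has run, the summary table can be filtered on the documented `xor_shape_detected` column. The sketch below uses a hand-made stand-in for `results$results_df` so that it runs without the package. The values are invented for illustration, and only a subset of the documented columns is included.

```r
# Stand-in for results$results_df; all values are made up for illustration
results_df <- data.frame(
  var1 = c("A", "A", "B"),
  var2 = c("B", "C", "C"),
  xor_shape_detected = c(TRUE, FALSE, FALSE),
  chi_sq_p_value = c(0.0004, 0.62, 0.31),
  stringsAsFactors = FALSE
)

# Keep only pairs flagged as XOR, ordered by the tile-test p-value
detected <- results_df[results_df$xor_shape_detected, ]
detected <- detected[order(detected$chi_sq_p_value), ]

# Human-readable labels for the detected pairs
pair_labels <- paste(detected$var1, detected$var2, sep = " x ")
print(pair_labels)
```

With real output, replacing the stand-in by `results$results_df` should be the only change needed, assuming the column names match those listed above.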
Visualization functions

### generate_spaghetti_plot_from_results()

Creates connected line plots showing variable trajectories for XOR-detected pairs.

Arguments: results, data, class_col, scale_data = TRUE

### generate_xy_plot_from_results()

Generates scatter plots with decision boundary lines for detected XOR patterns.

Arguments: results, data, class_col, scale_data = TRUE, quantile_lines = c(1/3, 2/3), line_method = "quantile"

Both functions return ggplot objects that can be displayed or saved manually.
# Generate plots
generate_spaghetti_plot_from_results(results, XOR_data)
generate_xy_plot_from_results(results, XOR_data)
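Because the plotting functions return ggplot objects, saving one to disk can be done with `ggplot2::ggsave()`. The sketch below uses a stand-in plot so that it runs without the package. It assumes the returned value is a single ggplot object; if a function returns several plots, each would need to be saved separately. The file name and dimensions are arbitrary choices.

```r
library(ggplot2)

# Stand-in plot; in practice this would be the object returned by one of
# the plotting functions above (assumed here to be a single ggplot object)
p <- ggplot(data.frame(x = c(-1, 1, -1, 1), y = c(-1, -1, 1, 1)),
            aes(x = x, y = y)) +
  geom_point()

# Save to a temporary PDF; path and dimensions are arbitrary choices
out_file <- file.path(tempdir(), "xor_plot.pdf")
ggsave(filename = out_file, plot = p, width = 6, height = 4)

print(file.exists(out_file))
```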
Reporting functions

### generate_xor_reportConsole()

Creates a console-friendly formatted report with optional plots.

Arguments: results, data, class_col, scale_data = TRUE, show_plots = TRUE

### generate_xor_reportHTML()

Generates a comprehensive HTML report with interactive elements.

Arguments: results, data, class_col, output_file, open_browser = TRUE
Example report
# Generate formatted report
generate_xor_reportHTML(results, XOR_data, class_col = "class")
With open_browser = TRUE (the default), the report opens automatically in the system's default web browser.
Methodology

XOR detection pipeline

Parallel processing backends:
- future::multisession for parallel processing
- pbmcapply::pbmclapply with fork-based parallelism

detectXOR/
├── R/          # Package source code
├── man/        # Package documentation
├── data/       # Example dataset
├── issues/     # Problem reporting
└── analyses/   # Files used to generate or plot publication data sets (not in library)
Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests on GitHub.

## License

GPL-3

## Citation
For citation details or to request a formal publication reference, please contact the maintainer.