Note
This repository is in active development
The DicePlot package allows you to create visualizations (dice plots) for datasets with more than two categorical variables and additional continuous variables. This tool is particularly useful for exploring complex categorical data and their relationships with continuous variables.
To install the DicePlot package, follow these steps:
Ensure that you have R installed on your system. You can download it from The Comprehensive R Archive Network (CRAN). Or use conda:
conda create -n diceplot -c conda-forge r-base -y conda activate diceplot2. Install Required Packages
The DicePlot
package depends on several other R packages. Install them by running:
install.packages(c( "devtools", "dplyr", "ggplot2", "tidyr", "data.table", "ggdendro" ))
You have three options for installing the DicePlot package:
3.1 Install from CRAN (Recommended)install.packages("diceplot")
# Install devtools if you haven't already install.packages("devtools") # Install DicePlot from GitHub devtools::install_github("maflot/DicePlot/diceplot")3.3 Install from Local Files
Download the repository and run the following code to install the package:
install.packages("$path on your local machine$/DicePlot/diceplot", repos = NULL, type="source")
After installation, load the DicePlot
package into your R session:
Here is a real-world example using data from Huang et al. (2021) showing gene expression patterns across different immune cell types and demographic groups.
For more examples, check the example/ folder.
# Load necessary libraries library(readxl) library(dplyr) library(tidyr) library(stringr) library(writexl) library(RColorBrewer) library(UpSetR) library(ggplot2) library(diceplot) # Set your file path file_path <- "data/pnas.2023216118.sd05.xlsx" # Function to create the properly formatted CSV as the data comes in a rather hard to use format for this task process_excel_to_csv <- function(file_path) { # Read Excel file with detailed options to ensure proper data reading raw_data <- read_excel(file_path, col_names = FALSE, na = "", trim_ws = TRUE) # Extract cell types from row 2 cell_types_row <- raw_data[2,] # Extract demographic info from row 3 demo_row <- raw_data[3,] # Create a list to store all transformed data all_data <- list() # Define cell type mapping cell_type_map <- c( "NK" = "Natural Killer (NK) cell", "TC" = "T cell (TC)", "BC" = "B cell (BC)", "DC" = "Dendritic cell (DC)", "MC" = "Monocyte (MC)" ) # Special handling for the staggered format # First find the cell type columns cell_type_columns <- c() for (i in 1:ncol(raw_data)) { if (!is.na(cell_types_row[[i]]) && cell_types_row[[i]] != "") { cell_type_columns <- c(cell_type_columns, i) } } # Print debug information print(paste("Found cell type columns:", paste(cell_type_columns, collapse = ", "))) # Process each cell type column and its associated demographic columns for (col_idx in cell_type_columns) { cell_type <- cell_types_row[[col_idx]] cell_type_full <- cell_type_map[cell_type] # Look at the next 4 columns (OM, OF, YM, YF) for (offset in 0:3) { demo_col <- col_idx + offset # Check if this column exists and has a valid demographic if (demo_col <= ncol(raw_data) && !is.na(demo_row[[demo_col]]) && demo_row[[demo_col]] != "") { demo_info <- demo_row[[demo_col]] # Print debug info print(paste("Processing column", demo_col, "- Cell type:", cell_type, "- Demo info:", demo_info)) # Extract demographic information age <- case_when( substr(demo_info, 4, 4) == "O" ~ "old", substr(demo_info, 4, 4) == "Y" ~ "young", TRUE ~ NA_character_ ) sex <- case_when( substr(demo_info, 5, 5) == "M" ~ "male", substr(demo_info, 5, 5) == "F" ~ "female", TRUE ~ NA_character_ ) # Print the parsed demographic info for debugging print(paste(" Demographic parsed as:", age, sex)) # Process each gene in this column gene_count <- 0 for (row_idx in 4:nrow(raw_data)) { gene <- raw_data[row_idx, demo_col][[1]] # Skip empty genes if (is.na(gene) || gene == "") { next } gene_count <- gene_count + 1 # Create a row for this gene gene_row <- data.frame( id = paste0(cell_type, "_", demo_info, "_", gene), gene = gene, cell_type_code = cell_type, cell_type = cell_type_full, age_code = substr(demo_info, 4, 4), age = age, sex_code = substr(demo_info, 5, 5), sex = sex, demo_code = demo_info ) # Add to our list all_data[[length(all_data) + 1]] <- gene_row } print(paste(" Processed", gene_count, "genes in this column")) } } } # Process the data processed_data <- process_excel_to_csv(file_path) # Create a demographic combination column that combines age and sex processed_data <- processed_data %>% mutate(demo_combination = case_when( age == "old" & sex == "male" ~ "Old Male", age == "old" & sex == "female" ~ "Old Female", age == "young" & sex == "male" ~ "Young Male", age == "young" & sex == "female" ~ "Young Female", TRUE ~ paste(age, sex) )) # Order the demographic combinations factor processed_data$demo_combination <- factor( processed_data$demo_combination, levels = c("Old Male", "Old Female", "Young Male", "Young Female") ) # Order cell types processed_data$cell_type <- factor( processed_data$cell_type, levels = c( "Natural Killer (NK) cell", "T cell (TC)", "B cell (BC)", "Dendritic cell (DC)", "Monocyte (MC)" ) ) # Create summary table with gene counts gene_counts <- processed_data %>% group_by(gene, cell_type, demo_combination) %>% summarize(tmp_count = n(), .groups = "drop") # Define colors for demographic combinations demo_colors <- c( "Old Male" = "#E41A1C", # Red "Old Female" = "#377EB8", # Blue "Young Male" = "#4DAF4A", # Green "Young Female" = "#984EA3" # Purple ) # Get top 25 most frequent genes top_25_genes <- processed_data %>% count(gene) %>% arrange(desc(n)) %>% head(25) %>% pull(gene) # Filter gene_counts to include only top 25 genes filtered_gene_counts <- gene_counts %>% filter(gene %in% top_25_genes) # Add default group column filtered_gene_counts$default = "" # Create the diceplot p_dice_filtered <- dice_plot( data = filtered_gene_counts, x = "gene", # x-axis: genes y = "cell_type", # y-axis: cell types z = "demo_combination", # z parameter: demographic combinations cluster_by_column = T, cluster_by_row = F, title = "Gene Expression across Cell Types and Demographics\n(Top 25 Genes)", z_colors = demo_colors, # Use the proper color palette max_dot_size = 6, min_dot_size = 3, legend_width = 0.2, legend_height = 0.25, show_legend = T ) # Display the diceplot print(p_dice_filtered)
Here is a simple artificial example of how to use the DicePlot v0.1.2
package.
For more examples, check the tests/ folder.
# Load necessary libraries library(diceplot) library(tidyr) library(data.table) library(ggplot2) library(dplyr) library(tibble) library(grid) library(cowplot) library(RColorBrewer)
First, we define the cell types, pathways, pathway groups, pathology variables, and assign colors to pathology variables:
# Define common variables cell_types <- c("Neuron", "Astrocyte", "Microglia", "Oligodendrocyte", "Endothelial") pathways <- c( "Apoptosis", "Inflammation", "Metabolism", "Signal Transduction", "Synaptic Transmission", "Cell Cycle", "DNA Repair", "Protein Synthesis", "Lipid Metabolism", "Neurotransmitter Release", "Oxidative Stress", "Energy Production", "Calcium Signaling", "Synaptic Plasticity", "Immune Response" ) # Assign groups to pathways pathway_groups <- data.frame( Pathway = pathways, Group = c( "Linked", "UnLinked", "Other", "Linked", "UnLinked", "UnLinked", "Other", "Other", "Other", "Linked", "Other", "Other", "Linked", "UnLinked", "Other" ), stringsAsFactors = FALSE ) pathology_variables <- c("AD", "Cancer", "Flu", "ADHD", "Age", "Weight") # Assign colors to pathology variables n_colors <- length(pathology_variables) colors <- brewer.pal(n = n_colors, name = "Set1") z_colors <- setNames(colors, pathology_variables)
Explanation:
Now we finalize the data and plot the dice plot:
# Create dummy data set.seed(123) data <- expand.grid(CellType = cell_types, Pathway = pathways, stringsAsFactors = FALSE) data <- data %>% rowwise() %>% mutate( PathologyVariable = list(sample(pathology_variables, size = sample(1:length(pathology_variables), 1))) ) %>% unnest(cols = c(PathologyVariable)) # Merge the group assignments into the data data <- data %>% left_join(pathway_groups, by = "Pathway") # Use the dice_plot function with new parameter names p = dice_plot( data = data, x = "CellType", y = "Pathway", z = "PathologyVariable", group = "Group", group_alpha = 0.6, title = "Dice Plot with 6 Pathology Variables", z_colors = z_colors, custom_theme = theme_minimal(), min_dot_size = 2, max_dot_size = 4 ) print(p) # Simply save the plot using the ggplot functions # ggsave("./diceplot_example.png", p, width = 8, height = 9)
Explanation:
A Domino Plot is a specialized visualization from the DicePlot package that allows you to display differential expression data across multiple categorical variables. It's particularly useful for visualizing how gene expression changes across different cell types, conditions, and contrasts.
The plot uses colors to represent up/down-regulation and size to represent statistical significance. This example uses data from the ZEBRA database, a hierarchically integrated gene expression atlas of the murine and human brain at single-cell resolution.
Before starting, ensure you have the following packages installed:
install.packages(c("dplyr", "tidyr", "ggplot2", "diceplot"))
For this tutorial, we'll use a dataset derived from human cortex samples that contains differential expression analysis results comparing gene expression between sexes across various neurological conditions. The dataset includes:
library(dplyr) library(tidyr) library(ggplot2) library(diceplot)Step 2: Load and Prepare the Data
# Load dataset zebra.df = read.csv(file = "data/ZEBRA_sex_degs_set.csv") genes = c("SPP1","APOE","SERPINA1","PINK1","ANGPT1","ANGPT2","APP","CLU","ABCA7") zebra.df <- zebra.df %>% filter(gene %in% genes) %>% filter(contrast %in% c("MS-CT","AD-CT","ASD-CT","FTD-CT","HD-CT")) %>% mutate(cell_type = factor(cell_type, levels = sort(unique(cell_type)))) %>% filter(PValue < 0.05)Step 3: Create a Basic Domino Plot
Let's start with a basic domino plot using the default parameters:
p_basic <- domino_plot( data = zebra.df, # Input data gene_list = genes, # List of genes to include var_id = "contrast", # Variable that identifies different conditions x = "gene", # Variable for x-axis y = "cell_type", # Variable for y-axis contrast = "sex", # Contrast variable (e.g., male vs female) log_fc = "logFC", # Column name for log fold change p_val = "FDR" # Column name for p-values ) # Display the plot print(p_basic)Step 4: Create a Customized Domino Plot
Now, let's create a more customized version with specific dot sizes and logFC limits:
p_advanced <- domino_plot( data = zebra.df, gene_list = genes, var_id = "contrast", x = "gene", y = "cell_type", contrast = "sex", log_fc = "logFC", p_val = "FDR", min_dot_size = 1, # Minimum dot size for least significant results max_dot_size = 3, # Maximum dot size for most significant results logfc_limits = c(min(zebra.df$logFC)-1, max(zebra.df$logFC)-1) # Custom logFC color scale limits ) # Display the plot print(p_advanced$domino_plot)Understanding the Domino Plot Output
The domino_plot()
function returns a list with several components:
domino_plot
: The main plot objectYou can access the main plot using p_advanced$domino_plot
.
Since the domino plot returns a ggplot2 object, you can further customize it using standard ggplot2 functions:
p_custom <- p_advanced$domino_plot + theme_minimal() + theme( axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5, size = 14), legend.position = "bottom" ) + labs(title = "Differential Expression Across Cell Types and Conditions") # Display the customized plot print(p_custom) # Save the plot ggsave("domino_plot_example.png", p_custom, width = 10, height = 8, dpi = 300)Step 6: Creating a Faceted Domino Plot
You can create a faceted domino plot to separate results by a particular variable:
p_faceted <- domino_plot( data = zebra.df, gene_list = genes, var_id = "contrast", x = "gene", y = "cell_type", contrast = "sex", log_fc = "logFC", p_val = "FDR", min_dot_size = 1, max_dot_size = 3 )$domino_plot + facet_wrap(~contrast, scales = "free_y") + theme( strip.background = element_rect(fill = "lightgray"), strip.text = element_text(face = "bold") ) # Display the faceted plot print(p_faceted) # Save the faceted plot ggsave("domino_plot_faceted.png", p_faceted, width = 14, height = 10, dpi = 300)
The domino_plot()
function has several important parameters:
data
Input data frame containing all necessary variables gene_list
List of genes to include in the plot var_id
Variable that identifies different conditions x
Variable for x-axis (typically genes) y
Variable for y-axis (typically cell types) contrast
Contrast variable (e.g., sex, treatment) log_fc
Column name for log fold change values p_val
Column name for p-values min_dot_size
Minimum dot size for least significant results max_dot_size
Maximum dot size for most significant results logfc_limits
Custom limits for logFC color scale
Interpreting the Domino Plot
In a domino plot:
Color: Represents the direction and magnitude of change
Size: Represents statistical significance
Position: Shows the combination of categorical variables
This tutorial has prerquisties which are not defaults in the diceplot package itself. Before proceeding, install the required R packages:
install.packages(c("sf", "ggplot2", "diceplot", "dplyr", "cowplot", "rnaturalearth"))
We use a dataset containing city locations in Saarland, along with their log-transformed distances to France, Switzerland, Luxembourg, and Rheinland-Pfalz.
library(sf) library(ggplot2) library(diceplot) library(dplyr) library(cowplot) library(rnaturalearth)Step 2: Load and Prepare the Data
# Define custom dice face positions var_positions <- data.frame( x_offset = c(-0.3, 0.3, -0.3, 0.3), y_offset = c(0.3, 0.3, -0.3, -0.3), var = c("log_France", "log_Swiss", "log_Luxembourg", "log_Rheinlandpfalz") ) # Load Germany state boundaries germany_states <- ne_states(country = "Germany", returnclass = "sf") saarland <- germany_states[germany_states$name == "Saarland", ] # Define city locations and distances cities <- data.frame( name = c("Saarbrücken", "Saarlouis", "Homburg", "Britten", "Merzig", "Lebach", "Ottweiler"), dice = 4, lon = c(6.996, 6.751, 7.339, 6.784, 6.639, 6.913, 7.167), lat = c(49.234, 49.315, 49.320, 49.481, 49.442, 49.407, 49.400), France = c(14, 12, 38, 27, 18, 27, 36), Swiss = c(190, 204, 195, 221, 220, 210, 206), Luxembourg = c(51, 31, 67, 23, 17, 35, 52), Rheinlandpfalz = c(29, 27, 6, 16, 20, 12, 16) ) # Convert to spatial object cities_sf <- st_as_sf(cities, coords = c("lon", "lat"), crs = 4326) cities_sf$log_France <- log(cities_sf$France) cities_sf$log_Swiss <- log(cities_sf$Swiss) cities_sf$log_Luxembourg <- log(cities_sf$Luxembourg) cities_sf$log_Rheinlandpfalz <- log(cities_sf$Rheinlandpfalz)Step 3: Create a Custom Legend Function
create_custom_legends_for_map <- function(var_positions, dot_size, legend_text_size = 9) { legend_data <- var_positions %>% mutate(x = x_offset + 1, y = y_offset + 1) ggplot() + geom_point(data = legend_data, aes(x = x, y = y), size = dot_size, color = "black") + geom_point(data = legend_data, aes(x = x, y = y), size = dot_size + 0.5, shape = 1, color = "black") + coord_fixed(ratio = 1, xlim = c(0.5, 2.5), ylim = c(0.5, 1.5), expand = FALSE) + geom_text( data = legend_data, aes( x = ifelse(x > 0, x + 0.15, x - 0.15), y = ifelse(y > 0, y + 0.15, y - 0.15), label = var, hjust = ifelse(x < 0, 1, 0), vjust = ifelse(y > 0, 0, 1) ), size = legend_text_size / 3, color = "black" ) + ggtitle("Dice arrangement") + theme_void() }Step 4: Create a map with geom_dice_sf
geom_dice_sf is fully integratable with ggplot.
# Generate legend plot legend_plot <- create_custom_legends_for_map(var_positions, dot_size = 3) # Generate main dice plot main_plot <- ggplot() + geom_sf(data = saarland, fill = "lightblue", color = "black") + geom_dice_sf( sf_data = cities_sf, dice_value_col = "dice", face_color = c("log_France", "log_Swiss", "log_Luxembourg", "log_Rheinlandpfalz"), dice_size = 0.5, dot_size = 3 ) + geom_text( data = cities_sf, mapping = aes(x = st_coordinates(cities_sf)[,1], y = st_coordinates(cities_sf)[,2], label = name), nudge_y = 0.03, size = 3 ) + ggtitle("Saarland with Dice Markers Showing Log-Scaled Distances to Borders") + theme_minimal() # Combine main plot and legend final_plot <- plot_grid(main_plot, legend_plot, ncol = 2, rel_widths = c(4, 1)) # Display the final plot final_plot
For using dice plots in Python, please refer to pyDicePlot
For full documentation and additional examples, please refer to the documentation
ggplot2
for advanced plotting capabilities.We welcome contributions from the community! If you'd like to contribute:
If you have any questions, suggestions, or issues, please open an issue on GitHub.
If you use this code or the R and Python packages for your own work, please cite diceplot as:
M. Flotho, P. Flotho, A. Keller, "Diceplot: A package for high dimensional categorical data visualization," arxiv, 2024. doi:10.48550/arXiv.2410.23897
BibTeX entry:
@article{flotea2024,
author = {Flotho, M. and Flotho, P. and Keller, A.},
title = {Diceplot: A package for high dimensional categorical data visualization},
year = {2024},
journal = {arXiv preprint},
doi = {https://doi.org/10.48550/arXiv.2410.23897}
}
[1] Flotho, M., Flotho, P., Keller, A. (2024). Diceplot: A package for high dimensional categorical data visualization. arXiv preprint. https://doi.org/10.48550/arXiv.2410.23897
[2] Flotho, M., Amand, J., Hirsch, P., Grandke, F., Wyss-Coray, T., Keller, A., Kern, F. (2023). ZEBRA: a hierarchically integrated gene expression atlas of the murine and human brain at single-cell resolution. Nucleic Acids Research, 52(D1), D1089-D1096. https://doi.org/10.1093/nar/gkad990
[3] Huang, Z., Chen, B., Liu, X., Li, H., Xie, L., Gao, Y., Duan, R., Li, Z., Zhang, J., Zheng, Y., et al. (2021). Effects of sex and aging on the immune cell landscape as assessed by single-cell transcriptomic analysis. Proceedings of the National Academy of Sciences, 118(33), e2023216118. https://doi.org/10.1073/pnas.2023216118
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4