SCOPRO (SCOre PROjection) is an R package that assigns a score projection from 0 to 1 between a given in vivo stage and each single cluster from an in vitro dataset. The score is assigned based on the the fraction of specific markers of the in vivo stage that are conserved in the in vitro clusters.
InstallationYou can install the development version from GitHub with:
devtools::install_github("ScialdoneLab/SCOPRO",auth_token="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",ref="master")
Getting started
The main function of the package is SCOPRO
SCOPROSCOPRO(norm_vitro, norm_vivo, cluster_vitro, cluster_vivo, name_vivo, marker_stages_filter, threshold = 0.1, number_link = 1, fold_change = 3, threshold_fold_change = 0.1, marker_stages, selected_stages)
requires as input:
The mean expression profile of marker_stages_filter genes is computed for each cluster in the in vivo and in vitro dataset. For a given cluster, a connectivity matrix is computed with number of rows and number of columns equal to the length of marker_stages_filter. Each entry (i,j) in the matrix can be 1 if the fold_change between gene i and gene j is above fold_change. Otherwise is 0. Finally the connectivity matrix of a given name_vivo stage and all the clusters in the in vitro dataset are compared. A gene i is considered to be conserved between name_vivo and an in vitro cluster if the jaccard index of the links of gene i is above threshold.
ExampleBelow an example of input using the development version of SCOPRO from GitHub
current_wd <- getwd()
url = "https://hmgubox2.helmholtz-muenchen.de/index.php/s/EHQSnjMJxkR7QYT/download/SCOPRO.zip"
destfile <- paste0(current_wd,"/SCOPRO.zip")
download.file(url, destfile, quiet = FALSE)
unzip(destfile, exdir=current_wd)
Load in vitro dataset (single cell RNA seq mouse data from Iturbe et al., 2021)
setwd(paste0(current_wd,"/SCOPRO"))
load(file='mayra_dati_raw_0.Rda')
mayra_seurat_0=cluster_analysis_integrate_rare(mayra_dati_raw_0,"Mayra_data_0",0.1,5,30)
norm_es_vitro=as.matrix(GetAssayData(mayra_seurat_0, slot = "data",assay="RNA"))
cluster_es_vitro=as.vector(mayra_seurat_0$RNA_snn_res.0.1)
Load in vivo mouse dataset (bulk RNA seq data from Deng et al. , 2014 and Mohammed et al. , 2017)
setwd(paste0(current_wd,"/SCOPRO"))
load(file="seurat_genes_published_mouse.Rda")
norm_vivo <- as.matrix(GetAssayData(seurat_genes_published_mouse, slot = "data",assay="RNA"))
Compute markers for selected in vivo stages using CIARA function markers_cluster_seurat based on package Seurat
DefaultAssay(seurat_genes_published_mouse) <- "RNA"
cluster_mouse_published <- as.vector(seurat_genes_published_mouse$stim)
relevant_stages <- c("Late_2_cell", "epiblast_4.5", "epiblast_5.5", "epiblast_6.5")
DefaultAssay(seurat_genes_published_mouse) <- "RNA"
markers_first_ESC_small <- CIARA::markers_cluster_seurat(seurat_genes_published_mouse[,cluster_mouse_published%in%relevant_stages],cluster_mouse_published[cluster_mouse_published%in%relevant_stages],names(seurat_genes_published_mouse$RNA_snn_res.0.2)[cluster_mouse_published%in%relevant_stages],10)
markers_mouse <- as.vector(markers_first_ESC_small[[3]])
stages_markers <- names(markers_first_ESC_small[[3]])
## Keeping only the genes in common between in vitro and in vivo datasets
stages_markers <- stages_markers[markers_mouse %in% row.names(norm_es_vitro)]
markers_small <- markers_mouse[markers_mouse %in% row.names(norm_es_vitro)]
names(markers_small) <- stages_markers
For each in vivo stage, we select only the markers for which the median is above 0.1 and is below 0.1 in all the other stages.
marker_result <- select_top_markers(relevant_stages, cluster_mouse_published, norm_vivo, markers_small, max_number = 100, threshold = 0.1)
marker_all <- marker_result[[1]]
marker_stages <- marker_result[[2]]
Run SCOPRO
We run SCOPRO between the cluster of the mouse ESCs dataset and the in vivo stage âLate 2-cellsâ.
The function SCOPRO first computes the mean expression profile of marker_stages_filter genes for each cluster in the in vivo and in vitro dataset. For a given cluster, a connectivity matrix is computed with number of rows and number of columns equal to the length of marker_stages_filter. Each entry (i,j) in the matrix can be 1 if the fold_change between gene i and gene j is above fold_change. Otherwise is 0. Finally the connectivity matrix of Late 2-cells stage and all the clusters in the in vitro dataset are compared. A gene i is considered to be conserved between Late 2-cells stage and an in vitro cluster if the jaccard index of the links of gene i is above threshold.
There are 25 markers of the Late 2-cells stage that are also expressed in the mouse ESC datasets. More than 75% of these 25 markers are conserved in the cluster number 2. This result is expected since cluster 2 is made up by 2CLC, a rare population of cells known to be transcriptionally similar to the late 2 cells-stage in the mouse embryo development (typical markers of 2CLC are the Zscan4 genes, also highly expressed in the late 2 cells-stage).
marker_stages_filter <- filter_in_vitro(norm_es_vitro,cluster_es_vitro ,marker_all, fraction = 0.10, threshold = 0)
analysis_2cell <- SCOPRO(norm_es_vitro,norm_vivo,cluster_es_vitro,cluster_mouse_published,"Late_2_cell",marker_stages_filter, threshold = 0.1, number_link = 1, fold_change = 3, threshold_fold_change = 0.1 ,marker_stages, relevant_stages)
plot_score(analysis_2cell, marker_stages, marker_stages_filter, relevant_stages, "Late_2_cell", "Final score", "Cluster", "Late_2_cell")
Visualization of conserved/ not conserved genes between late 2 cells stage and in vitro clusters
We can visualize which are the markers of the late 2 cells stage that are conserved/ not conserved in cluster 2. As expected the Zscan4 family genes are conserved.
common_genes <- select_common_genes(analysis_2cell, marker_stages, relevant_stages, "Late_2_cell", cluster_es_vitro, "2")
no_common_genes <- select_no_common_genes(analysis_2cell, marker_stages, relevant_stages, "Late_2_cell", cluster_es_vitro, "2")
all_genes <- c(no_common_genes[1:4], common_genes[1:10])
all_genes_label <- c(paste0(no_common_genes[1:4], "-no_conserved"), paste0(common_genes[1:10], "-conserved"))
rabbit_plot <- plot_score_genes(all_genes, "Mouse ESC", "Mouse vitro", norm_es_vitro,norm_vivo[ , cluster_mouse_published=="Late_2_cell"],cluster_es_vitro, cluster_mouse_published[cluster_mouse_published == "Late_2_cell"], all_genes_label, 7, 10, "Late_2_cell")
rabbit_plot
Vignette
The following vignette is available and completely reproducible. In this vignette it is shown the projection performed between single cell RNA seq mouse data from Iturbe et al., 2021 and in vivo mouse datasets from Deng et al. , 2014 and Mohammed et al. , 2017. It can be accessed within R with:
utils::vignette("SCOPRO_vignette")
Contributions and Support
Contributions in the form of feedback, comments, code and bug report are welcome. * For any contributions, feel free to fork the source code and submit a pull requests. * Please report any issues or bugs here: https://github.com/ScialdoneLab/SCOPRO/issues. Any questions and requests for support can also be directed to the package maintainer (gabriele[dot]lubatti[at]helmholtz-muenchen[dot]de).
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4