A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/na396/SGCP below:

na396/SGCP: SGCP: a spectral self-learning method for clustering genes in co-expression networks

SGCP: a spectral self-learning method for clustering genes in co-expression networks, link

The Self-training Gene Clustering Pipeline (SGCP) is an innovative framework for constructing and analyzing gene co-expression networks. Its primary objective is to group genes with similar expression patterns into cohesive clusters, often referred to as modules. SGCP introduces several novel steps that enable the computation of highly enriched gene modules in an unsupervised manner. What sets SGCP apart from existing frameworks is its integration of a semi-supervised clustering approach, which leverages Gene Ontology (GO) information. This unique step significantly enhances the quality of the resulting modules, producing highly enriched and biologically relevant clusters.

SGCP is available at BMC Bioinformatics.

For detailed instructions and steps, please refer to the SGCP manual on Bioconductor page. To install the latest version of SGCP, you can access the GitHub repository using the following command:

#install.packages("devtools")
#devtools::install_github("na396/SGCP")

GPL-3

UTF-8

SGCP requires three main inputs; expData , geneID, and annotation_db.

Below are some commonly used annotation_db packages along with their corresponding gene identifiers for different organisms.

organism annotation_db gene identifier Homo sapiens (Hs) org.Hs.eg.db Entrez Gene identifiers Drosophila melanogaster (Dm) org.Dm.eg.db Entrez Gene identifiers Rattus norvegicus (Rn) org.Rn.eg.db Entrez Gene identifiers Mus musculus (Mm) org.Mm.eg.db Entrez Gene identifiers Arabidopsis thaliana (At) org.At.tair.db TAIR identifiers

Gene expression datasets for your analysis can be obtained from the Gene Expression Omnibus, a public repository of high-throughput gene expression data.

In SGCP, the following assumptions are made about the input genes:

Here, we give a brief example of the SGCP input. For this documentation, we use the gene expression GSE181225. For more information visit its Gene Expression Omnibus page).

Throughout this section, several Bioconductor packages will be required. Make sure to install and load them as needed to follow the example.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(c("org.Hs.eg.db", "GEOquery", "AnnotationDbi"))

First, set the directory

# Display the current working directory
print(getwd())

# If necessary, change the path below to the directory where the data files are stored.
# "." means current directory. On Windows use a forward slash / instead of the usual \.
workingDir = "."
setwd(workingDir)

First, we need to download the gene expression file. The R package GEOquery is used to obtain gene expression data from the Gene Expression Omnibus. For detailed information on how to use GEOquery, refer to the GEOquery guide.

To download the expression file for GSE181225, visit its Gene Expression Omnibus page. On the page, locate the file GSE181225_LNCaP_p57_VO_and_p57_PIM1_RNA_Seq_normalizedCounts.txt.gz in the Supplementary files section, which contains the normalized gene expression data. Download this supplementary file and save it to the directory specified by baseDir.


library(GEOquery)

gse = getGEOSuppFiles("GSE181225", baseDir = getwd())

After downloading the file, you should find a new directory named GSE181225, which contains the gene expression file. To proceed, read the gene expression file into R. The file has the following structure:

df = read.delim("GSE181225/GSE181225_LNCaP_p57_VO_and_p57_PIM1_RNA_Seq_normalizedCounts.txt.gz")
head(df)

Next, create the expData, geneID, and annotation_db.

geneID = df[,1]

expData = df[, 2:ncol(df)]
rownames(expData) = geneID

library(org.Hs.eg.db)

To map gene symbols to Entrez identifiers using the annotation_db, you can use the select function from the AnnotationDbi package. Here’s how you can do it in R:

library(AnnotationDbi)

genes = AnnotationDbi::select(org.Hs.eg.db, keys = rownames(expData), 
                      columns=c("ENTREZID"), 
                      keytype="SYMBOL")
# initial dimension
print(dim(genes))
head(genes)

Remove genes with missing SYMBOL or ENTREZID.

genes = genes[!is.na(genes$SYMBOL), ]
genes = genes[!is.na(genes$ENTREZID), ]

#dimension after dropping missing values
print(dim(genes))
head(genes)

Remove genes with duplicated SYMBOL or ENTREZID.

genes = genes[!duplicated(genes$SYMBOL),]
genes = genes[!duplicated(genes$ENTREZID), ]
#dimension after dropping missing values
print(dim(genes))
print(head(genes))

Keep only rows in expData that have corresponding gene identifiers present in genes.

expData = data.frame(expData, SYMBOL = rownames(expData))
expData =  merge(expData, genes, by = "SYMBOL")

Produce expData.

rownames(expData) = expData$ENTREZID
expData = expData[, c(2:6)]
print(head(expData))

Remove genes with zero variance from expData.

# Dropping zero variance genes

vars = apply(expData, 1, var)
zeroInd = which(vars == 0)

if(length(zeroInd) != 0) {
  print(paste0("number of zero variance genes ", length(zeroInd)))
  expData = expData[-zeroInd, ]
  genes = genes[-zeroInd, ]
}

print(paste0("number of genes after dropping ", dim(genes)[1]))

Remove genes with no gene ontology mapping.

## Remove genes with no GO mapping

xx = as.list(org.Hs.egGO[genes$ENTREZID])
haveGO  = sapply(xx,
                 function(x) {if (length(x) == 1 && is.na(x)) FALSE else TRUE })
numNoGO  = sum(!haveGO)
if(numNoGO != 0){
  print(paste0("number of genes with no GO mapping ", length(zeroInd)))
  expData = expData[haveGO, ]
  genes = genes[haveGO, ]
  
}
print(paste0("number of genes after dropping ", dim(genes)[1]))

Produce the final expData, geneID, annotation_db. Now, the input is ready for SGCP. Refer to SGCP Bioconductor page in order to see how to use this input in SGCP.

expData = expData
print(head(expData))

geneID = genes$ENTREZID
print(head(geneID))

annotation_db = "org.Hs.eg.db" 

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4