RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://vsn-pipelines-examples.readthedocs.io/en/latest/PBMC10k.html below:

10k PBMCs (10x Genomics) — VSN-Pipelines-examples documentation

VSN-Pipelines-examples 10k PBMCs (10x Genomics)Â¶

This tutorial shows the steps to analyze a typical scRNA-seq dataset with a single sample. We will use PBMC data available from the 10x Genomics support website. The same dataset was used in the DSL1 version of this pipeline, described in the SCENIC protocol tutorial (here).

Prepare 10x input dataÂ¶

The âFeature / cell matrix (filtered)â data was downloaded from 10x Genomics, here.

wget http://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_filtered_feature_bc_matrix.tar.gz

When using 10x data as an input, the pipeline assumes the files are in the typical Cell Ranger directory structure. This is not the case when downloading the processed counts from the 10x website, so we will put them into the proper format:

mkdir -p pbmc10k/outs/
tar xvf pbmc_10k_v3_filtered_feature_bc_matrix.tar.gz -C pbmc10k/outs/

which results in:

$ tree pbmc10k
pbmc10k
âââ outs
    âââ filtered_feature_bc_matrix
        âââ barcodes.tsv.gz
        âââ features.tsv.gz
        âââ matrix.mtx.gz

2 directories, 3 files

So, in the nextflow config file, generated in the following step, the tenx input channel should point to the outs folder. For example:

params.data.tenx.cellranger_mex = '/home/cflerin/analysis/pbmc10k/dsl2_0.19.0/pbmc10k/outs'

Setup the VSN-pipelines projectÂ¶ Update the repositoryÂ¶

Pull/update the vsn-pipelines repository cached by nextflow. Here, we use the -r flag to specify the pipeline version to use:

nextflow pull vib-singlecell-nf/vsn-pipelines

Build the config fileÂ¶

We use a combination of profiles to build the config file:

tenx: defines the input data type
single_sample_scenic: loads the basic parameters to run the single_sample and scenic workflows
scenic_use_cistarget_motifs and scenic_use_cistarget_tracks: includes parameters to specify the location of the cistarget database files
hg38: specifies the genome. Other options are: hg19, dm6, mm10.
singularity (or docker): specifies container system to use to run the processes

nextflow config vib-singlecell-nf/vsn-pipelines \
    -profile tenx,single_sample_scenic,scenic_use_cistarget_motifs,scenic_use_cistarget_tracks,hg38,singularity \
    > pbmc10k.vsn-pipelines.complete.config

Important variables to check in the config:

singularity.runOptions (or docker.runOptions): making sure the correct volume mounts are specified (requires the user home folder (included by default in Singularity), and the location of the data).
params.global.project_name (optional): will control the naming of the output files.
params.sc.scope.tree.level_${X} (optional): controls the labeling of the loom file when uploaded to the SCope viewer.
params.sc.scanpy.filter: filtering settings for the Scanpy steps.
params.sc.scanpy.feature_selection: controls how highly variable genes are selected.
params.sc.scanpy.clustering: controls cluster settings. In the example here, we select two clustering resolutions by using resolutions = [0.4,0.8].

Specifying compute resource usage in the config:

The global executor (process.executor) is set to local by default. It can be changed to qsub, etc. to run specific processes as jobs. The executor parameter can be added to specific labels to run only these processes as jobs. Typically the GRN step should be submitted as a job (compute_resources__scenic_grn).
The number of cpus and memory usage can be adjusted for each label.

The complete config file used here is available at: pbmc10k/pbmc10k.vsn-pipelines.complete.config.

Run the VSN-pipelines projectÂ¶ First passÂ¶

While the overall goal is to run the âbest practices stepsâ and SCENIC together, we can first skip running SCENIC, and focus on getting the filtering and preprocessing steps correct. Then, we can move on to run the resource-intensive SCENIC steps. Even though we created a profile with single_sample and scenic options together, we can run just the single_sample workflow first:

nextflow -C pbmc10k.vsn-pipelines.complete.config \
    run vib-singlecell-nf/vsn-pipelines \
    -entry single_sample

Now, the QC reports can be inspected (see out/notebooks/intermediate/pbmc10k.SC_QC_filtering_report.html, either the original ipynb, or the converted html file). The cell and gene filters can be updated by editing the config file. For example, the relevant filters used here are:

params {
    sc {
        scanpy {
            filter = {
                cellFilterMinNGenes = 200
                cellFilterMaxNGenes = 4000
                cellFilterMaxPercentMito = 0.15
                geneFilterMinNCells = 3
            }
        }
    }
}

Re-run the pipeline as many times as needed (with resume to skip alread-completed steps) to select the proper filters:

nextflow -C pbmc10k.vsn-pipelines.complete.config \
    run vib-singlecell-nf/vsn-pipelines \
    -entry single_sample

Second passÂ¶

Once the cell and gene filters look ok, we can re-start the pipeline with the full SCENIC steps enabled. This will re-run any steps in which the parameters have changed (e.g. the filtering and downstream steps), while skipping the initial conversion, etc. when the -resume option is used:

nextflow -C pbmc10k.vsn-pipelines.complete.config \
    run vib-singlecell-nf/vsn-pipelines \
    -entry single_sample_scenic \
    -resume

ResultsÂ¶

Once the pipeline is complete (approximately 2 hours on a HPC system using 15 processes for the SCENIC GRN step), the output will be the following files (display truncated):

$ tree out
out/
âââ data
âÂ Â  âââ intermediate
â   â   âââ [...]
âÂ Â  âââ pbmc10k.PBMC10k_DSL2.single_sample.output.h5ad
âââ loom
âÂ Â  âââ pbmc10k.SCENIC_SCope_output.loom
âÂ Â  âââ pbmc10k.SCope_output.loom
âââ nextflow_reports
âÂ Â  âââ execution_report.html
âÂ Â  âââ execution_timeline.html
âÂ Â  âââ execution_trace.txt
âÂ Â  âââ pipeline_dag.dot
âââ notebooks
âÂ Â  âââ intermediate
âÂ Â  âââ pbmc10k.merged_report.html
âÂ Â  âââ pbmc10k.merged_report.ipynb
âÂ Â  âââ pbmc10k.merged_report.louvain_0.4.html
âÂ Â  âââ pbmc10k.merged_report.louvain_0.4.ipynb
âÂ Â  âââ pbmc10k.merged_report.louvain_0.8.html
âÂ Â  âââ pbmc10k.merged_report.louvain_0.8.ipynb
âââ scenic
    âââ pbmc10k
        âââ arboreto_with_multiprocessing
        âÂ Â  âââ pbmc10k__adj.tsv
        âÂ Â  âââ pbmc10k.filtered.loom
        âââ aucell
        âÂ Â  âââ pbmc10k__auc_mtf.loom
        âÂ Â  âââ pbmc10k__auc_trk.loom
        âÂ Â  âââ pbmc10k.filtered.loom
        âââ cistarget
        âÂ Â  âââ pbmc10k.filtered.loom
        âÂ Â  âââ pbmc10k__reg_mtf.csv
        âÂ Â  âââ pbmc10k__reg_trk.csv
        âââ notebooks
        âÂ Â  âââ SCENIC_report.html
        âÂ Â  âââ SCENIC_report.ipynb
        âââ SCENIC_output.loom
        âââ SCENIC_SCope_output.loom

The final SCENIC output is packaged into a loom file, which includes the results of the parallel expression analysis (based on highly variable genes). This can be found at out/loom/pbmc10k.SCENIC_SCope_output.loom, and is ready to be uploaded to a SCope session. The output loom file from this analysis can be found on the SCENIC protocol SCope session.

Also included is out/data/pbmc10k.PBMC10k_DSL2.single_sample.output.h5ad, an anndata file generated by the Scanpy section of the pipeline, including the results of the expression analysis (but not results from SCENIC).

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4