ViScore (vee-score) is a toolkit for evaluating and benchmarking dimensionality reduction.
It is published together with ViVAE, a tool for single-cell data denoising and dimensionality reduction. Check out the associated paper, "Interpretable models for scRNA-seq data embedding with multi-scale structure preservation", where we describe and validate our methods in depth.
To try out ViScore without installing it locally, follow the tutorial on scRNA-seq data dimensionality reduction in the ViVAE repository, which gives instructions on usage within Google Colab.
ViScore is a Python package. We recommend creating a new Anaconda environment for ViScore, or using the one you may have already created for ViVAE.
On Linux or macOS, use the command line for installation. On Windows, use Anaconda Prompt.
Stand-alone installation
    conda create --name ViScore --channel conda-forge python=3.11.7 \
        numpy==1.26.3 numba==0.59.0 matplotlib==3.8.2 scipy==1.12.0 pynndescent==0.5.11 scikit-learn==1.4.0 pyemd==1.0.0
    conda activate ViScore
    pip install --upgrade git+https://github.com/saeyslab/ViScore.git
Shared environment with ViVAE
    conda activate ViVAE
    pip install pyemd==1.0.0
    pip install --upgrade git+https://github.com/saeyslab/ViScore.git
Examples of ViScore usage are shown in tutorials in the ViVAE repository.
viscore.score quantifies Local and Global SP without the use of labels (higher is better).
viscore.xnpe quantifies local distortion of labelled populations (lower is better).
viscore.neighbourhood_composition_plot shows sources of error in local embeddings of labelled populations.
Each of these functions is documented: for example, use help(viscore.score) to find out more about Local and Global SP scoring.
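As a small convenience, the documentation lookup mentioned above can be wrapped in a helper that also checks whether the package is importable. This is only a sketch: the helper name `describe_viscore` is hypothetical and not part of ViScore, and it assumes nothing about the functions beyond the three names listed above.

```python
import importlib
import importlib.util


def describe_viscore(names=("score", "xnpe", "neighbourhood_composition_plot")):
    """Return {function name: first docstring line}, or None if viscore is not installed.

    `describe_viscore` is a hypothetical helper for illustration only.
    """
    # Check for the package without importing it, so a missing install is not an error.
    if importlib.util.find_spec("viscore") is None:
        return None
    viscore = importlib.import_module("viscore")
    summary = {}
    for name in names:
        fn = getattr(viscore, name, None)
        doc = (getattr(fn, "__doc__", None) or "").strip()
        summary[name] = doc.splitlines()[0] if doc else "(no docstring found)"
    return summary


if __name__ == "__main__":
    info = describe_viscore()
    if info is None:
        print("viscore is not installed in this environment")
    else:
        for name, line in info.items():
            print(f"viscore.{name}: {line}")
```

For the full documentation of a function, `help(viscore.score)` in an interactive session remains the simplest route.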
ViScore enables unsupervised assessment of structure preservation (SP) in low-dimensional (LD) embeddings of high-dimensional (HD) data using scores based on RNX curves. This is an objective approach based on quantifying neighbourhood preservation between HD and LD for all neighbourhood scales.
RNX curves show (scaled) overlap between neighbour ranks for all neighbourhoods of size from 1 to N-1.
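For reference, the definition commonly used in the dimensionality-reduction literature can be written as follows. The symbols are assumptions introduced here, not taken from this README: $\nu_i^{K}$ and $n_i^{K}$ denote the sets of $K$ nearest neighbours of point $i$ in HD and LD respectively, and $N$ is the number of points.

```latex
Q_{NX}(K) = \frac{1}{K N} \sum_{i=1}^{N} \left| \nu_i^{K} \cap n_i^{K} \right|,
\qquad
R_{NX}(K) = \frac{(N-1)\, Q_{NX}(K) - K}{N - 1 - K}
```

$Q_{NX}(K)$ is the average fraction of shared neighbours, and the rescaling to $R_{NX}(K)$ subtracts the overlap $K/(N-1)$ expected from a random embedding. In this formulation the denominator vanishes at $K = N-1$, so the rescaled curve is evaluated up to $K = N-2$.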
Taking the AUC (Area-Under-Curve) with logarithmic scale for K (neighbourhood size), we effectively up-weight the significance of local neighbourhoods, without setting a hard cut-off for what is still considered local. This is the Local SP score (SL).
Taking the AUC with linear scale for K, we dispense with the locality bias and assume equal importance for all neighbourhood scales. This is the Global SP score (SG).
Both of these values are bounded by -1 and 1 (higher is better), where 0 corresponds to SP by a random embedding.
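The two scores can be illustrated with a brute-force computation on a small dataset. The following is a minimal NumPy sketch, not ViScore's implementation: the function names are hypothetical, Euclidean distances are assumed, and the AUCs are taken as discrete weighted averages of the curve (weights 1/K for the logarithmic scale, uniform weights for the linear scale).

```python
import numpy as np


def rank_matrix(x):
    """ranks[i, j] = position of point j in the neighbour ordering of point i (self = 0)."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, -1.0)  # make sure each point ranks itself first
    order = np.argsort(d, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(x.shape[0])[:, None]
    ranks[rows, order] = np.arange(x.shape[0])[None, :]
    return ranks


def rnx_curve(hd, ld):
    """Exact R_NX(K) for K = 1..N-2 (quadratic in the number of points)."""
    n = hd.shape[0]
    # Point j lies in both K-neighbourhoods of i iff max(HD rank, LD rank) <= K.
    m = np.maximum(rank_matrix(hd), rank_matrix(ld))
    off_diag = ~np.eye(n, dtype=bool)
    counts = np.bincount(m[off_diag], minlength=n)
    overlap = np.cumsum(counts)[1:n - 1]  # total shared neighbours for K = 1..N-2
    k = np.arange(1, n - 1)
    qnx = overlap / (k * n)
    rnx = ((n - 1) * qnx - k) / (n - 1 - k)
    return k, rnx


def sp_scores(hd, ld):
    """Return (local, global) SP scores as weighted averages of the R_NX curve."""
    k, rnx = rnx_curve(hd, ld)
    s_local = np.sum(rnx / k) / np.sum(1.0 / k)  # log scale for K: weights 1/K
    s_global = np.mean(rnx)                      # linear scale for K: equal weights
    return s_local, s_global


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hd = rng.normal(size=(200, 10))
    print("first two coordinates:", sp_scores(hd, hd[:, :2]))
    print("random embedding:     ", sp_scores(hd, rng.normal(size=(200, 2))))
```

Because the full pairwise distance matrix is held in memory, this sketch is only suitable for a few thousand points at most.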
Since the computation of an RNX curve has quadratic complexity, this approach is impractical or impossible to apply to larger datasets. We circumvent this by approximating the RNX curve using a repeated vantage point tree-based sampling approach. This is implemented in viscore.score.
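To give a feel for why sampling helps, the sketch below estimates the curve from a random subset of reference points instead of all of them. Note that this swaps in plain uniform random sampling, not the vantage point tree-based scheme that viscore.score uses, and all names and parameters (`approx_sp_scores`, `n_ref`, `n_repeats`) are hypothetical. The cost per repeat grows with `n_ref * N` rather than `N * N`.

```python
import numpy as np


def ranks_from(ref_idx, x):
    """Neighbour ranks of all points as seen from each reference point (self = 0)."""
    d = np.linalg.norm(x[ref_idx, None, :] - x[None, :, :], axis=-1)
    d[np.arange(len(ref_idx)), ref_idx] = -1.0  # each reference point ranks itself first
    order = np.argsort(d, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(len(ref_idx))[:, None], order] = np.arange(x.shape[0])[None, :]
    return ranks


def approx_sp_scores(hd, ld, n_ref=100, n_repeats=5, seed=0):
    """Estimate (local, global) SP scores from uniformly sampled reference points.

    Illustrative stand-in only: uniform sampling, not vantage point tree sampling.
    """
    rng = np.random.default_rng(seed)
    n = hd.shape[0]
    k = np.arange(1, n - 1)
    curves = []
    for _ in range(n_repeats):
        ref = rng.choice(n, size=min(n_ref, n), replace=False)
        # Point j is a shared K-neighbour of reference i iff max(HD rank, LD rank) <= K.
        m = np.maximum(ranks_from(ref, hd), ranks_from(ref, ld))
        counts = np.bincount(m.ravel(), minlength=n)
        counts[0] = 0  # drop each reference point's pairing with itself
        overlap = np.cumsum(counts)[1:n - 1]
        qnx = overlap / (k * len(ref))
        curves.append(((n - 1) * qnx - k) / (n - 1 - k))
    rnx = np.mean(curves, axis=0)  # average the estimated curves over repeats
    s_local = np.sum(rnx / k) / np.sum(1.0 / k)
    s_global = rnx.mean()
    return s_local, s_global


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hd = rng.normal(size=(2000, 10))
    print(approx_sp_scores(hd, hd[:, :2], n_ref=100, n_repeats=3))
```

Each repeat draws a fresh set of reference points, and averaging the resulting curves reduces the variance of the estimate.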
You can find our documented benchmarking set-up for comparing DR methods on scRNA-seq data in the benchmarking folder in this repository. It takes you through the process of setting up and deploying a benchmarking (or hyperparameter tuning) workflow on a high-performance computing (HPC) cluster. The framework is extensible in terms of DR methods and datasets.
Additionally, code to generate figures and LaTeX tables for presenting results of your benchmark is included.