The cloudos R package makes it easy to interact with Lifebit’s CloudOS platform from an R environment.
You can install the latest release of cloudos from any of the following sources.
CRAN:
install.packages("cloudos")
Conda (conda-forge channel):
conda install -c conda-forge r-cloudos
GitHub:
if (!require(remotes)) { install.packages("remotes") }
remotes::install_github("lifebit-ai/cloudos")
Alternatively, you can install the latest development version of cloudos:
git clone https://github.com/lifebit-ai/cloudos
cd cloudos
git checkout origin/devel
Rscript -e 'devtools::install(".")'
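Whichever installation route you choose, you may want to confirm that the package is visible to your R session before continuing. The following is a minimal sketch using only base R; `is_installed` is a hypothetical helper written for this example and is not part of cloudos.

```r
# Hypothetical helper (not part of cloudos): check whether a package is
# installed without attaching it. requireNamespace() returns TRUE or FALSE
# instead of raising an error when the package is missing.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (is_installed("cloudos")) {
  message("cloudos version: ", as.character(utils::packageVersion("cloudos")))
} else {
  message("cloudos is not installed yet; use one of the methods above.")
}
```

Using `requireNamespace()` rather than `library()` avoids attaching the package and printing its start-up message during the check.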
Below is a demonstration of how the cloudos package can be used.
library(cloudos)
#>
#> Welcome to Lifebit's CloudOS R client
#> For Documentation visit - https://lifebit-ai.github.io/cloudos/
#> This package is under active development. If you found any issues,
#> Please reach out here - https://github.com/lifebit-ai/cloudos/issues
library(knitr) # For better visualization of wide dataframes in this README examples
library(magrittr) # For pipe
This package is primarily a means of communicating with a CloudOS instance using its API. Before it can communicate with the CloudOS instance, the package must be configured with some key information:
- The CloudOS base URL. This is the URL in your browser when you navigate to the Cohort Browser in CloudOS. It is often of the form https://my_instance.lifebit.ai/app/cohort-browser.
- The CloudOS token. Navigate to the settings page in CloudOS to generate an API key you can use as your token (see image below).
- The CloudOS team ID. This is also found on the settings page in CloudOS, labelled as the “Workspace ID” (see image below).
The package will look for this information in a fixed order of locations, starting with the environment variables CLOUDOS_BASEURL, CLOUDOS_TOKEN, and CLOUDOS_TEAMID.
There are three ways to configure the package:
- Add the environment variables to ~/.Renviron in the following way, which will load them at the beginning of each R session:
CLOUDOS_BASEURL="xxx"
CLOUDOS_TOKEN="xxx"
CLOUDOS_TEAMID="xxx"
- Use Sys.setenv(ENV_VAR = "env_var_value"):
Sys.setenv(CLOUDOS_BASEURL = "xxx")
Sys.setenv(CLOUDOS_TOKEN = "xxx")
Sys.setenv(CLOUDOS_TEAMID = "xxx")
- Use the function cloudos_configure(), which will create a ~/.cloudos/config file that persists between R sessions and is read each time (the recommended way if you are using multiple cloudos clients):
cloudos_configure(base_url = "xxx", token = "xxx", team_id = "xxx")

Application - Cohort Browser
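Before calling any Cohort Browser function, it can help to confirm that the configuration described earlier is visible to the current R session. The sketch below covers only the environment-variable methods (it does not inspect ~/.cloudos/config) and uses base R alone; `missing_cloudos_vars` is a hypothetical helper, not a cloudos function.

```r
# Names of the environment variables the package reads (taken from this README).
cloudos_vars <- c("CLOUDOS_BASEURL", "CLOUDOS_TOKEN", "CLOUDOS_TEAMID")

# Hypothetical helper (not part of cloudos): return the names of any
# variables that are unset or empty in the current session.
missing_cloudos_vars <- function(vars = cloudos_vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[unname(values) == ""]
}

unset_vars <- missing_cloudos_vars()
if (length(unset_vars) > 0) {
  message("Not set: ", paste(unset_vars, collapse = ", "))
} else {
  message("All three CloudOS environment variables are set.")
}
```

If you configured the package with cloudos_configure() instead, the variables may legitimately be unset, so treat the message as a hint rather than an error.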
The information below is out of date; please refer to the latest function documentation.
Cohort Browser is part of Lifebit’s CloudOS offering. Let’s explore how to interact with it in an R environment.
To list the available cohorts in a workspace:
cohorts <- cb_list_cohorts()
#> Total number of cohorts found: 3. Showing 10 by default. Change 'size' parameter to return more.
cohorts %>% head(n=5) %>% kable()

| id | name | description | number_of_participants | number_of_filters | created_at | updated_at |
|---|---|---|---|---|---|---|
| 610d3004597aa12e251abdf2 | cohort-hms | This cohort is for testing purpose, created from R. | 20778 | 0 | 2021-08-06T12:50:12.242Z | 2021-08-06T13:25:00.192Z |
| 610ac00edb7c7a1d9d0c309f | il_test01 | NA | 415 | 2 | 2021-08-04T16:27:58.708Z | 2021-08-04T16:30:06.253Z |
| 60feab0767a6666b8bf9e11b | Manos Test | NA | 530 | 0 | 2021-07-26T12:31:03.458Z | 2021-08-04T13:02:46.731Z |
To create a new cohort.
my_cohort <- cb_create_cohort(cohort_name = "Cohort-R",
                              cohort_desc = "This cohort is for testing purpose, created from R.")
#> Cohort created successfully.
my_cohort
#> Cohort ID: 610d47d7597aa12e251abdf4
#> Cohort Name: Cohort-R
#> Cohort Description: This cohort is for testing purpose, created from R.
#> Number of phenotypes in query: 1
#> Cohort Browser version: v2
Load an existing cohort into a cohort R object. This object can then be passed to many other functions.
other_cohort <- cb_load_cohort(cohort_id = "610ac00edb7c7a1d9d0c309f")
other_cohort
#> Cohort ID: 610ac00edb7c7a1d9d0c309f
#> Cohort Name: il_test01
#> Cohort Description:
#> Number of phenotypes in query: 2
#> Cohort Browser version: v2

Explore available phenotypes
Search for phenotypes based on a term. Searching with term = "" will return all the available phenotypes.
disease_phenotypes <- cb_search_phenotypes(term = "disease")
#> Total number of phenotypic filters found - 18
disease_phenotypes %>% head(n=5) %>% kable()

id name description array type valueType units bucket500 bucket1000 bucket2500 bucket5000 bucket300 bucket10000 categoryPathLevel1 categoryPathLevel2 instances Sorting coding descriptionParticipantsNo link descriptionStability descriptionCategoryID descriptionItemType descriptionStrata descriptionSexed orderPhenotype instance0Name instance1Name instance2Name instance3Name instance4Name instance5Name instance6Name instance7Name instance8Name instance9Name instance10Name instance11Name instance12Name instance13Name instance14Name instance15Name instance16Name
28 Rare diseases family sk Database identifier for a rare disease family 1 text_search Text FALSE FALSE FALSE FALSE FALSE FALSE Basic characteristics NA 1 89132 https://cnfl.extge.co.uk/pages/viewpage.action?pageId=147659370 Main 100k Programme
29 Rare diseases family id A locally-allocated family identifier assigned to the proband and their relatives. This should be unique to this duo or trio within the GMC and is necessary for linking related participants. 1 text_search Text FALSE FALSE FALSE FALSE FALSE FALSE Basic characteristics NA 1 89132 https://cnfl.extge.co.uk/pages/viewpage.action?pageId=147659370 Main 100k Programme
177 Cancer disease sub type (HPO) The subtype of the cancer in question, recorded against a limited set of supplied enumerations. 4 bars Categorical multiple FALSE FALSE FALSE FALSE FALSE FALSE Cancer Participant disease 1 17404 https://cnfl.extge.co.uk/pages/viewpage.action?pageId=147659370 Main 100k Programme
178 Cancer disease type The cancer type of the tumour sample submitted to Genomics England. 4 bars Categorical multiple FALSE FALSE FALSE FALSE FALSE FALSE Cancer Participant disease 1 17404 https://cnfl.extge.co.uk/pages/viewpage.action?pageId=147659370 Main 100k Programme
206 Disease group Top-level classification of rare diseases (project specific) 5 bars Categorical multiple FALSE FALSE FALSE FALSE FALSE FALSE Rare disease Participant disease 1 39913 https://cnfl.extge.co.uk/pages/viewpage.action?pageId=147659370 Main 100k Programme
Let’s choose a phenotype from the above table. The “id” is the most important part as it will allow us to use this phenotype for cohort queries and other functions.
# get the fifth row/phenotype in the table ("Disease group", id 206)
my_phenotype <- disease_phenotypes[5,]
my_phenotype %>% kable()

id name description array type valueType units bucket500 bucket1000 bucket2500 bucket5000 bucket300 bucket10000 categoryPathLevel1 categoryPathLevel2 instances Sorting coding descriptionParticipantsNo link descriptionStability descriptionCategoryID descriptionItemType descriptionStrata descriptionSexed orderPhenotype instance0Name instance1Name instance2Name instance3Name instance4Name instance5Name instance6Name instance7Name instance8Name instance9Name instance10Name instance11Name instance12Name instance13Name instance14Name instance15Name instance16Name
206 Disease group Top-level classification of rare diseases (project specific) 5 bars Categorical multiple FALSE FALSE FALSE FALSE FALSE FALSE Rare disease Participant disease 1 39913 https://cnfl.extge.co.uk/pages/viewpage.action?pageId=147659370 Main 100k Programme

Get distribution of cohort participants for a phenotype
Let’s check the numbers of participants across the categories of this phenotype.
# phenotype
my_pheno_data <- cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = my_phenotype$id)
my_pheno_data %>% head(n=10) %>% kable()

| _id | number | total |
|---|---|---|
| Metabolic disorders | 125 | 5090 |
| Ultra-rare disorders | 272 | 5090 |
| dysmorphic and congenital abnormality syndromes | 3 | 5090 |
| Skeletal disorders | 109 | 5090 |
| Respiratory disorders | 37 | 5090 |
| Endocrine disorders | 121 | 5090 |
| Dermatological disorders | 68 | 5090 |
| Tumour syndromes | 228 | 5090 |
| tumour syndromes | 3 | 5090 |
| Psychiatric disorders | 5 | 5090 |

Update a cohort with a new query
A query defines which participants are included in a cohort based on phenotypes.
Phenotypes can be continuous - in which case a selected range needs to be specified, or they can be categorical - in which case selected categories need to be specified.
For phenotype “Year of birth” (with id = 8)
# cb_get_phenotype_metadata(8)$name
# "Year of birth"
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 8) %>% head(n=10) %>% kable()

| _id | number | total |
|---|---|---|
| 1923 | 3 | 44667 |
| 1924 | 9 | 44667 |
| 1925 | 8 | 44667 |
| 1926 | 4 | 44667 |
| 1927 | 16 | 44667 |
| 1928 | 36 | 44667 |
| 1929 | 47 | 44667 |
| 1930 | 60 | 44667 |
| 1931 | 81 | 44667 |
| 1932 | 105 | 44667 |
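The number and total columns can be combined into a share of the cohort per category. The sketch below is illustrative only: it retypes the first five rows of the "Year of birth" output by hand so that it runs without a CloudOS connection, and it assumes the returned data frame has columns named _id, number and total, as the printed table suggests.

```r
# First rows of the "Year of birth" statistics, typed in by hand
# (an assumption about the shape of the real return value).
yob_stats <- data.frame(
  `_id` = c(1923, 1924, 1925, 1926, 1927),
  number = c(3, 9, 8, 4, 16),
  total = 44667,
  check.names = FALSE  # keep the "_id" column name as-is
)

# Share of the cohort in each category, as a percentage.
yob_stats$percent <- 100 * yob_stats$number / yob_stats$total

# Show the most populated categories first.
yob_stats[order(-yob_stats$number), ]
```

With a live connection, the same two lines of arithmetic could be applied directly to the data frame returned by cb_get_phenotype_statistics().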
For phenotype “Total full brothers” (with id = 48).
# cb_get_phenotype_metadata(48)$name
# "Total full brothers"
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 48) %>% kable()

| _id | number | total |
|---|---|---|
| 0 | 4248 | 13276 |
| 1 | 7791 | 13276 |
| 2 | 1237 | 13276 |

Filtering cohorts using queries
Now let’s restrict our cohort to a set of participants based on the phenotypes we explored above.
A single phenotype query can be defined using the phenotype function.
# total full brothers: 1
categorical_query <- phenotype(id = 48, value = 1)
# year of birth: 1965 - 1995
continuous_query <- phenotype(id = 8, from = 1965, to = 1995)
To combine single phenotype queries, you can use the &, | and ! operators.
query <- categorical_query & continuous_query
cb_participant_count(cohort = my_cohort, query = query, keep_query = FALSE)
#> $total
#> [1] 44667
#>
#> $count
#> [1] 2524
Any number of single phenotypes can be combined using any combination of operators. The order in which logic is resolved follows the usual rules and can be controlled using brackets.
categorical_query_2 <- phenotype(id = 48, value = 2)
query <- (categorical_query | categorical_query_2) & continuous_query
cb_participant_count(cohort = my_cohort, query = query, keep_query = FALSE)
#> $total
#> [1] 44667
#>
#> $count
#> [1] 2883
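To see why the brackets matter, here is a small base R sketch that applies the same logic to plain logical vectors. R's parser gives & higher precedence than |, and that precedence is fixed by the language regardless of what objects the operators act on. The data frame is made up for illustration and is not CloudOS data.

```r
# Toy participants (hypothetical values, not CloudOS data).
participants <- data.frame(
  total_full_brothers = c(1, 2, 1, 0, 2, 1),
  year_of_birth       = c(1970, 1980, 1950, 1990, 2000, 1995)
)

one_brother  <- participants$total_full_brothers == 1
two_brothers <- participants$total_full_brothers == 2
born_65_95   <- participants$year_of_birth >= 1965 &
  participants$year_of_birth <= 1995

# With brackets: (one OR two brothers) AND born 1965-1995.
with_brackets <- sum((one_brother | two_brothers) & born_65_95)

# Without brackets, & binds tighter than |, so this is evaluated as
# one brother OR (two brothers AND born 1965-1995).
without_brackets <- sum(one_brother | two_brothers & born_65_95)

c(with_brackets = with_brackets, without_brackets = without_brackets)
#> with_brackets without_brackets
#>             3                4
```

The two counts differ because the unbracketed form lets participants with one brother through regardless of their year of birth.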
If we’re happy that this is a sensible query, we can apply it to the cohort, making sure to override the previous query by setting keep_query to FALSE. If we wanted to keep the criteria from the pre-existing query and add our new phenotype-based criteria to them, we would leave keep_query set to its default value of TRUE.
# apply the query
cb_apply_query(cohort = my_cohort, query = query, keep_query = FALSE)
#> Query applied sucessfully.
# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)
# double check that the cohort has the number of participants we expected
cb_participant_count(cohort = my_cohort)
#> $total
#> [1] 44667
#>
#> $count
#> [1] 2883
We could now further restrict our cohort to include only females (phenotype “Participant phenotypic sex”, id = 10) by using keep_query = TRUE. In other words, with this argument the query that is applied looks like “old query AND new query”.
new_query <- phenotype(id = 10, value = "Female")
# apply the query
cb_apply_query(cohort = my_cohort, query = new_query, keep_query = TRUE)
#> Query applied sucessfully.
# update the local cohort object with info from the changed version on the server
my_cohort <- cb_load_cohort(my_cohort@id)
# check the number of participants
cb_participant_count(my_cohort)
#> $total
#> [1] 44667
#>
#> $count
#> [1] 1457
Now that the query has been applied to our cohort, let's inspect the distribution of our phenotype of interest in the cohort.
# view the distribution of disease groups in our cohort
cb_get_phenotype_statistics(cohort = my_cohort, pheno_id = 206) %>% head(n=10) %>% kable()

| _id | number | total |
|---|---|---|
| Hearing and ear disorders | 33 | 1731 |
| Growth disorders | 14 | 1731 |
| Endocrine disorders | 34 | 1731 |
| Dermatological disorders | 31 | 1731 |
| Respiratory disorders | 8 | 1731 |
| dysmorphic and congenital abnormality syndromes | 2 | 1731 |
| Skeletal disorders | 37 | 1731 |
| Ophthalmological disorders | 148 | 1731 |
| neurology and neurodevelopmental disorders | 4 | 1731 |
| Tumour syndromes | 76 | 1731 |

Retrieve the participant table
Now let’s get a participant phenotype table with the columns of interest for our cohort.
First we have to update the cohort on the Cohort Browser server to set which columns will be in the table. Currently the best way to do this is (counterintuitively) to use cb_apply_query to add the IDs of the phenotypes of interest as columns.
cb_apply_query(my_cohort, column_ids = c(208, 10, 8, 48), keep_columns = TRUE)
#> Query applied sucessfully.
my_cohort <- cb_load_cohort(my_cohort@id)
Now we can fetch the participant phenotype table which includes these columns.
pheno_df <- cb_get_participants_table(cohort = my_cohort, page_size = cb_participant_count(my_cohort)$count)
pheno_df %>% head(n=10) %>% kable()

| EID | Programme | Handling gmc | Year of birth | Participant ethnic category | Participant karyotypic sex | Participant type | Specific disease | Participant phenotypic sex | Total full brothers |
|---|---|---|---|---|---|---|---|---|---|
| 1000020 | Rare Diseases | North Thames | 1970 | Not Stated | Unknown | Relative | NA | Female | NA |
| 1000397 | Rare Diseases | North Thames | 1999 | Mixed: White and Asian | Not Supplied | Proband | NA | Female | 2 |
| 1000411 | Cancer | Yorkshire and Humber | 1966 | White: British | NA | NA | NA | Female | NA |
| 1000673 | Rare Diseases | Genomics Network Alliance | 2012 | Asian or Asian British: Pakistani | Not Supplied | Proband | Osteogenesis imperfecta | Female | 1 |
| 1001010 | Rare Diseases | West Midlands | 1970 | White: British | Not Supplied | Proband | NA | Female | 0 |
| 1001033 | Rare Diseases | North East and Cumbria | 1986 | Not Stated | Not Supplied | Proband | Intellectual disability | Female | NA |
| 1001429 | Rare Diseases | West Midlands | 1981 | White: British | Not Supplied | Relative | NA | Female | NA |
| 1001667 | Rare Diseases | Genomics Network Alliance | 1986 | White: British | Not Supplied | Relative | NA | Female | NA |
| 1001712 | Rare Diseases | Genomics Network Alliance | 1983 | White: British | Not Supplied | Relative | NA | Female | NA |
| 1001749 | Cancer | West London | 1965 | Not Known | NA | NA | NA | Female | NA |
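Once the table is in R it is an ordinary data frame, so base R summaries apply. The sketch below is an illustration under assumptions: it retypes a few columns of the first five rows shown above so that it runs offline, and it assumes the real column names match the printed headers (they may differ in your instance).

```r
# A hand-typed subset of the participant table shown above
# (column names assumed to match the printed headers).
pheno_sample <- data.frame(
  EID = c(1000020, 1000397, 1000411, 1000673, 1001010),
  Programme = c("Rare Diseases", "Rare Diseases", "Cancer",
                "Rare Diseases", "Rare Diseases"),
  `Year of birth` = c(1970, 1999, 1966, 2012, 1970),
  `Total full brothers` = c(NA, 2, NA, 1, 0),
  check.names = FALSE  # keep spaces in the column names
)

# Number of participants per programme.
programme_counts <- table(pheno_sample$Programme)
programme_counts

# Mean number of full brothers, ignoring missing values.
mean_brothers <- mean(pheno_sample[["Total full brothers"]], na.rm = TRUE)
mean_brothers
```

Column names containing spaces need `[[ ]]` or backticks, which is why the example avoids the `$` shorthand for "Total full brothers".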
Get the genotypic table for a cohort (currently only cohort browser version 1 is supported).
cohort_genotype <- cb_get_genotypic_table(cohort = my_cohort)
cohort_genotype %>% head(n=2) %>% kable()
This package is under active development. If you find any issues, please reach out here - https://github.com/lifebit-ai/cloudos/issues
For documentation visit - https://lifebit-ai.github.io/cloudos/
MIT © Lifebit