# Installation options (Choose one. CRAN method is recommended for most users.)
## 1. Most recent release version via CRAN
install.packages("magmaR")
## 2. Development version via GitHub
remotes::install_github("mountetna/monoetna", subdir = "etna/packages/magmaR")
# Check installation and load the package
library(magmaR)
# Set up your authorization token and where to find magma
magma <- magmaRset()
## Note: when run as above, you will be prompted in the console to provide your token.
## This token can be obtained from Janus.
# Now, you're ready to retrieve some data!
retrieve(
target = magma,
projectName = "example",
modelName = "subject"
)
Introduction
magmaR, magma, and the UCSF Data Library
This vignette focuses on how to explore, query, and retrieve data from magma via its R-client, magmaR.
Magma is the data warehouse of the UCSF Data Library.
The Data Library holds various research data sets, broadly broken up into "projects", and provides tools for adding to, organizing, viewing, and analyzing these data sets.
Internally, the system is composed of a set of applications that each provide a different piece of the Data Library pie. Through the Magma application, one can query and retrieve data from, or update data within, "projects" that exist in the library.
We provide some more detail below, but for an even deeper overview of the structure of magma and the Data Library system than is provided here, you can refer to the main source of documentation, https://mountetna.github.io/magma.html.
Organization of data within magma
Data types within magma projects are organized into models, and individual data then make up the records of those models.
For example, information and data for 3 tubes run on a flow cytometer might make up 3 individual records of a flow cytometry model.
Each record might have multiple attributes, such as the "gene_counts" matrix, the "cell_number", or the sorted cell "fraction" attributes of records that are part of an "rna_seq" model.
The set of attributes that a record might possess is defined separately for each model. Thus, records of a "flow" model might have an "fcs_file" attribute, but records of the "rna_seq" model likely would not.
Hierarchically, the root of a project is always the "project" model, and every other model must have a single parent model. Thus, the data graph is like a tree.
(Technically, link-type attributes may be used to indicate additional one-to-one or one-to-many relationships between models, beyond the tree-like parent <- model <- "children" relationships. This allows the graph to be more like a directed acyclic graph (DAG) than a tree... but imagining projects as trees is certainly easier than as an abstract blob.)
Here is a sketch of what an example project might look like. Quite literally, it is the layout of the "example" project that we will be playing with later on in this vignette:
[Figure: example_project_map]
This "example" project has 6 different models, including the project model itself. Each model holds different chunks of information (attributes) about data in the "example" project. For example, the subject model contains information about individuals (records) for whom biospecimens exist; the biospecimen model would then contain information about specific specimens obtained from each subject; and the flow and rna_seq models contain data and information from individual flow cytometry or rna_seq assays that were run on an individual biospecimen.
Each project in the data library system might have its own distinct modeling layout, because where to split up the information is highly dependent on a project's data collection and experimental plans. However, in general, one can think of records of a model at the bottom of the tree as inheriting attributes from the parent records of their parent models. So, for example, although we can also think of each model as its own independent set of data, "rna_seq"-model records are ultimately linked to individual "subject"-model records. Thus, even though attributes of the "subject" model are not directly included in the "rna_seq" model, all "subject"-model attributes of "subject"-model records do apply to linked "rna_seq"-model records. In magmaR, we include a function, retrieveMetadata(), for retrieving such linked data. More on that later.
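To make the tree idea concrete, here is a minimal sketch in base R. It does not call magma; the parent map below is an assumption made purely for illustration (the real parent of each model is recorded in a project's template), and ancestors() is a hypothetical helper, not a magmaR function.

```r
# Hypothetical parent map for an "example"-like project (child = parent).
# These relationships are assumed for illustration only.
parents <- c(
  subject     = "project",
  biospecimen = "subject",
  flow        = "biospecimen",
  rna_seq     = "biospecimen"
)

# Walk from a model up to the root, collecting every ancestor model.
ancestors <- function(model, parents) {
  path <- character(0)
  while (model %in% names(parents)) {
    model <- parents[[model]]
    path <- c(path, model)
  }
  path
}

# Models whose attributes also apply to "rna_seq" records via parent links
ancestors("rna_seq", parents)
```

Walking upward like this is the sense in which records at the bottom of the tree "inherit" attributes from the records of models above them.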
At this point, you should know enough about the structure of magma projects to start using magmaR. But more information exists within magma's own documentation: https://mountetna.github.io/magma.html.
How magmaR functions work
In general, magmaR functions will:
The goal of magmaR is to give users access to data within magma that is as direct as possible, yet also as ready-to-analyze as possible. Thus, magmaR functions perform some minor restructuring, which does not change the underlying data but does reorganize it into more efficient formats for downstream analysis within R.
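The two main output structures of magma returns
There are two main output structures for returns from magma: tsv and json.
Both formats are received as character strings, but then tsv returns are converted into a data.frame, and json returns are converted into a list.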
The data.frame format tends to be easier to work with, but both of these can be fit quite readily into downstream applications.
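As a rough illustration of the conversions described above, the sketch below parses a made-up tab-separated string into a data.frame and a made-up JSON string into a nested list. Both strings are invented for this example, and the use of read.delim() and jsonlite::fromJSON() is an assumption about one reasonable way to do this, not a statement of what magmaR does internally.

```r
# A made-up tab-separated return, standing in for a character string from magma.
tsv_string <- "name\tgroup\nEXAMPLE-HS1\tg1\nEXAMPLE-HS2\tg1\n"

# TSV text -> data.frame, keeping column names exactly as given.
df <- read.delim(text = tsv_string, stringsAsFactors = FALSE, check.names = FALSE)
str(df)

# A made-up JSON return. Parsing needs a JSON package; jsonlite is assumed
# here and is only used if it is installed.
json_string <- '{"models": {"subject": {"documents": {"EXAMPLE-HS1": {"group": "g1"}}}}}'
if (requireNamespace("jsonlite", quietly = TRUE)) {
  lst <- jsonlite::fromJSON(json_string)
  str(lst, max.level = 3)
}
```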
Installation
The most recent release version of magmaR can be installed from CRAN with:
install.packages("magmaR")
Alternatively, development versions of magmaR can be installed from GitHub with:
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}
remotes::install_github("mountetna/monoetna", subdir = "etna/packages/magmaR")
After either of the above, one can check proper installation with:
library(magmaR)
Authorization process, a.k.a. janus token utilization:
In order to access data in magma, a user needs to be authorized to do so. This is achieved by providing a user-specific, temporary string, which we call a token. This token can be obtained from https://janus.ucsf.edu/.
Providing a token
Within magmaR, the token is provided as part of the target input, which can be constructed with the magmaRset() function.
A user's token can be provided to this function in one of two ways.
1) Via an interactive prompt (Recommended when coding interactively):
When the token is not provided explicitly (as it is in the other method), the user will be prompted to provide it via the interactive console. It is recommended that you store the output of your call to magmaRset() as a variable, and then provide this variable to each subsequent call to a magmaR function, as below.
# Method1: User will be prompted to give their token in the R console
prod <- magmaRset()
projects <- retrieveProjects(
    # Now, we give the output of magmaRset() to the 'target' input of any
    # other magmaR function.
    target = prod)
If you run the above code, you should be prompted to
Enter your Janus TOKEN (without quotes):
To fill this in, navigate to Janus via your favorite browser, click the Copy Token button, then paste the value into your console.
2) Give your token explicitly
Users can alternatively provide their token explicitly via the token input of magmaRset(). This is not the generally recommended method because it is not ideal to have authorization values saved within potentially shareable locations. However, the tokens are short-lived, and there are ways to mitigate the risk of such token exposure; see below.
prod <- magmaRset(token = "<your-token-here>")
projects <- retrieveProjects(
    # Now, we give the output of magmaRset() to the 'target' input of any
    # other magmaR function.
    target = prod)
NOTE: Instead of adding your token directly to any file which you might save, it is recommended that you use your .Renviron file to store your token. To do so, you can:
1. Use the usethis::edit_r_environ() function to open your .Renviron file. (Install usethis with install.packages("usethis") first.)
2. Add a line of the form TOKEN="<your_token>".
3. Write your code as magmaRset(token = Sys.getenv("TOKEN")) (after restarting R, so that .Renviron is re-read). When you save your script or .Rmd, the token itself will not be included.
If a request to magma returns that "You are unauthorized", magmaR will provide extra info so that users can fix this issue:
# Error message when magma sends back that user is unauthorized:
You are unauthorized. If you think this is a mistake, re-run `?magmaRset` to update your 'token' input, then retry.
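When the token is stored in an environment variable, an empty or missing value is an easy mistake to make. The sketch below shows one way to catch that before any request is sent. get_token() is a hypothetical helper written for this example, not part of magmaR, and the variable name TOKEN simply follows the .Renviron example above.

```r
# Hypothetical helper: read the token from an environment variable and stop
# early, with a readable message, if it is not set.
get_token <- function(var = "TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable '", var, "' is empty. ",
         "Add it to your .Renviron file, then restart R.")
  }
  token
}

# Intended use (not run here, as it needs magmaR and a valid token):
# prod <- magmaRset(token = get_token())
```

Note that this only checks that a value exists. An expired token would still pass this check and lead to the "unauthorized" message shown above.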
Controlling which version of magma to target
For advanced users with access to the staging or development versions of magma, switching can be achieved by adding url = "production/staging/development-url" when setting up your target with magmaRset().
dev <- magmaRset(url = "http://magma.development.local")
# When calling to magma...
ids_subject <- retrieveIds(
# Now give this to 'target':
target = dev,
# ^^
projectName = "example",
modelName = "subject",
url.base = "http://magma.development.local")
Helper functions
These functions allow exploration of what data exists within a given project.
Although it is possible to rely on timur.ucsf.edu/<projectName>/map, or on Timur's search functionality, to determine options for the projectName, modelName, recordNames, or attributeName(s) inputs, magmaR provides these helper functions so that users can do so without leaving R.
# projectName options:
retrieveProjects(
target = prod)
##   project_name project_name_full          role privileged resource
## 1      example   Example project administrator      FALSE     TRUE
##   requires_agreement
## 1              FALSE
# modelName options:
retrieveModels(
target = prod,
projectName = "example")
## [1] "population" "sc_seq_dataset" "rna_seq" "project"
## [5] "biospecimen" "flow" "demographic" "subject"
# recordNames options:
retrieveIds(
target = prod,
projectName = "example",
modelName = "subject")
## [1] "EXAMPLE-HS1" "EXAMPLE-HS10" "EXAMPLE-HS11" "EXAMPLE-HS12" "EXAMPLE-HS2"
## [6] "EXAMPLE-HS3" "EXAMPLE-HS4" "EXAMPLE-HS5" "EXAMPLE-HS6" "EXAMPLE-HS7"
## [11] "EXAMPLE-HS8" "EXAMPLE-HS9"
# attributeName(s) options:
retrieveAttributes(
target = prod,
projectName = "example",
modelName = "subject")
## [1] "name" "group" "project" "biospecimen"
For more complex needs, like a complicated query() request, you might need to access the project's template itself. That can be achieved via the retrieveTemplate() function:
# To retrieve the project template:
temp <- retrieveTemplate(
target = prod,
projectName = "example")
To explore the return, I recommend starting with the str() function, looking only a few levels in. You should see something like this:
str(temp, max.level = 3)
## List of 1
## $ models:List of 8
## ..$ population :List of 2
## .. ..$ documents: Named list()
## .. ..$ template :List of 4
## ..$ sc_seq_dataset:List of 2
## .. ..$ documents: Named list()
## .. ..$ template :List of 4
## ..$ rna_seq :List of 2
## .. ..$ documents: Named list()
## .. ..$ template :List of 4
## ..$ project :List of 2
## .. ..$ documents: Named list()
## .. ..$ template :List of 3
## ..$ biospecimen :List of 2
## .. ..$ documents: Named list()
## .. ..$ template :List of 4
## ..$ flow :List of 2
## .. ..$ documents: Named list()
## .. ..$ template :List of 4
## ..$ demographic :List of 2
## .. ..$ documents: Named list()
## .. ..$ template :List of 4
## ..$ subject :List of 2
## .. ..$ documents: Named list()
## .. ..$ template :List of 4
Then, follow up by looking further into the $template of individual models, perhaps as below:
# For the "subject" model:
str(temp$models$subject$template)
## List of 4
## $ name : chr "subject"
## $ attributes:List of 7
## ..$ created_at :List of 8
## .. ..$ name : chr "created_at"
## .. ..$ attribute_name: chr "created_at"
## .. ..$ display_name : chr "Created At"
## .. ..$ restricted : logi FALSE
## .. ..$ read_only : logi FALSE
## .. ..$ hidden : logi TRUE
## .. ..$ validation : NULL
## .. ..$ attribute_type: chr "date_time"
## ..$ updated_at :List of 8
## .. ..$ name : chr "updated_at"
## .. ..$ attribute_name: chr "updated_at"
## .. ..$ display_name : chr "Updated At"
## .. ..$ restricted : logi FALSE
## .. ..$ read_only : logi FALSE
## .. ..$ hidden : logi TRUE
## .. ..$ validation : NULL
## .. ..$ attribute_type: chr "date_time"
## ..$ name :List of 8
## .. ..$ name : chr "name"
## .. ..$ attribute_name: chr "name"
## .. ..$ display_name : chr "Name"
## .. ..$ restricted : logi FALSE
## .. ..$ read_only : logi FALSE
## .. ..$ hidden : logi FALSE
## .. ..$ validation : NULL
## .. ..$ attribute_type: chr "identifier"
## ..$ group :List of 8
## .. ..$ name : chr "group"
## .. ..$ attribute_name: chr "group"
## .. ..$ display_name : chr "Group"
## .. ..$ restricted : logi FALSE
## .. ..$ read_only : logi FALSE
## .. ..$ hidden : logi FALSE
## .. ..$ validation : NULL
## .. ..$ attribute_type: chr "string"
## ..$ project :List of 12
## .. ..$ name : chr "project"
## .. ..$ attribute_name : chr "project"
## .. ..$ model_name : chr "project"
## .. ..$ link_model_name : chr "project"
## .. ..$ link_attribute_name: chr "subject"
## .. ..$ link_attribute_type: chr "collection"
## .. ..$ display_name : chr "Project"
## .. ..$ restricted : logi FALSE
## .. ..$ read_only : logi FALSE
## .. ..$ hidden : logi FALSE
## .. ..$ validation : NULL
## .. ..$ attribute_type : chr "parent"
## ..$ biospecimen:List of 12
## .. ..$ name : chr "biospecimen"
## .. ..$ attribute_name : chr "biospecimen"
## .. ..$ model_name : chr "biospecimen"
## .. ..$ link_model_name : chr "biospecimen"
## .. ..$ link_attribute_name: chr "subject"
## .. ..$ link_attribute_type: chr "parent"
## .. ..$ display_name : chr "Biospecimen"
## .. ..$ restricted : logi FALSE
## .. ..$ read_only : logi FALSE
## .. ..$ hidden : logi FALSE
## .. ..$ validation : NULL
## .. ..$ attribute_type : chr "collection"
## ..$ demographic:List of 12
## .. ..$ name : chr "demographic"
## .. ..$ attribute_name : chr "demographic"
## .. ..$ model_name : chr "demographic"
## .. ..$ link_model_name : chr "demographic"
## .. ..$ link_attribute_name: chr "subject"
## .. ..$ link_attribute_type: chr "parent"
## .. ..$ display_name : chr "Demographic"
## .. ..$ restricted : logi FALSE
## .. ..$ read_only : logi FALSE
## .. ..$ hidden : logi FALSE
## .. ..$ validation : NULL
## .. ..$ attribute_type : chr "table"
## $ identifier: chr "name"
## $ parent : chr "project"
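Reading the full str() output can be tedious when you only want one field per attribute. The following sketch pulls the attribute_type of each non-hidden attribute out of a template-shaped list. The template object here is a hand-built, trimmed stand-in that mirrors the structure printed above; with real data you would presumably start from temp$models$subject$template instead.

```r
# Stand-in for temp$models$subject$template, trimmed to the fields used below.
template <- list(
  name = "subject",
  attributes = list(
    created_at  = list(attribute_name = "created_at",  attribute_type = "date_time",  hidden = TRUE),
    name        = list(attribute_name = "name",        attribute_type = "identifier", hidden = FALSE),
    group       = list(attribute_name = "group",       attribute_type = "string",     hidden = FALSE),
    project     = list(attribute_name = "project",     attribute_type = "parent",     hidden = FALSE),
    biospecimen = list(attribute_name = "biospecimen", attribute_type = "collection", hidden = FALSE)
  ),
  identifier = "name",
  parent = "project"
)

# One value per attribute; vapply() enforces the expected type and length.
attr_types <- vapply(template$attributes, function(a) a$attribute_type, character(1))
is_hidden  <- vapply(template$attributes, function(a) a$hidden, logical(1))

# Named character vector: attribute name -> attribute type, hidden ones dropped
attr_types[!is_hidden]
```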
Main data download functions:
Finally, the meat of why we're here.
magma has two main data output functions, /retrieve and /query.
magmaR provides methods for both.
retrieve() & retrieveJSON()
retrieve() is probably the main workhorse function of magmaR. If your goal is to download "subject" data for a specific patient of a project, or for all patients of the project, this is the function to start with.
The basic structure is to provide which project (projectName) and which model (modelName) you want data for.
df <- retrieve(
target = prod,
projectName = "example",
modelName = "subject")
head(df)
##           name group         project      biospecimen
## 1  EXAMPLE-HS1    g1 Example project  EXAMPLE-HS1-WB1
## 2 EXAMPLE-HS10    g3 Example project EXAMPLE-HS10-WB1
## 3 EXAMPLE-HS11    g3 Example project EXAMPLE-HS11-WB1
## 4 EXAMPLE-HS12    g3 Example project EXAMPLE-HS12-WB1
## 5  EXAMPLE-HS2    g1 Example project  EXAMPLE-HS2-WB1
## 6  EXAMPLE-HS3    g1 Example project  EXAMPLE-HS3-WB1
Optionally, a set of recordNames or attributeNames can be given as well, to grab a more specific subset of data from the given project-model pair.
df <- retrieve(
target = prod,
projectName = "example",
modelName = "subject",
recordNames = c("EXAMPLE-HS1", "EXAMPLE-HS2"),
attributeNames = "group")
head(df)
##          name group
## 1 EXAMPLE-HS1    g1
## 2 EXAMPLE-HS2    g1
(You can use the retrieveIds() and retrieveAttributes() functions described above in the Helper functions section to determine options for the recordNames and attributeNames inputs, respectively.)
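One practical use of those helpers is to check a set of requested record names before asking for data. The sketch below does this with plain vectors: available stands in for what retrieveIds() would return, and the variable names are illustrative only.

```r
# Stand-in for the output of retrieveIds(); values copied from the example above.
available <- c("EXAMPLE-HS1", "EXAMPLE-HS2", "EXAMPLE-HS3")

# Record names we would like to request; the second one is deliberately wrong.
requested <- c("EXAMPLE-HS1", "EXAMPLE-HS99")

# Report anything that does not exist in the model, and keep the rest.
unknown <- setdiff(requested, available)
if (length(unknown) > 0) {
  message("Not found in this model: ", paste(unknown, collapse = ", "))
}
valid <- intersect(requested, available)

# 'valid' could then be passed on, for example as recordNames = valid
valid
```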
Unfortunately, for the matrix and table attribute data types, the literal data are not actually given via magma/retrieve when format = "tsv". Instead, only a pointer is returned. For such attributes, the retrieveJSON() function can retrieve the data (via a magma/retrieve call with format = "json"). A wrapper that makes efficient use of retrieveJSON() specifically for matrix data retrieval, retrieveMatrix(), is also included. Users should not typically need to call retrieveJSON() directly: when the desired data is a matrix, retrieveMatrix() is recommended instead. More details on that function follow.
json <- retrieveJSON(
target = prod,
projectName = "example",
modelName = "rna_seq",
recordNames = c("EXAMPLE-HS1-WB1-RSQ1", "EXAMPLE-HS2-WB1-RSQ1"),
attributeNames = "gene_counts")
retrieveMatrix()
Because matrices are a very common and important data structure, but are not accessible via retrieve(), we provide this function. For a single matrix-type attribute, it will obtain data from magma in the required json structure, and then automatically reorganize that data into the matrix structure that a user would typically expect.
In the example below, we obtain the transcripts-per-million(-reads) normalized counts data for all records/samples of the example project. In this matrix, columns will be the individual records, and rows will be features. Specifically, for the example data here, those row names are "gene1", "gene2", and so on, but for real rna_seq data, those row names would typically be the Ensembl gene ids that each row of the matrix represents.
mat <- retrieveMatrix(
target = prod,
projectName = "example",
modelName = "rna_seq",
recordNames = "all",
attributeNames = "gene_tpm")
head(mat, n = c(6,3))
##       EXAMPLE-HS10-WB1-RSQ1 EXAMPLE-HS11-WB1-RSQ1 EXAMPLE-HS12-WB1-RSQ1
## gene1                0.5187                7.9960                4.8278
## gene2               29.9572               42.9785               31.2540
## gene3              111.6587              154.9225              114.0897
## gene4              269.3555              426.7866              302.6299
## gene5                0.3891                1.9990                0.0000
## gene6                0.0000                0.0000                0.0000
Most users need not worry about the internal method, but for those who are curious: under the hood, data is grabbed via retrieveJSON() for 10 records at a time. The relevant data are then extracted from the complex list output of this retrieval route and converted into a matrix structure where column names are the recordNames. Row names are then grabbed from the model's template for what this data should represent.
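The batching and reshaping just described can be sketched without contacting magma. This is not magmaR's actual implementation: fetch_batch() is a hypothetical stand-in for a per-batch request, and the record names, values, and row names are made up. Only the chunk size of 10 comes from the text above.

```r
# Made-up record names; 23 records split into batches of at most 10.
record_names <- sprintf("EXAMPLE-HS%d-WB1-RSQ1", 1:23)
batches <- split(record_names, ceiling(seq_along(record_names) / 10))
lengths(batches)

# Hypothetical stand-in for one batched request: returns a named list with
# one numeric vector (one value per feature) for each requested record.
fetch_batch <- function(ids) {
  lapply(setNames(ids, ids), function(id) c(0.5, 30, 112))
}

# Fetch batch by batch, then flatten to a single list of per-record vectors.
per_record <- do.call(c, unname(lapply(batches, fetch_batch)))

# Bind the vectors as columns, so that column names are the record names.
mat <- do.call(cbind, per_record)

# Made-up feature names, standing in for row names taken from a template.
rownames(mat) <- c("gene1", "gene2", "gene3")
dim(mat)
```

Requesting in fixed-size chunks keeps each individual response small, at the cost of more round trips for large projects.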
query()
The Magma Query API lets you pull data out of Magma through an expressive query interface. Often, if you want a specific set of data from model-X, but only, say, for records where linked records of model-Y have data for attribute-Z, then this is the endpoint you want.
But note: the format of query() calls can be a bit complicated, so it is recommended to first check whether retrieveMetadata() might better serve your purposes. We'll describe that function a bit later.
For guidance on how to format query() calls, see ?query and https://mountetna.github.io/magma.html#query.
query_out <- query(
target = prod,
projectName = "example",
queryTerms =
list('rna_seq',
'::all',
'biospecimen',
'::identifier')
)
Details: The default output of this function is a list conversion of the direct json output returned by magma/query. This list will contain either 2 or 3 parts:
names(query_out)
## [1] "answer" "type" "format"
These parts are answer, type (optional), and format.
Alternatively, the output can be reformatted as a data.frame if format = "df" is given.
subject_ids_of_rnaseq_records <- query(
target = prod,
projectName = "example",
queryTerms =
list('rna_seq',
'::all',
'biospecimen',
'::identifier'),
format = "df"
)
head(subject_ids_of_rnaseq_records)
##   example::rna_seq#tube_name example::biospecimen#name
## 1      EXAMPLE-HS10-WB1-RSQ1          EXAMPLE-HS10-WB1
## 2      EXAMPLE-HS11-WB1-RSQ1          EXAMPLE-HS11-WB1
## 3      EXAMPLE-HS12-WB1-RSQ1          EXAMPLE-HS12-WB1
## 4       EXAMPLE-HS1-WB1-RSQ1           EXAMPLE-HS1-WB1
## 5       EXAMPLE-HS2-WB1-RSQ1           EXAMPLE-HS2-WB1
## 6       EXAMPLE-HS3-WB1-RSQ1           EXAMPLE-HS3-WB1
Details: When format = "df" is added, the list output will be converted to a data.frame where data comes from the answer piece and column names come from the format piece.
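For readers who prefer to see that conversion spelled out, here is a minimal sketch on a hand-built list. The exact shape of answer and format shown here (a list of identifier/value pairs, and a list of column labels) is an assumption made for this example, and the code is not magmaR's own conversion routine.

```r
# Hand-built stand-in for the default (list) output of a query() call.
query_out <- list(
  answer = list(
    list("EXAMPLE-HS1-WB1-RSQ1", "EXAMPLE-HS1-WB1"),
    list("EXAMPLE-HS2-WB1-RSQ1", "EXAMPLE-HS2-WB1")
  ),
  format = list("example::rna_seq#tube_name", "example::biospecimen#name")
)

# Each element of 'answer' becomes one row of the data.frame...
rows <- lapply(query_out$answer, unlist)
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# ...and the 'format' piece supplies the column names.
colnames(df) <- unlist(query_out$format)
df
```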
In our example code for retrieveMatrix() and retrieveMetadata(), we obtained RNAseq data from the example project, and the linked metadata from the subject-model level. Now that we have these metadata for our rna_seq records, we could use them to start exploring our rna_seq data:
library(dittoSeq)
# Explore RNAseq data with dittoSeq
sce <- importDittoBulk(
list(tpm = mat), # mat was obtained with retrieveMatrix()
metadata = meta # meta was obtained with retrieveMetadata()
)
dittoBoxPlot(sce, "gene1", group.by = "group")
Session information
sessionInfo()
## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Pop!_OS 22.04 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0 LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dittoSeq_1.20.0 ggplot2_3.5.2 magmaR_1.0.4 vcr_1.7.0
## [5] BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.38.1 gtable_0.3.6
## [3] xfun_0.52 bslib_0.9.0
## [5] httr2_1.1.2 ggrepel_0.9.6
## [7] Biobase_2.68.0 lattice_0.22-7
## [9] vctrs_0.6.5 tools_4.5.0
## [11] generics_0.1.3 stats4_4.5.0
## [13] curl_6.2.2 tibble_3.2.1
## [15] pkgconfig_2.0.3 pheatmap_1.0.12
## [17] Matrix_1.7-3 webmockr_2.0.0
## [19] RColorBrewer_1.1-3 ggridges_0.5.6
## [21] S4Vectors_0.46.0 lifecycle_1.0.4
## [23] GenomeInfoDbData_1.2.14 compiler_4.5.0
## [25] farver_2.1.2 tinytex_0.57
## [27] GenomeInfoDb_1.44.0 htmltools_0.5.8.1
## [29] sass_0.4.10 yaml_2.3.10
## [31] pillar_1.10.2 crayon_1.5.3
## [33] jquerylib_0.1.4 whisker_0.4.1
## [35] SingleCellExperiment_1.30.0 cachem_1.1.0
## [37] DelayedArray_0.34.1 abind_1.4-8
## [39] digest_0.6.37 bookdown_0.43
## [41] labeling_0.4.3 cowplot_1.1.3
## [43] fastmap_1.2.0 grid_4.5.0
## [45] colorspace_2.1-1 SparseArray_1.8.0
## [47] cli_3.6.5 magrittr_2.0.3
## [49] S4Arrays_1.8.0 base64enc_0.1-3
## [51] triebeard_0.4.1 crul_1.5.0
## [53] withr_3.0.2 scales_1.4.0
## [55] UCSC.utils_1.4.0 rappdirs_0.3.3
## [57] rmarkdown_2.29 XVector_0.48.0
## [59] httr_1.4.7 matrixStats_1.5.0
## [61] gridExtra_2.3 evaluate_1.0.3
## [63] knitr_1.50 GenomicRanges_1.60.0
## [65] IRanges_2.42.0 rlang_1.1.6
## [67] urltools_1.7.3 Rcpp_1.0.14
## [69] glue_1.8.0 httpcode_0.3.0
## [71] BiocManager_1.30.25 fauxpas_0.5.2
## [73] BiocGenerics_0.54.0 rstudioapi_0.17.1
## [75] jsonlite_2.0.0 R6_2.6.1
## [77] MatrixGenerics_1.20.0