dbplyr is the database backend for dplyr. It allows you to use remote database tables as if they are in-memory data frames by automatically converting dplyr code into SQL.
To learn more about why you might use dbplyr instead of writing SQL, see vignette("sql")
. To learn more about the details of the SQL translation, see vignette("translation-verb")
and vignette("translation-function")
.
# The easiest way to get dbplyr is to install the whole tidyverse: install.packages("tidyverse") # Alternatively, install just dbplyr: install.packages("dbplyr") # Or the development version from GitHub: # install.packages("pak") pak::pak("tidyverse/dbplyr")
dbplyr is designed to work with database tables as if they were local data frames. To demonstrate this I’ll first create an in-memory SQLite database and copy over a dataset:
library(dplyr, warn.conflicts = FALSE) con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars)
Note that you don’t actually need to load dbplyr with library(dbplyr)
; dplyr automatically loads it for you when it sees you working with a database. Database connections are coordinated by the DBI package. Learn more at https://dbi.r-dbi.org/
Now you can retrieve a table using tbl()
(see ?tbl_dbi
for more details). Printing it just retrieves the first few rows:
mtcars2 <- tbl(con, "mtcars") mtcars2 #> # Source: table<`mtcars`> [?? x 11] #> # Database: sqlite 3.45.0 [:memory:] #> mpg cyl disp hp drat wt qsec vs am gear carb #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 #> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 #> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 #> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 #> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 #> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 #> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 #> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 #> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 #> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 #> # ℹ more rows
All dplyr calls are evaluated lazily, generating SQL that is only sent to the database when you request the data.
# lazily generates query summary <- mtcars2 %>% group_by(cyl) %>% summarise(mpg = mean(mpg, na.rm = TRUE)) %>% arrange(desc(mpg)) # see query summary %>% show_query() #> <SQL> #> SELECT `cyl`, AVG(`mpg`) AS `mpg` #> FROM `mtcars` #> GROUP BY `cyl` #> ORDER BY `mpg` DESC # execute query and retrieve results summary %>% collect() #> # A tibble: 3 × 2 #> cyl mpg #> <dbl> <dbl> #> 1 4 26.7 #> 2 6 19.7 #> 3 8 15.1
Please note that the dbplyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4