By default, split_*_by(varname, ...) generates a facet for each level the variable varname takes in the data, including unobserved ones in the factor case. This behavior can be customized in various ways.
The most straightforward way to customize which facets are generated by a split is with one of the split functions or split function families provided by rtables.
These predefined split functions and function factories implement commonly desired customization patterns of splitting behavior (i.e., faceting behavior). They include:
remove_split_levels - remove specified levels from the data for facet generation.
keep_split_levels - keep only specified levels in the data for facet generation (removing all others).
drop_split_levels - drop levels that are unobserved within the data being split, i.e., associated with the parent facet.
reorder_split_levels - reorder the levels (and thus the generated facets) to the specified order.
trim_levels_in_group - drop unobserved levels of another variable independently within the data associated with each facet generated by the current split.
add_overall_level, add_combo_levels - add additional "virtual" levels which combine two or more levels of the variable being split. See the following section.
trim_levels_to_map - trim the levels of multiple variables to a pre-specified set of value combinations. See the following section.
The first four of these are fairly self-describing and, for brevity, we refer our readers to ?split_funcs for details including working examples.
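As a quick orientation before consulting ?split_funcs, the following is a minimal sketch of three of these split functions applied to a single row split. The toy data frame and the facet_rows() helper are hypothetical and exist only for this illustration.

```r
library(rtables)

# Hypothetical toy data: `grp` is a factor with an unobserved level "d".
toy <- data.frame(
  grp = factor(c("a", "b", "b", "c"), levels = c("a", "b", "c", "d")),
  val = c(1, 2, 3, 4)
)

# Illustrative helper (not part of rtables): build a table with one row
# split using the given split function and return its row names.
facet_rows <- function(split_fun) {
  lyt <- basic_table() %>%
    split_rows_by("grp", split_fun = split_fun) %>%
    analyze("val")
  row.names(build_table(lyt, toy))
}

rows_drop <- facet_rows(drop_split_levels)              # unobserved "d" is dropped
rows_keep <- facet_rows(keep_split_levels(c("a", "b"))) # only "a" and "b" remain
rows_rm <- facet_rows(remove_split_levels("b"))         # "b" is removed
```

Note that drop_split_levels is passed directly as a split function, whereas keep_split_levels() and remove_split_levels() are called to create one.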
Often with nested splitting involving multiple variables, the values of the variables in question are logically nested; meaning that certain values of the inner variable are only coherent in combination with a specific value or values of the outer variable.
As an example, suppose we have a variable vehicle_class, which can take the values "automobile" and "boat", and a variable vehicle_type, which can take the values "car", "truck", "suv", "sailboat", and "cruiseliner". The combination ("automobile", "cruiseliner") does not make sense and will never occur in any (correctly cleaned) data set; nor does the combination ("boat", "truck").
We will showcase strategies to deal with this in the next sections using the following artificial data:
set.seed(0)
levs_type <- c("car", "truck", "suv", "sailboat", "cruiseliner")
vclass <- sample(c("auto", "boat"), 1000, replace = TRUE)
auto_inds <- which(vclass == "auto")
vtype <- rep(NA_character_, 1000)
vtype[auto_inds] <- sample(
  c("car", "truck"), ## suv missing on purpose
  length(auto_inds),
  replace = TRUE
)
vtype[-auto_inds] <- sample(
  c("sailboat", "cruiseliner"),
  1000 - length(auto_inds),
  replace = TRUE
)
vehic_data <- data.frame(
  vehicle_class = factor(vclass),
  vehicle_type = factor(vtype, levels = levs_type),
  color = sample(
    c("white", "black", "red"), 1000,
    prob = c(1, 2, 1),
    replace = TRUE
  ),
  cost = ifelse(
    vclass == "boat",
    rnorm(1000, 100000, sd = 5000),
    rnorm(1000, 40000, sd = 5000)
  )
)
head(vehic_data)
#>   vehicle_class vehicle_type color      cost
#> 1          boat     sailboat black 100393.81
#> 2          auto          car white  38150.17
#> 3          boat     sailboat white  98696.13
#> 4          auto        truck white  37677.16
#> 5          auto        truck black  38489.27
#> 6          boat  cruiseliner black 108709.72
trim_levels_in_group
The trim_levels_in_group split function factory creates split functions which deal with this issue empirically: any combination which is observed in the data being tabulated will appear as nested facets within the table, while those that are not observed will not.
If we use default level-based faceting, we get several logically incoherent cells within our table:
library(rtables)
lyt <- basic_table() %>%
  split_cols_by("color") %>%
  split_rows_by("vehicle_class") %>%
  split_rows_by("vehicle_type") %>%
  analyze("cost")
build_table(lyt, vehic_data)
#>                   black      white        red
#> ------------------------------------------------
#> auto
#>   car
#>     Mean        40431.92    40518.92   38713.14
#>   truck
#>     Mean        40061.70    40635.74   40024.41
#>   suv
#>     Mean           NA          NA         NA
#>   sailboat
#>     Mean           NA          NA         NA
#>   cruiseliner
#>     Mean           NA          NA         NA
#> boat
#>   car
#>     Mean           NA          NA         NA
#>   truck
#>     Mean           NA          NA         NA
#>   suv
#>     Mean           NA          NA         NA
#>   sailboat
#>     Mean        99349.69    99996.54   101865.73
#>   cruiseliner
#>     Mean        100212.00   99340.25   100363.52
This is obviously not the table we want, as the majority of its space is taken up by meaningless combinations. If we use trim_levels_in_group to trim the levels of vehicle_type separately within each level of vehicle_class, we get a table which only has meaningful combinations:
lyt2 <- basic_table() %>%
  split_cols_by("color") %>%
  split_rows_by("vehicle_class", split_fun = trim_levels_in_group("vehicle_type")) %>%
  split_rows_by("vehicle_type") %>%
  analyze("cost")
build_table(lyt2, vehic_data)
#>                   black      white        red
#> ------------------------------------------------
#> auto
#>   car
#>     Mean        40431.92    40518.92   38713.14
#>   truck
#>     Mean        40061.70    40635.74   40024.41
#> boat
#>   sailboat
#>     Mean        99349.69    99996.54   101865.73
#>   cruiseliner
#>     Mean        100212.00   99340.25   100363.52
Note, however, that it does not contain all meaningful combinations, only those that were actually observed in our data, which happens not to include the perfectly valid ("auto", "suv") combination.
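The empirical, per-group trimming described above can be pictured with a small base R sketch. This is a conceptual illustration only, not how rtables implements trim_levels_in_group; the toy data frame is a hypothetical stand-in for vehic_data.

```r
# Hypothetical stand-in data: "suv" is a declared level but never observed.
toy <- data.frame(
  vehicle_class = factor(c("auto", "auto", "boat", "boat")),
  vehicle_type = factor(
    c("car", "truck", "sailboat", "cruiseliner"),
    levels = c("car", "truck", "suv", "sailboat", "cruiseliner")
  )
)

# Split the data by the outer variable, then drop the unobserved levels of
# the inner variable separately within each piece.
by_class <- split(toy, toy$vehicle_class)
observed <- lapply(by_class, function(d) levels(droplevels(d$vehicle_type)))
observed
#> $auto
#> [1] "car"   "truck"
#>
#> $boat
#> [1] "sailboat"    "cruiseliner"
```

Because "suv" never occurs with "auto" in the data, it disappears from the "auto" group, which mirrors the behavior seen in the table above.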
To restrict level combinations to those which are valid regardless of whether the combination was observed, we must use trim_levels_to_map() instead.
trim_levels_to_map
trim_levels_to_map is similar to trim_levels_in_group in that its purpose is to avoid combinatorial explosion when nesting splits on logically nested variables. Unlike its sibling function, however, with trim_levels_to_map we define the exact set of allowed combinations a priori, and that exact set of combinations is produced in the resulting table, regardless of whether they are observed or not.
library(tibble)
map <- tribble(
  ~vehicle_class, ~vehicle_type,
  "auto", "truck",
  "auto", "suv",
  "auto", "car",
  "boat", "sailboat",
  "boat", "cruiseliner"
)
lyt3 <- basic_table() %>%
  split_cols_by("color") %>%
  split_rows_by("vehicle_class", split_fun = trim_levels_to_map(map)) %>%
  split_rows_by("vehicle_type") %>%
  analyze("cost")
build_table(lyt3, vehic_data)
#>                   black      white        red
#> ------------------------------------------------
#> auto
#>   car
#>     Mean        40431.92    40518.92   38713.14
#>   truck
#>     Mean        40061.70    40635.74   40024.41
#>   suv
#>     Mean           NA          NA         NA
#> boat
#>   sailboat
#>     Mean        99349.69    99996.54   101865.73
#>   cruiseliner
#>     Mean        100212.00   99340.25   100363.52
Now we see that the ("auto", "suv") combination is again present, even though it is populated with NAs (because there is no data in that category), while the logically invalid combinations are still absent.
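The map does not have to come from tibble. The sketch below builds it as a plain base R data.frame, on the assumption (to our understanding of the current implementation) that every map column must be of type character, hence the explicit stringsAsFactors = FALSE. The toy data and the names map_df, lyt_map, and tbl_map are hypothetical.

```r
library(rtables)

# Hypothetical toy data; "suv" is a valid level that is never observed.
toy <- data.frame(
  vehicle_class = factor(c("auto", "auto", "boat", "boat")),
  vehicle_type = factor(
    c("car", "truck", "sailboat", "cruiseliner"),
    levels = c("car", "truck", "suv", "sailboat", "cruiseliner")
  ),
  cost = c(40000, 41000, 100000, 99000)
)

# Allowed combinations as a base data.frame with character columns.
# Column names must match the variables used in the layout.
map_df <- data.frame(
  vehicle_class = c("auto", "auto", "auto", "boat", "boat"),
  vehicle_type = c("car", "truck", "suv", "sailboat", "cruiseliner"),
  stringsAsFactors = FALSE
)

lyt_map <- basic_table() %>%
  split_rows_by("vehicle_class", split_fun = trim_levels_to_map(map_df)) %>%
  split_rows_by("vehicle_type") %>%
  analyze("cost")
tbl_map <- build_table(lyt_map, toy)
row.names(tbl_map)
```

The row names should include "suv" exactly once (under "auto"), and each of the other vehicle types once, under its own class.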
Another very common manipulation of faceting in a table context is the introduction of combination levels that are not explicitly modeled in the data. Most often, this involves the addition of an "overall" category, but in both principle and practice it can involve any arbitrary combination of levels.
rtables explicitly supports this via the add_overall_level (for the all case) and add_combo_levels split function factories.
add_overall_level
add_overall_level accepts valname, which is the name of the new level, as well as label and first (whether the new level should come first, if TRUE, or last, if FALSE, in the ordering).
Building further on our arbitrary vehicles table, we can use this to create an "all colors" category:
lyt4 <- basic_table(show_colcounts = TRUE) %>%
  split_cols_by("color", split_fun = add_overall_level("allcolors", label = "All Colors")) %>%
  split_rows_by("vehicle_class", split_fun = trim_levels_to_map(map)) %>%
  split_rows_by("vehicle_type") %>%
  analyze("cost")
build_table(lyt4, vehic_data)
#>                 All Colors     black      white        red
#>                  (N=1000)     (N=521)    (N=251)     (N=228)
#> -------------------------------------------------------------
#> auto
#>   car
#>     Mean         40095.49    40431.92    40518.92   38713.14
#>   truck
#>     Mean         40194.68    40061.70    40635.74   40024.41
#>   suv
#>     Mean            NA          NA          NA         NA
#> boat
#>   sailboat
#>     Mean        100133.22    99349.69    99996.54   101865.73
#>   cruiseliner
#>     Mean        100036.76    100212.00   99340.25   100363.52
With the column counts turned on, we can see that the "All Colors" column encompasses the full 1000 (completely fake) vehicles in our data set.
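The first argument controls where the overall facet is placed. As a minimal sketch on hypothetical toy data, setting first = FALSE should put the "All Colors" column after the individual colors rather than before them:

```r
library(rtables)

# Hypothetical toy data with three colors.
toy <- data.frame(
  color = factor(c("black", "white", "red", "black")),
  cost = c(10, 20, 30, 40)
)

lyt_last <- basic_table(show_colcounts = TRUE) %>%
  split_cols_by(
    "color",
    # first = FALSE places the overall level after the observed levels
    split_fun = add_overall_level("allcolors", label = "All Colors", first = FALSE)
  ) %>%
  analyze("cost")
tbl_last <- build_table(lyt_last, toy)
tbl_last
```

The resulting table has four columns: one per color, followed by the overall column, whose count covers all four toy rows.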
To add more arbitrary combinations, we use add_combo_levels.
add_combo_levels
add_combo_levels allows us to add one or more arbitrary combination levels to the faceting structure of our table.
We do this by defining a combination data.frame which describes the levels we want to add. A combination data.frame has the following columns and one row for each combination to add:
valname - a string indicating the name of the value, which will appear in paths.
label - a string indicating the label which should be displayed when rendering.
levelcombo - a character vector of the individual levels to be combined in this combination level.
exargs - a list (usually list()) of extra arguments which should be passed to analysis and content functions when tabulated within this column or row.
Suppose we wanted combination levels for all non-white colors, and for white and black colors. We do this like so:
combodf <- tribble(
  ~valname, ~label, ~levelcombo, ~exargs,
  "non-white", "Non-White", c("black", "red"), list(),
  "blackwhite", "Black or White", c("black", "white"), list()
)
lyt5 <- basic_table(show_colcounts = TRUE) %>%
  split_cols_by("color", split_fun = add_combo_levels(combodf)) %>%
  split_rows_by("vehicle_class", split_fun = trim_levels_to_map(map)) %>%
  split_rows_by("vehicle_type") %>%
  analyze("cost")
build_table(lyt5, vehic_data)
#>                   black      white        red      Non-White   Black or White
#>                  (N=521)    (N=251)     (N=228)     (N=749)       (N=772)
#> -----------------------------------------------------------------------------
#> auto
#>   car
#>     Mean        40431.92    40518.92   38713.14    39944.93       40460.77
#>   truck
#>     Mean        40061.70    40635.74   40024.41    40050.66       40243.57
#>   suv
#>     Mean           NA          NA         NA          NA             NA
#> boat
#>   sailboat
#>     Mean        99349.69    99996.54   101865.73   100179.72      99567.50
#>   cruiseliner
#>     Mean        100212.00   99340.25   100363.52   100258.56      99937.47
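Combination facets may overlap, which is why the column counts above add up to more than 1000. The base R sketch below illustrates the idea: each combination facet holds every observation whose value is in its levelcombo. It is a conceptual illustration rather than the rtables implementation, and the color vector is a stand-in that only reproduces the per-color counts shown in the table.

```r
# Stand-in for the color column, matching the column counts shown above.
color <- rep(c("black", "white", "red"), times = c(521, 251, 228))

# The two combinations from combodf, keyed by valname.
combos <- list(
  "non-white" = c("black", "red"),
  "blackwhite" = c("black", "white")
)

# Size of each combination facet: observations whose color is in the combo.
combo_n <- vapply(combos, function(lv) sum(color %in% lv), integer(1))
combo_n
#>  non-white blackwhite
#>        749        772
```

The black vehicles are counted in both combination facets, so 749 + 772 exceeds the 1000 vehicles in the data.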