RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from https://cran.r-project.org/web/packages/rms/../htmltools/../rtables/vignettes/sorting_pruning.html below:

Pruning and Sorting Tables

Sorting Fundamentals

Sorting of an rtables table is done at a path, meaning a sort operation will occur at a particular location within the table, and the direct children of the element at that path will be reordered. This occurs whether those children are subtables themselves, or individual rows. Sorting is done via the sort_at_path() function, which accepts both a (row) path and a scoring function. See the pathing vignette for details about paths.

A score function accepts a subtree or TableRow and returns a single orderable (typically numeric) value. Within the subtable currently being sorted, the children are then reordered by the value of the score function. Importantly, âcontentâ (ContentRow) and âvaluesâ (DataRow) need to be treated differently in the scoring function as they are retrieved: the content of a subtable is retrieved via the content _table accessor.

The cont_n_allcols() scoring function provided by rtables, works by scoring subtables by the sum of the first elements in the first row of the subtableâs content table. Note that this function fails if the child being scored does not have a content function (i.e., if summarize_row_groups() was not used at the corresponding point in the layout). We can see this in itâs definition, below:

cont_n_allcols
# function (tt) 
# {
#     ctab <- content_table(tt)
#     if (NROW(ctab) == 0) {
#         stop("cont_n_allcols score function used at subtable [", 
#             obj_name(tt), "] that has no content table.")
#     }
#     sum(sapply(row_values(tree_children(ctab)[[1]]), function(cv) cv[1]))
# }
# <bytecode: 0x561500f66540>
# <environment: namespace:rtables>

Therefore, a fundamental difference between pruning and sorting is that sorting occurs at particular places in the table, as defined by a path.

For example, we can sort the strata values (ContentRow) by observation counts within just the ASIAN subtable:

sort_at_path(pruned, path = c("RACE", "ASIAN", "STRATA1"), scorefun = cont_n_allcols)
#                                    A: Drug X                B: Placebo              C: Combination     
#                                 F            M            F            M            F            M     
# âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
#   A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
#   C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
#   B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
#     Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
# BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
#   A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
#   B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
#     Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
#   C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
#     Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
# WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
#   A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
#     Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
#   B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
#     Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
#   C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      35.50        44.00        44.67          NA         38.50        35.00
# B and C are swapped as the global count (sum of all column counts) of strata C is higher than the one of strata B

Wildcards in Sort Paths

A sorting path can contain one or more instances of the â*â wildcard. Each of these indicates that the children of each subtable matching this * element of the path should be sorted separately as indicated by the remainder of the path after the * and the score function.

Thus we can extend our sorting of strata within the ASIAN subtable to all race-specific subtables by using the wildcard:

sort_at_path(pruned, path = c("RACE", "*", "STRATA1"), scorefun = cont_n_allcols)
#                                    A: Drug X                B: Placebo              C: Combination     
#                                 F            M            F            M            F            M     
# âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
#   A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
#   C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
#   B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
#     Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
# BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
#   C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
#     Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
#   B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
#     Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
#   A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
# WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
#   B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
#     Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
#   A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
#     Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
#   C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      35.50        44.00        44.67          NA         38.50        35.00
# All subtables, i.e. ASIAN, BLACK..., and WHITE, are reordered separately

The above is equivalent to separately calling the following:

tmptbl <- sort_at_path(pruned, path = c("RACE", "ASIAN", "STRATA1"), scorefun = cont_n_allcols)
tmptbl <- sort_at_path(tmptbl, path = c("RACE", "BLACK OR AFRICAN AMERICAN", "STRATA1"), scorefun = cont_n_allcols)
tmptbl <- sort_at_path(tmptbl, path = c("RACE", "WHITE", "STRATA1"), scorefun = cont_n_allcols)
tmptbl
#                                    A: Drug X                B: Placebo              C: Combination     
#                                 F            M            F            M            F            M     
# âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
#   A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
#   C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
#   B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
#     Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
# BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
#   C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
#     Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
#   B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
#     Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
#   A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
# WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
#   B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
#     Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
#   A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
#     Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
#   C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      35.50        44.00        44.67          NA         38.50        35.00

It is possible to understand better pathing with table_structure() that highlights the tree-like structure and the node names:

table_structure(pruned)
# [TableTree] RACE
#  [TableTree] ASIAN [cont: 1 x 6]
#   [TableTree] STRATA1
#    [TableTree] A [cont: 1 x 6]
#     [ElementaryTable] AGE (1 x 6)
#    [TableTree] B [cont: 1 x 6]
#     [ElementaryTable] AGE (1 x 6)
#    [TableTree] C [cont: 1 x 6]
#     [ElementaryTable] AGE (1 x 6)
#  [TableTree] BLACK OR AFRICAN AMERICAN [cont: 1 x 6]
#   [TableTree] STRATA1
#    [TableTree] A [cont: 1 x 6]
#     [ElementaryTable] AGE (1 x 6)
#    [TableTree] B [cont: 1 x 6]
#     [ElementaryTable] AGE (1 x 6)
#    [TableTree] C [cont: 1 x 6]
#     [ElementaryTable] AGE (1 x 6)
#  [TableTree] WHITE [cont: 1 x 6]
#   [TableTree] STRATA1
#    [TableTree] A [cont: 1 x 6]
#     [ElementaryTable] AGE (1 x 6)
#    [TableTree] B [cont: 1 x 6]
#     [ElementaryTable] AGE (1 x 6)
#    [TableTree] C [cont: 1 x 6]
#     [ElementaryTable] AGE (1 x 6)

or with row_paths_summary:

row_paths_summary(pruned)
# rowname                      node_class    path                                                                
# âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# ASIAN                        ContentRow    RACE, ASIAN, @content, ASIAN                                        
#   A                          ContentRow    RACE, ASIAN, STRATA1, A, @content, A                                
#     Mean                     DataRow       RACE, ASIAN, STRATA1, A, AGE, Mean                                  
#   B                          ContentRow    RACE, ASIAN, STRATA1, B, @content, B                                
#     Mean                     DataRow       RACE, ASIAN, STRATA1, B, AGE, Mean                                  
#   C                          ContentRow    RACE, ASIAN, STRATA1, C, @content, C                                
#     Mean                     DataRow       RACE, ASIAN, STRATA1, C, AGE, Mean                                  
# BLACK OR AFRICAN AMERICAN    ContentRow    RACE, BLACK OR AFRICAN AMERICAN, @content, BLACK OR AFRICAN AMERICAN
#   A                          ContentRow    RACE, BLACK OR AFRICAN AMERICAN, STRATA1, A, @content, A            
#     Mean                     DataRow       RACE, BLACK OR AFRICAN AMERICAN, STRATA1, A, AGE, Mean              
#   B                          ContentRow    RACE, BLACK OR AFRICAN AMERICAN, STRATA1, B, @content, B            
#     Mean                     DataRow       RACE, BLACK OR AFRICAN AMERICAN, STRATA1, B, AGE, Mean              
#   C                          ContentRow    RACE, BLACK OR AFRICAN AMERICAN, STRATA1, C, @content, C            
#     Mean                     DataRow       RACE, BLACK OR AFRICAN AMERICAN, STRATA1, C, AGE, Mean              
# WHITE                        ContentRow    RACE, WHITE, @content, WHITE                                        
#   A                          ContentRow    RACE, WHITE, STRATA1, A, @content, A                                
#     Mean                     DataRow       RACE, WHITE, STRATA1, A, AGE, Mean                                  
#   B                          ContentRow    RACE, WHITE, STRATA1, B, @content, B                                
#     Mean                     DataRow       RACE, WHITE, STRATA1, B, AGE, Mean                                  
#   C                          ContentRow    RACE, WHITE, STRATA1, C, @content, C                                
#     Mean                     DataRow       RACE, WHITE, STRATA1, C, AGE, Mean

Note in the latter we see content rows as those with paths following @content, e.g., ASIAN, @content, ASIAN. The first of these at a given path (i.e., <path>, @content, <> are the rows which will be used by the scoring functions which begin with cont_.

We can directly sort the ethnicity by observations in increasing order:

ethsort <- sort_at_path(pruned, path = c("RACE"), scorefun = cont_n_allcols, decreasing = FALSE)
ethsort
#                                    A: Drug X                B: Placebo              C: Combination     
#                                 F            M            F            M            F            M     
# âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
#   A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
#     Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
#   B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
#     Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
#   C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      35.50        44.00        44.67          NA         38.50        35.00   
# BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
#   A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
#   B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
#     Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
#   C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
#     Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
# ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
#   A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
#   B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
#     Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
#   C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      36.92        35.60        34.00        31.89        33.47        31.38

Within each ethnicity separately, sort the strata by number of females in arm C (i.e.Â column position 5):

sort_at_path(pruned, path = c("RACE", "*", "STRATA1"), cont_n_onecol(5))
#                                    A: Drug X                B: Placebo              C: Combination     
#                                 F            M            F            M            F            M     
# âââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
#   A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
#   C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
#     Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
#   B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
#     Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
# BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
#   B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
#     Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
#   A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
#   C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
#     Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
# WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
#   C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
#     Mean                      35.50        44.00        44.67          NA         38.50        35.00   
#   B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
#     Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
#   A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
#     Mean                      34.00        45.00        29.33        33.33        35.00        32.80

Sorting Within an Analysis Subtable

When sorting within an analysis subtable (e.g., the subtable generated when your analysis function generates more than one row per group of data), the name of that subtable (generally the name of the variable being analyzed) must appear in the path, even if the variable label is not displayed when the table is printed.

To show the differences between sorting an analysis subtable (DataRow), and a content subtable (ContentRow), we modify and prune (as before) a similar raw table as before:

more_analysis_fnc <- function(x) {
  in_rows(
    "median" = median(x),
    "mean" = mean(x),
    .formats = "xx.x"
  )
}

raw_lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by(
    "RACE",
    split_fun = drop_and_remove_levels("WHITE") # dropping WHITE levels
  ) %>%
  summarize_row_groups() %>%
  split_rows_by("STRATA1") %>%
  summarize_row_groups() %>%
  analyze("AGE", afun = more_analysis_fnc)

tbl <- build_table(raw_lyt, DM) %>%
  prune_table() %>%
  print()
#                             A: Drug X    B: Placebo   C: Combination
# ââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# ASIAN                       79 (65.3%)   68 (64.2%)     84 (65.1%)  
#   A                         27 (22.3%)   20 (18.9%)     31 (24.0%)  
#     median                     30.0         33.0           36.0     
#     mean                       32.2         33.9           36.8     
#   B                         24 (19.8%)   29 (27.4%)     22 (17.1%)  
#     median                     32.5         32.0           34.0     
#     mean                       34.1         31.6           34.7     
#   C                         28 (23.1%)   19 (17.9%)     31 (24.0%)  
#     median                     36.5         34.0           33.0     
#     mean                       36.2         33.0           32.4     
# BLACK OR AFRICAN AMERICAN   28 (23.1%)   24 (22.6%)     27 (20.9%)  
#   A                          6 (5.0%)     7 (6.6%)       8 (6.2%)   
#     median                     32.0         29.0           32.5     
#     mean                       31.5         28.6           33.6     
#   B                         10 (8.3%)     6 (5.7%)      12 (9.3%)   
#     median                     33.0         30.0           33.5     
#     mean                       35.6         30.8           33.7     
#   C                         12 (9.9%)    11 (10.4%)      7 (5.4%)   
#     median                     33.0         36.0           32.0     
#     mean                       35.5         34.2           35.0

What should we do now if we want to sort each median and mean in each of the strata variables? We need to write a custom score function as the ready-made ones at the moment work only with content nodes (content_table() access function for cont_n_allcols() and cont_n_onecol(), of which we will talk in a moment). But before that, we need to think about what are we ordering, i.e.Â we need to specify the right path. We suggest looking at the structure first with table_structure() or row_paths_summary().

table_structure(tbl) # Direct inspection into the tree-like structure of rtables
# [TableTree] RACE
#  [TableTree] ASIAN [cont: 1 x 3]
#   [TableTree] STRATA1
#    [TableTree] A [cont: 1 x 3]
#     [ElementaryTable] AGE (2 x 3)
#    [TableTree] B [cont: 1 x 3]
#     [ElementaryTable] AGE (2 x 3)
#    [TableTree] C [cont: 1 x 3]
#     [ElementaryTable] AGE (2 x 3)
#  [TableTree] BLACK OR AFRICAN AMERICAN [cont: 1 x 3]
#   [TableTree] STRATA1
#    [TableTree] A [cont: 1 x 3]
#     [ElementaryTable] AGE (2 x 3)
#    [TableTree] B [cont: 1 x 3]
#     [ElementaryTable] AGE (2 x 3)
#    [TableTree] C [cont: 1 x 3]
#     [ElementaryTable] AGE (2 x 3)

We see that to order all of the AGE nodes we need to get there with something like this: RACE, ASIAN, STRATA1, A, AGE and no more as the next level is what we need to sort. But we see now that this path would sort only the first group. We need wildcards: RACE, *, STRATA1, *, AGE.

Now, we have found a way to select relevant paths that we want to sort. We want to construct a scoring function that works on the median and mean and sort them. To do so, we may want to enter our scoring function with browser() to see what is fed to it and try to retrieve the single value that is to be returned to do the sorting. We allow the user to experiment with this, while here we show a possible solution that considers summing all the column values that are retrieved with row_values(tt) from the subtable that is fed to the function itself. Note that any score function should be defined as having a subtable tt as a unique input parameter and a single numeric value as output.

scorefun <- function(tt) {
  # Here we could use browser()
  sum(unlist(row_values(tt)))
}
sort_at_path(tbl, c("RACE", "*", "STRATA1", "*", "AGE"), scorefun)
#                             A: Drug X    B: Placebo   C: Combination
# ââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# ASIAN                       79 (65.3%)   68 (64.2%)     84 (65.1%)  
#   A                         27 (22.3%)   20 (18.9%)     31 (24.0%)  
#     mean                       32.2         33.9           36.8     
#     median                     30.0         33.0           36.0     
#   B                         24 (19.8%)   29 (27.4%)     22 (17.1%)  
#     mean                       34.1         31.6           34.7     
#     median                     32.5         32.0           34.0     
#   C                         28 (23.1%)   19 (17.9%)     31 (24.0%)  
#     median                     36.5         34.0           33.0     
#     mean                       36.2         33.0           32.4     
# BLACK OR AFRICAN AMERICAN   28 (23.1%)   24 (22.6%)     27 (20.9%)  
#   A                          6 (5.0%)     7 (6.6%)       8 (6.2%)   
#     mean                       31.5         28.6           33.6     
#     median                     32.0         29.0           32.5     
#   B                         10 (8.3%)     6 (5.7%)      12 (9.3%)   
#     mean                       35.6         30.8           33.7     
#     median                     33.0         30.0           33.5     
#   C                         12 (9.9%)    11 (10.4%)      7 (5.4%)   
#     mean                       35.5         34.2           35.0     
#     median                     33.0         36.0           32.0

To help the user visualize what is happening in the score function we show here an example of its exploration from the debugging:

> sort_at_path(tbl, c("RACE", "*", "STRATA1", "*", "AGE"), scorefun)
Called from: scorefun(x)
Browse[1]> tt ### THIS IS THE LEAF LEVEL -> DataRow ###
[DataRow indent_mod 0]: median   30.0   33.0   36.0
Browse[1]> row_values(tt) ### Extraction of values -> It will be a named list! ###
$`A: Drug X`
[1] 30

$`B: Placebo`
[1] 33

$`C: Combination`
[1] 36

Browse[1]> sum(unlist(row_values(tt))) ### Final value we want to give back to sort_at_path ###
[1] 99

We can see how powerful and pragmatic it might be to change the sorting principles from within the custom scoring function. We show this by selecting a specific column to sort. Looking at the pre-defined function cont_n_onecol() gives us an insight into how to proceed.

cont_n_onecol
# function (j) 
# {
#     function(tt) {
#         ctab <- content_table(tt)
#         if (NROW(ctab) == 0) {
#             stop("cont_n_allcols score function used at subtable [", 
#                 obj_name(tt), "] that has no content table.")
#         }
#         row_values(tree_children(ctab)[[1]])[[j]][1]
#     }
# }
# <bytecode: 0x561500c87850>
# <environment: namespace:rtables>

We see that a similar function to cont_n_allcols() is wrapped by one that allows a parameter j to be used to select a specific column. We will do the same here for selecting which column we want to sort.

scorefun_onecol <- function(colpath) {
  function(tt) {
    # Here we could use browser()
    unlist(cell_values(tt, colpath = colpath), use.names = FALSE)[1] # Modified to lose the list names
  }
}
sort_at_path(tbl, c("RACE", "*", "STRATA1", "*", "AGE"), scorefun_onecol(colpath = c("ARM", "A: Drug X")))
#                             A: Drug X    B: Placebo   C: Combination
# ââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# ASIAN                       79 (65.3%)   68 (64.2%)     84 (65.1%)  
#   A                         27 (22.3%)   20 (18.9%)     31 (24.0%)  
#     mean                       32.2         33.9           36.8     
#     median                     30.0         33.0           36.0     
#   B                         24 (19.8%)   29 (27.4%)     22 (17.1%)  
#     mean                       34.1         31.6           34.7     
#     median                     32.5         32.0           34.0     
#   C                         28 (23.1%)   19 (17.9%)     31 (24.0%)  
#     median                     36.5         34.0           33.0     
#     mean                       36.2         33.0           32.4     
# BLACK OR AFRICAN AMERICAN   28 (23.1%)   24 (22.6%)     27 (20.9%)  
#   A                          6 (5.0%)     7 (6.6%)       8 (6.2%)   
#     median                     32.0         29.0           32.5     
#     mean                       31.5         28.6           33.6     
#   B                         10 (8.3%)     6 (5.7%)      12 (9.3%)   
#     mean                       35.6         30.8           33.7     
#     median                     33.0         30.0           33.5     
#   C                         12 (9.9%)    11 (10.4%)      7 (5.4%)   
#     mean                       35.5         34.2           35.0     
#     median                     33.0         36.0           32.0

In the above table we see that the mean and median rows are reordered by their values in the first column, compared to the raw table, as desired.

With this function we can also do the same for columns that are nested within larger splits:

# Simpler table
tbl <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_cols_by("SEX",
    split_fun = drop_and_remove_levels(c("U", "UNDIFFERENTIATED"))
  ) %>%
  analyze("AGE", afun = more_analysis_fnc) %>%
  build_table(DM) %>%
  prune_table() %>%
  print()
#           A: Drug X      B: Placebo      C: Combination  
#            F      M       F       M        F         M   
# âââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# median   32.0    35.0   33.0    31.0     35.0      32.0  
# mean     33.7    36.5   33.8    32.1     34.9      34.3

sort_at_path(tbl, c("AGE"), scorefun_onecol(colpath = c("ARM", "B: Placebo", "SEX", "F")))
#           A: Drug X      B: Placebo      C: Combination  
#            F      M       F       M        F         M   
# âââââââââââââââââââââââââââââââââââââââââââââââââââââââââ
# mean     33.7    36.5   33.8    32.1     34.9      34.3  
# median   32.0    35.0   33.0    31.0     35.0      32.0

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4