Showing content from http://cran.rstudio.com/web/packages/rlang/../COINr/vignettes/data_selection.html below:

Data selection

While get_dset() is a quick way to retrieve an entire data set and its metadata, the get_data() function is a generalisation: it can return a whole data set, but also subsets of it, selected by indicator or indicator group (columns) and by unit or unit group (rows).
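
As a minimal sketch of the "whole data set" case, the two functions can be compared side by side. This assumes the `coin` object used throughout is the example coin that ships with COINr, built here with build_example_coin(); the object names `raw_dset` and `raw_data` are only illustrative.

```r
library(COINr)

# Build the example coin shipped with COINr (assumed to be the `coin` used in this text)
coin <- build_example_coin(quietly = TRUE)

# Whole data set via get_dset()
raw_dset <- get_dset(coin, dset = "Raw")

# Whole data set via get_data(), with no iCodes or uCodes specified
raw_data <- get_data(coin, dset = "Raw")

# Compare the dimensions of the two results
dim(raw_dset)
dim(raw_data)
```

The remaining arguments of get_data() then narrow this full table down by columns, rows, or both.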

Indicators/columns

A simple example is to extract one or more named indicators from a target data set:

x <- get_data(coin, dset = "Raw", iCodes = c("Flights", "LPI"))

# see first few rows
head(x, 5)
#>    uCode  Flights      LPI
#> 31   AUS 36.05498 3.793385
#> 1    AUT 29.01725 4.097985
#> 2    BEL 31.88546 4.108538
#> 32   BGD  4.27955 2.663902
#> 3    BGR  9.23588 2.807685

By default, get_data() returns the requested indicators, plus the uCode identifier column. We can also set also_get = "none" to return only the indicator columns.
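
A short sketch of the also_get = "none" option follows. It assumes `coin` is the COINr example coin from build_example_coin(); `x_num` is an illustrative name.

```r
library(COINr)

# Example coin shipped with COINr (assumed to match the `coin` in the text)
coin <- build_example_coin(quietly = TRUE)

# Indicator columns only: also_get = "none" drops the uCode column
x_num <- get_data(coin, dset = "Raw", iCodes = c("Flights", "LPI"),
                  also_get = "none")
names(x_num)

# With only numeric columns left, column-wise summaries need no extra subsetting
colMeans(x_num, na.rm = TRUE)
```

Dropping the identifier column is mainly convenient when the result is passed straight to functions that expect purely numeric input.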

The iCodes argument can also accept groups of indicators, based on the structure of the index. In our example, indicators (level 1) are aggregated into “pillars” (level 2). We can name an aggregation group and extract its underlying indicators:

x <- get_data(coin, dset = "Raw", iCodes = "Political", Level = 1)
head(x, 5)
#>    uCode Embs IGOs   UNVote
#> 31   AUS   82  196 38.46245
#> 1    AUT   88  227 42.63920
#> 2    BEL   84  248 43.00308
#> 32   BGD   52  145 38.60601
#> 3    BGR   67  209 42.95986

Here we have requested all the indicators in level 1 (the indicator level) that belong to the group called “Political” (one of the pillars). Specifying the level becomes more relevant when we look at the aggregated data set, which also includes the pillar, sub-index and index scores. For example, we can ask for all the pillar scores (level 2) that belong to the sustainability sub-index (level 3):

x <- get_data(coin, dset = "Aggregated", iCodes = "Sust", Level = 2)

head(x, 5)
#>   uCode  Environ   Social SusEcFin
#> 1   AUS 31.92211 71.88108 55.69987
#> 2   AUT 69.47511 72.76415 62.88150
#> 3   BEL 53.00859 86.16783 50.09020
#> 4   BGD 81.66988 27.51138 64.58884
#> 5   BGR 55.69922 53.30489 61.68677

If this isn’t clear, look at the structure of the example index using e.g. plot_framework(coin). If we wanted to select all the indicators within the “Sust” sub-index we would set Level = 1. If we wanted to select the sub-index scores themselves we would set Level = 3, and so on.
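
The two alternatives just described can be sketched as follows, again assuming `coin` is the COINr example coin; `sust_inds` and `sust_scores` are illustrative names.

```r
library(COINr)

# Example coin shipped with COINr (assumed to match the `coin` in the text)
coin <- build_example_coin(quietly = TRUE)

# Level 1: every indicator that sits under the "Sust" sub-index
sust_inds <- get_data(coin, dset = "Aggregated", iCodes = "Sust", Level = 1)

# Level 3: the "Sust" sub-index scores themselves
sust_scores <- get_data(coin, dset = "Aggregated", iCodes = "Sust", Level = 3)

names(sust_inds)
head(sust_scores, 5)
```

The same group name gives a different set of columns depending on Level, so it helps to think of iCodes as "which branch of the framework" and Level as "how far down that branch".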

The idea of selecting indicators and aggregates based on the structure of the index is useful in many places in COINr, for example when examining correlations within aggregation groups using plot_corr().
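
To illustrate the idea without relying on plot_corr() itself, the sketch below extracts one aggregation group with get_data() and computes its correlation matrix with base R's cor(). This is a stand-in for the concept, not the way plot_corr() works internally; `coin` is assumed to be the COINr example coin.

```r
library(COINr)

# Example coin shipped with COINr (assumed to match the `coin` in the text)
coin <- build_example_coin(quietly = TRUE)

# Indicators in the "Political" pillar, without the uCode column
pol <- get_data(coin, dset = "Raw", iCodes = "Political", Level = 1,
                also_get = "none")

# Pairwise correlations within the aggregation group (base R, not plot_corr())
pol_cor <- cor(pol, use = "pairwise.complete.obs")
round(pol_cor, 2)
```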

Units/rows

Units (rows) of the data set can also be selected, optionally in combination with indicator selection. Starting with a simple example, let’s select specified units for a specific indicator:

get_data(coin, dset = "Raw", iCodes = "Goods", uCodes = c("AUT", "VNM"))
#>    uCode    Goods
#> 1    AUT 278.4264
#> 51   VNM 269.0766
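
Row and column selection can be combined in one call. The sketch below, which assumes `coin` is the COINr example coin, requests the indicators of the “Political” pillar for the same two units.

```r
library(COINr)

# Example coin shipped with COINr (assumed to match the `coin` in the text)
coin <- build_example_coin(quietly = TRUE)

# Indicators of the "Political" pillar, restricted to two units
x <- get_data(coin, dset = "Raw", iCodes = "Political", Level = 1,
              uCodes = c("AUT", "VNM"))
x
```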

Rows can also be subset using groups, i.e. unit groupings that are defined using variables input with iMeta$Type = "Group" when building the coin. Recall that our example coin has several groups (as a reminder, you can see some details about the coin using its print method):

coin
#> --------------
#> A coin with...
#> --------------
#> Input:
#>   Units: 51 (AUS, AUT, BEL, ...)
#>   Indicators: 49 (Goods, Services, FDI, ...)
#>   Denominators: 4 (Area, Energy, GDP, ...)
#>   Groups: 4 (GDP_group, GDPpc_group, Pop_group, ...)
#> 
#> Structure:
#>   Level 1 Indicator: 49 indicators (FDI, ForPort, Goods, ...) 
#>   Level 2 Pillar: 8 groups (ConEcFin, Instit, P2P, ...) 
#>   Level 3 Sub-index: 2 groups (Conn, Sust) 
#>   Level 4 Index: 1 groups (Index) 
#> 
#> Data sets:
#>   Raw (51 units)
#>   Denominated (51 units)
#>   Imputed (51 units)
#>   Screened (51 units)
#>   Treated (51 units)
#>   Normalised (51 units)
#>   Aggregated (51 units)

The first way to subset by unit group is to name a grouping variable, and a group within that variable to select. For example, say we want to know the values of the “Goods” indicator for all the countries in the “XL” GDP group:

get_data(coin, dset = "Raw", iCodes = "Goods", use_group = list(GDP_group = "XL"))
#>    uCode GDP_group     Goods
#> 1    AUS        XL  288.4893
#> 8    CHN        XL 1713.6190
#> 11   DEU        XL 1919.1940
#> 13   ESP        XL  447.1229
#> 16   FRA        XL  849.3303
#> 17   GBR        XL  778.9052
#> 21   IDN        XL  222.4186
#> 22   IND        XL  288.9806
#> 24   ITA        XL  658.1981
#> 25   JPN        XL  732.2078
#> 28   KOR        XL  568.9920
#> 45   RUS        XL  343.8504

Since we have subset by group, this also returns the group column which was used.
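
A quick way to check what the group filter did is to compare the filtered result with the unfiltered one. This sketch assumes `coin` is the COINr example coin; `all_units` and `xl_units` are illustrative names.

```r
library(COINr)

# Example coin shipped with COINr (assumed to match the `coin` in the text)
coin <- build_example_coin(quietly = TRUE)

# Same indicator, without and with the group filter
all_units <- get_data(coin, dset = "Raw", iCodes = "Goods")
xl_units  <- get_data(coin, dset = "Raw", iCodes = "Goods",
                      use_group = list(GDP_group = "XL"))

# Group values present in the filtered result
unique(xl_units$GDP_group)

# The filtered result contains fewer units, all drawn from the full set
nrow(xl_units)
nrow(all_units)
all(xl_units$uCode %in% all_units$uCode)
```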

Another way of subsetting is to combine uCodes and use_group. When both arguments are specified, get_data() returns the full group(s) to which the specified uCodes belong. This can be used to put a unit in context with its peers within a group. For example, we might want to see the values of the “Flights” indicator for a specific unit, as well as all other units within the same population group:

get_data(coin, dset = "Raw", iCodes = "Flights", uCodes = "MLT", use_group = "Pop_group")
#>    uCode Pop_group  Flights
#> 6    BRN         S  2.01900
#> 9    CYP         S  8.75467
#> 14   EST         S  3.12946
#> 19   HRV         S  9.24529
#> 23   IRL         S 34.17721
#> 30   LTU         S  5.37919
#> 31   LUX         S  4.84458
#> 32   LVA         S  6.77976
#> 33   MLT         S  6.75251
#> 35   MNG         S  0.98951
#> 38   NOR         S 25.64994
#> 39   NZL         S 13.37242
#> 48   SVN         S  1.51736

In this case, use_group is specified simply as a string (the name of the grouping variable) rather than as a list. Since MLT is in the “S” population group, get_data() returns all units within that group.
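
Once the peer group has been retrieved, placing the unit in context is a matter of ordinary data frame operations. The sketch below ranks the units by “Flights” using base R; it assumes `coin` is the COINr example coin, and the `Rank` column is added here purely for illustration.

```r
library(COINr)

# Example coin shipped with COINr (assumed to match the `coin` in the text)
coin <- build_example_coin(quietly = TRUE)

# MLT together with all other units in its population group
peers <- get_data(coin, dset = "Raw", iCodes = "Flights",
                  uCodes = "MLT", use_group = "Pop_group")

# Rank units within the peer group, highest Flights value first
peers$Rank <- rank(-peers$Flights, ties.method = "min")
peers[order(peers$Rank), ]

# Position of MLT among its peers
peers$Rank[peers$uCode == "MLT"]
```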

Overall, the idea of get_data() is to flexibly return subsets of indicator data, based on the structure of the index and unit groups.

