While get_dset() is a quick way to retrieve an entire data set and its metadata, the get_data() function is a generalisation: it can return a whole data set, but also subsets of the data, based on indicator selection and grouping (columns) as well as unit selection and grouping (rows).
A simple example is to extract one or more named indicators from a target data set:
x <- get_data(coin, dset = "Raw", iCodes = c("Flights", "LPI"))
# see first few rows
head(x, 5)
#> uCode Flights LPI
#> 31 AUS 36.05498 3.793385
#> 1 AUT 29.01725 4.097985
#> 2 BEL 31.88546 4.108538
#> 32 BGD 4.27955 2.663902
#> 3 BGR 9.23588 2.807685
By default, get_data() returns the requested indicators, plus the uCode identifier column. We can also set also_get = "none" to return only the indicator columns.
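To make the difference concrete, here is a minimal sketch that compares the column names returned by the two calls. It assumes that coin is the package's built-in example coin, created with build_example_coin(); this section does not show how coin was built, so treat that line as an assumption.

```r
library(COINr)

# Assumption: `coin` is COINr's built-in example coin
coin <- build_example_coin(quietly = TRUE)

# Default behaviour: requested indicators plus the uCode identifier column
x_default <- get_data(coin, dset = "Raw", iCodes = c("Flights", "LPI"))

# also_get = "none": indicator columns only, no uCode column
x_only <- get_data(coin, dset = "Raw", iCodes = c("Flights", "LPI"),
                   also_get = "none")

names(x_default)
names(x_only)
```

Dropping the identifier column can be convenient when the result is passed straight to a function that expects purely numeric input.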
The iCodes argument can also accept groups of indicators, based on the structure of the index. In our example, indicators are aggregated into “pillars” (level 2) within groups. We can name an aggregation group and extract the underlying indicators:
x <- get_data(coin, dset = "Raw", iCodes = "Political", Level = 1)
head(x, 5)
#> uCode Embs IGOs UNVote
#> 31 AUS 82 196 38.46245
#> 1 AUT 88 227 42.63920
#> 2 BEL 84 248 43.00308
#> 32 BGD 52 145 38.60601
#> 3 BGR 67 209 42.95986
Here we have requested all the indicators in level 1 (the indicator level) that belong to the group called “Political” (one of the pillars). Specifying the level becomes more relevant when we look at the aggregated data set, which also includes the pillar, sub-index and index scores. Here, for example, we can ask for all the pillar scores (level 2) which belong to the sustainability sub-index (level 3):
x <- get_data(coin, dset = "Aggregated", iCodes = "Sust", Level = 2)
head(x, 5)
#> uCode Environ Social SusEcFin
#> 1 AUS 31.92211 71.88108 55.69987
#> 2 AUT 69.47511 72.76415 62.88150
#> 3 BEL 53.00859 86.16783 50.09020
#> 4 BGD 81.66988 27.51138 64.58884
#> 5 BGR 55.69922 53.30489 61.68677
If this isn't clear, look at the structure of the example index using e.g. plot_framework(coin). If we wanted to select all the indicators within the “Sust” sub-index we would set Level = 1. If we wanted to select the sub-index scores themselves we would set Level = 3, and so on.
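The two alternatives just described can be sketched as follows. As before, this assumes that coin is the built-in example coin from build_example_coin(), which this section does not state explicitly.

```r
library(COINr)

# Assumption: `coin` is COINr's built-in example coin
coin <- build_example_coin(quietly = TRUE)

# Level = 1: all indicators that sit under the "Sust" sub-index
x_ind <- get_data(coin, dset = "Aggregated", iCodes = "Sust", Level = 1)

# Level = 3: the "Sust" sub-index scores themselves
x_sub <- get_data(coin, dset = "Aggregated", iCodes = "Sust", Level = 3)

names(x_ind)
head(x_sub, 5)
```

The same group code is used in both calls; only Level changes, which determines how far down the index structure the selection reaches.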
The idea of selecting indicators and aggregates based on the structure of the index is useful in many places in COINr, for example when examining correlations within aggregation groups using plot_corr().
Units (rows) of the data set can also be selected, including in combination with indicator selection. Starting with a simple example, let's select specified units for a specific indicator:
get_data(coin, dset = "Raw", iCodes = "Goods", uCodes = c("AUT", "VNM"))
#> uCode Goods
#> 1 AUT 278.4264
#> 51 VNM 269.0766
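Unit selection can be combined with group-based indicator selection in a single call. The sketch below requests the indicators of the “Political” pillar for the same two units; it assumes that coin is the built-in example coin from build_example_coin().

```r
library(COINr)

# Assumption: `coin` is COINr's built-in example coin
coin <- build_example_coin(quietly = TRUE)

# Columns: indicators (level 1) in the "Political" pillar
# Rows: only the two named units
x <- get_data(coin, dset = "Raw", iCodes = "Political", Level = 1,
              uCodes = c("AUT", "VNM"))
x
```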
Rows can also be subsetted using groups, i.e. unit groupings defined by variables that were input with iMeta$Type = "Group" when building the coin. Recall that our example coin has several groups (as a reminder, you can see some details about the coin using its print method):
coin
#> --------------
#> A coin with...
#> --------------
#> Input:
#> Units: 51 (AUS, AUT, BEL, ...)
#> Indicators: 49 (Goods, Services, FDI, ...)
#> Denominators: 4 (Area, Energy, GDP, ...)
#> Groups: 4 (GDP_group, GDPpc_group, Pop_group, ...)
#>
#> Structure:
#> Level 1 Indicator: 49 indicators (FDI, ForPort, Goods, ...)
#> Level 2 Pillar: 8 groups (ConEcFin, Instit, P2P, ...)
#> Level 3 Sub-index: 2 groups (Conn, Sust)
#> Level 4 Index: 1 groups (Index)
#>
#> Data sets:
#> Raw (51 units)
#> Denominated (51 units)
#> Imputed (51 units)
#> Screened (51 units)
#> Treated (51 units)
#> Normalised (51 units)
#> Aggregated (51 units)
The first way to subset by unit group is to name a grouping variable, and a group within that variable to select. For example, say we want to know the values of the “Goods” indicator for all the countries in the “XL” GDP group:
get_data(coin, dset = "Raw", iCodes = "Goods", use_group = list(GDP_group = "XL"))
#> uCode GDP_group Goods
#> 1 AUS XL 288.4893
#> 8 CHN XL 1713.6190
#> 11 DEU XL 1919.1940
#> 13 ESP XL 447.1229
#> 16 FRA XL 849.3303
#> 17 GBR XL 778.9052
#> 21 IDN XL 222.4186
#> 22 IND XL 288.9806
#> 24 ITA XL 658.1981
#> 25 JPN XL 732.2078
#> 28 KOR XL 568.9920
#> 45 RUS XL 343.8504
Since we have subsetted by group, this also returns the group column which was used.
Another way of subsetting is to combine uCodes and use_group. When both arguments are specified, the result is the full group(s) to which the specified uCodes belong. This can be used to put a unit in context with its peers within a group. For example, we might want to see the values of the “Flights” indicator for a specific unit, as well as all other units within the same population group:
get_data(coin, dset = "Raw", iCodes = "Flights", uCodes = "MLT", use_group = "Pop_group")
#> uCode Pop_group Flights
#> 6 BRN S 2.01900
#> 9 CYP S 8.75467
#> 14 EST S 3.12946
#> 19 HRV S 9.24529
#> 23 IRL S 34.17721
#> 30 LTU S 5.37919
#> 31 LUX S 4.84458
#> 32 LVA S 6.77976
#> 33 MLT S 6.75251
#> 35 MNG S 0.98951
#> 38 NOR S 25.64994
#> 39 NZL S 13.37242
#> 48 SVN S 1.51736
Here, we have to specify use_group simply as a string rather than a list. Since MLT is in the “S” population group, it returns all units within that group.
Overall, the idea of get_data() is to flexibly return subsets of indicator data, based on the structure of the index and unit groups.