Constructing time series for Germanyâs municipalities and districts is often challenging as borders of these units change over time. Some of todayâs districts are the product of multiple districts that merged and got split up in the past (the same applies to municipalities but to a lesser extent). When district and municipality borders change significantly, the district and municipality identifiers (Amtlicher Gemeindeschlüssel, AGS) change too.
To compile long time series for municipalities and districts, the Federal Office for Building and Regional Planning (Bundesinstitut für Bau-, Stadt- und Raumforschung, BBSR) has constructed annual crosswalks starting in 1990. These crosswalks map one yearâs set of district (municipality) identifiers to later yearâs identifiers and provide weights to perform area (or population) weighted interpolation for count data.
Amtlicher GemeindeschlüsselThe Amtlicher Gemeindeschlüssel (AGS) is a 8-digit code to identify a municipality (Gemeinde) and other regional units. The AGS has a hierarchical structure with the following components:
For example, the municipality Dierfeld in Rheinland-Pfalz has the AGS 07231021:
The five-digit version is sometimes referred to as Kreisschlüssel (district code) and the first two digits as Bundeslandschüssel (Land code). The Bundeslandschüssel are as follows:
Area and population weighted interpolation becomes necessary when the territory of a unit (say A) is allocated to multiple other units (say C and D). While area weighted interpolation allocates Aâs count proportionally to the area that goes from A to C and from A to D, population weighted interpolation uses the relative population size of A that goes to C and D. If all of Aâs territory or population goes to either C or D, the interpolation weight is 1.
Area and population weighted interpolation is simple, but the estimates come with some estimation error. This error is larger the more the underlying data are not distributed uniformly within unit A. Fortunately, in the majority of the territorial reforms since 1990, units are allocated to another unit as a whole. In these instances, there is no estimation error.
The figure below shows the annual percentage share of todayâs districts (n=401) and municipalities (n=11007) for which spatial interpolation is necessary to construct a time series. More specifically, the figure shows the percentage share of units for which the average (area-based) interpolation weight is not exactly 1 in every year since 1990.
UsageThe packageâs main function is xwalk_ags
. To illustrate how this function works, we use federal election data from Saxony that are part of the package. The dataset includes the number of voters and valid votes on the district level between 1994 and 2017. The figure below shows turnout trends based on the data. There is a break in the time series between 2005 and 2009 when the number of districts in Saxony declined from 29 and 13 and all district identifiers changed.
data(btw_sn)
ggplot(btw_sn, aes(year, (valid/voters)*(100), group=district)) +
geom_line() + geom_point() + ylab("Turnout (in %)") + xlab("Year")
To construct continuous time series on the number of voters and votes, we use the function xwalk_ags
. It takes the input data
, joins the crosswalk xwalk
with the data by ags
and time
and then proceeds with the area or population interpolation depending on the parameter weight
. If variables
are left undefined the merged dataset will be returned instead.
btw_sn_ags20 <- xwalk_ags(
data=btw_sn,
ags="district",
time="year",
xwalk="xd20",
variables=c("voters", "valid"),
weight="pop")
#>
#> Total number of obs: 155
#>
#> Excluded obs:
#> id/time NA AGS unk Year unk
#> 0 0 0
#>
#> Matched obs:
#> exact fuzzy
#> 126 NA
#>
#> Unmatched obs: 29
The output of xwalk_ags
provides information on the number of excluded observations as well as the number of matched and unmatched observations. Unmatched observations are the consequence of values in ags
that appear at least once in the crosswalk but not in the year when they are used in the data. More specifically, the 29 unmatched observations reported above are from the 1994 election which was preceded by a territorial reform in August 1994 reducing the number of districts from 54 to 29. The 29 new district identifiers only appear in the 1995 crosswalk for the first time.
To address this issue one can set fuzzy_time=TRUE
. With fuzzy matching enabled, the data and the crosswalk are matched exactly by ags
and as best as possible on time
. The output of the function call below suggests that there are no unmatched observations left then. The subsequent figure displays the turnout trends across todayâs district (\(n=13\)) and all seven elections.
btw_sn_ags20 <- xwalk_ags(
data=btw_sn,
ags="district",
time="year",
xwalk="xd20",
variables=c("voters", "valid"),
fuzzy_time=TRUE,
weight="pop")
#>
#> Total number of obs: 155
#>
#> Excluded obs:
#> id/time NA AGS unk Year unk
#> 0 0 0
#>
#> Matched obs:
#> exact fuzzy
#> 126 29
#>
#> Unmatched obs: 0
ggplot(btw_sn_ags20, aes(year, (valid/voters)*100, group=ags20)) +
geom_line() + geom_point() + ylab("Turnout (in %)") + xlab("Year")
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4