The manual pages for the top_n
function do not include any examples with date values and trying to pick out the earliest/latest of a period can be confusing. For example, I worked in insurance so we had eligibility periods that ran from startdate to enddate.
To get the earliest startdate, as a prior SQL programmer, I would expect to use an ascending list and the top item on the list is the first one. However, top_n provides the "largest" date i.e. the last one.
The ordering of an ascending list should return as the top the first item in the list. However, top_n returns the largest value, not the smallest. This can be seen in the example below. I am also porting the data over to SQL so you can see how this ascending order of lists, limit to the first item 1 returns differently there (in many SQL variants SELECT TOP # is supported but not SQLite).
Reproducible Example:
library(tidyverse) example <- data.frame( startdate = seq(as.Date("2019/01/01"), as.Date("2019/12/31"), by="days"), enddate = seq(as.Date("2021/01/01"), as.Date("2021/12/31"), by="days") ) example[1:5, ] ###erroneous result example %>% top_n( 1, startdate) #2019-12-31 example %>% select( startdate ) %>% arrange( startdate ) %>% top_n( 1 ) #2019-12-31 ###desired solution example %>% summarize( output = min(startdate) ) #2019-01-01 example %>% top_n( -1, startdate ) library(DBI) db=dbConnect(RSQLite::SQLite(), ":memory:") dbWriteTable( db, "example", example)
SQL Snippet to do the same thing
SELECT startdate
FROM example
ORDER BY startdate
LIMIT 1
*2019-01-01
The same is true of the reverse, if you are obtaining end-date you would use a descending list from oldest to newest and pull the first item, but this pulls the "smallest" i.e. the "earliest" item.
However, coming from a SQL background this is counter-intuitive where I would normally query such as this:
SELECT top 1 id, startdate FROM x GROUP BY id ORDER BY startdate SELECT top 1 id, enddate FROM x GROUP BY id ORDER BY desc(enddate)
Or alternatively, and easier if not looking for a matched set...
SELECT id, min(startdate), max(enddate) FROM x GROUP BY id
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4