Sessionizing Log Data Using data.table [Follow-up #2]

Thanks to user dnlbrky, we now have a third way to sessionize log data with an arbitrary timeout period (see methods 1 and 2), this time using data.table from R along with magrittr for piping:

library(magrittr)
library(data.table)

## Download, decompress, and load data (first 10,000 lines):
single_col_timestamp <- url("http://randyzwitch.com/wp-content/uploads/2015/01/single_col_timestamp.csv.gz") %>%
  gzcon %>%
  readLines(n=10000L) %>%
  textConnection %>%
  read.csv %>%
  setDT

## Convert to timestamp:
single_col_timestamp[, event_timestamp:=as.POSIXct(event_timestamp)]
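## (as.POSIXct parses the default "YYYY-MM-DD HH:MM:SS" format in the system timezone)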

## Order by uid and event_timestamp:
setkey(single_col_timestamp, uid, event_timestamp)
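## (setkey sorts the table in place by reference, so diff() below sees each uid's events in time order)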

## Sessionize the data (more than 30 minutes between events starts a new session).
## Coercing diff() to minutes explicitly guards against difftime auto-selecting other units:
single_col_timestamp[, session_id := paste(uid, cumsum(c(0, as.numeric(diff(event_timestamp), units = "mins")) > 30), sep = "_"), by = uid]

## Examine the results:
#single_col_timestamp[uid %like% "a55bb9"]
single_col_timestamp[session_id %like% "fc895c3babd"]
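
A quick way to sanity-check the output (not part of dnlbrky's original snippet, just standard data.table syntax on the columns created above):

## Events per session, largest sessions first:
single_col_timestamp[, .N, by = .(uid, session_id)][order(-N)][1:10]

## Distinct sessions per user:
single_col_timestamp[, .(sessions = uniqueN(session_id)), by = uid]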

I agree with dnlbrky that this feels a little more natural than the dplyr method for heavy SQL users like me, but ultimately I still find the SQL method the most elegant and the easiest to understand. That's the great thing about open-source software: pick whichever tool you want and accomplish the task however you choose.
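
For comparison, here's a minimal sketch of the same cumulative-sum idea written with dplyr; this is my shorthand for the approach, not the exact code from the earlier follow-up:

library(dplyr)

single_col_timestamp %>%
  arrange(uid, event_timestamp) %>%
  group_by(uid) %>%
  mutate(session_id = paste(uid,
                            cumsum(c(0, as.numeric(diff(event_timestamp), units = "mins")) > 30),
                            sep = "_")) %>%
  ungroup()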

