In a join, x[i, v := i.v], if multiple rows of i match to a single row of x, the assignment takes the last one (?). It would be nice to get an error or maybe a warning when this behavior is triggered.
library(data.table)
a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))
b[a, on=.(id), x := i.x, verbose = TRUE]
# Calculated ad hoc index in 0 secs
# Starting bmerge ...done in 0 secs
# Detected that j uses these columns: x,i.x
# Assigning to 3 row subset of 2 rows
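To make the "takes the last one" point concrete, here is a minimal sketch that reruns the same update join without verbose and inspects the result. It reuses the tables from the example; as far as I know, the later matching row of i (x = 12) is the one that ends up in b for id 1, but treat that as observed behavior rather than a documented guarantee.

```r
library(data.table)

a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))

# Update join: two rows of `a` (x = 11 and x = 12) match id == 1 in `b`,
# so the single row of `b` with id == 1 is assigned twice.
b[a, on = .(id), x := i.x]

# Inspect which value survived for id == 1.
print(b)
```

No error or warning is raised, which is the silent overwrite this issue is about.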
I'm not sure if the condition in the title (n > m) is necessary and sufficient for this behavior, though.
My workaround for now would involve looking at the opposite join:
a[b, on=.(id), .N, by=.EACHI][, range(N)]
# [1] 1 2
That seems pretty cumbersome. Maybe there's some way for me to capture and grep the verbose output (but then again, maybe not).
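The opposite-join check can be wrapped in a small function so it is less cumbersome to repeat. This is only a sketch: multi_matches, target, src and cols are hypothetical names of mine, not part of data.table, and it assumes the join columns have the same names in both tables.

```r
library(data.table)

# Hypothetical helper (not part of data.table): for each row of `target`,
# count the matching rows of `src` and return those with more than one match.
multi_matches <- function(target, src, cols) {
  counts <- src[target, on = cols, .N, by = .EACHI]
  counts[N > 1L]
}

a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))

dups <- multi_matches(target = b, src = a, cols = "id")

# Warn before running b[a, on = .(id), x := i.x] if any row would be
# assigned more than once.
if (nrow(dups) > 0L) {
  warning("update join would assign some rows more than once")
}
```

Running this before the update join gives the warning I was hoping for, at the cost of doing the join twice.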
Just an idea: A more general approach could involve returning an object containing diagnostics from the join and assignment. Of course, the object cannot be the return value of [.data.table, but maybe it could be dropped in some locked-binding global, .datatable.diagnostic, similar to .Last.value. Alternately, maybe that sort of object would fit well into @jangorecki's dtq package.
I'm thinking along these lines as I write tutorial materials to convert Stata users to R. In Stata, all joins cat a nice-ish table to the console.
SO post from a Stata user interested in uniqueness of matching of each row of i in x, etc.: https://stackoverflow.com/questions/49541330/r-data-table-merge-vs-stata-merge
Update: Re the verbose message text, the n is recorded thanks to #3460 and the m is just the number of rows in the table. I guess I didn't realize that at the time I posted this, thinking it was instead m = uniqueN(irows, na.rm = TRUE), which unfortunately is not computed, and there is no way to detect whether the update join was 1:1, etc., per the SO link above.
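Until something like that is computed internally, the cardinality can be approximated by hand with two counting joins, one in each direction. The following is a rough sketch under my own assumptions: match_counts, lhs, rhs and cols are hypothetical names, not anything data.table provides, and the join columns are assumed to share names across the two tables.

```r
library(data.table)

# Hypothetical helper (not part of data.table): largest number of matches
# seen in each direction of the join.
match_counts <- function(lhs, rhs, cols) {
  rhs_per_lhs <- rhs[lhs, on = cols, .N, by = .EACHI]$N
  lhs_per_rhs <- lhs[rhs, on = cols, .N, by = .EACHI]$N
  c(max_rhs_per_lhs = max(rhs_per_lhs), max_lhs_per_rhs = max(lhs_per_rhs))
}

a <- data.table(id = c(1L, 1L, 2L, 3L, NA_integer_), x = 11:15)
b <- data.table(id = 1:2, y = -(1:2))

res <- match_counts(lhs = b, rhs = a, cols = "id")
print(res)
```

A join is 1:1 on the matched rows only if both maxima are at most 1; here a row of b matches up to two rows of a, so b[a, on = .(id), x := i.x] is not a safe update.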
So anyway, I'll leave this open since it seems to highlight a point of difficulty (judging by emoji-votes) even if my suggestion does not fix it.