3 回答

TA貢獻(xiàn)1845條經(jīng)驗(yàn) 獲得超8個(gè)贊
使用filter。(但是請(qǐng)注意,此答案不能產(chǎn)生正確的答案LEFT JOIN;但是MWE會(huì)給出正確的結(jié)果,INNER JOIN而帶有a 。)
dplyr如果要求合并兩個(gè)表而沒(méi)有要合并的內(nèi)容,則該程序包不滿意,因此在下面,我為此在兩個(gè)表中都創(chuàng)建了一個(gè)啞變量,然后進(jìn)行過(guò)濾,然后刪除dummy:
fdata %>%
mutate(dummy=TRUE) %>%
left_join(sdata %>% mutate(dummy=TRUE)) %>%
filter(fyear >= byear, fyear < eyear) %>%
select(-dummy)
并注意,如果您在PostgreSQL中進(jìn)行此操作(例如),查詢優(yōu)化器將通過(guò)dummy以下兩個(gè)查詢解釋來(lái)查看該變量:
> fdata %>%
+ mutate(dummy=TRUE) %>%
+ left_join(sdata %>% mutate(dummy=TRUE)) %>%
+ filter(fyear >= byear, fyear < eyear) %>%
+ select(-dummy) %>%
+ explain()
Joining by: "dummy"
<SQL>
SELECT "id" AS "id", "fyear" AS "fyear", "byear" AS "byear", "eyear" AS "eyear", "val" AS "val"
FROM (SELECT * FROM (SELECT "id", "fyear", TRUE AS "dummy"
FROM "fdata") AS "zzz136"
LEFT JOIN
(SELECT "byear", "eyear", "val", TRUE AS "dummy"
FROM "sdata") AS "zzz137"
USING ("dummy")) AS "zzz138"
WHERE "fyear" >= "byear" AND "fyear" < "eyear"
<PLAN>
Nested Loop (cost=0.00..50886.88 rows=322722 width=40)
Join Filter: ((fdata.fyear >= sdata.byear) AND (fdata.fyear < sdata.eyear))
-> Seq Scan on fdata (cost=0.00..28.50 rows=1850 width=16)
-> Materialize (cost=0.00..33.55 rows=1570 width=24)
-> Seq Scan on sdata (cost=0.00..25.70 rows=1570 width=24)
并使用SQL更干凈地進(jìn)行操作會(huì)得到完全相同的結(jié)果:
> tbl(pg, sql("
+ SELECT *
+ FROM fdata
+ LEFT JOIN sdata
+ ON fyear >= byear AND fyear < eyear")) %>%
+ explain()
<SQL>
SELECT "id", "fyear", "byear", "eyear", "val"
FROM (
SELECT *
FROM fdata
LEFT JOIN sdata
ON fyear >= byear AND fyear < eyear) AS "zzz140"
<PLAN>
Nested Loop Left Join (cost=0.00..50886.88 rows=322722 width=40)
Join Filter: ((fdata.fyear >= sdata.byear) AND (fdata.fyear < sdata.eyear))
-> Seq Scan on fdata (cost=0.00..28.50 rows=1850 width=16)
-> Materialize (cost=0.00..33.55 rows=1570 width=24)
-> Seq Scan on sdata (cost=0.00..25.70 rows=1570 width=24)

TA貢獻(xiàn)1871條經(jīng)驗(yàn) 獲得超13個(gè)贊
看起來(lái)這是打包Fuzzyjoin地址的任務(wù)。軟件包的各種功能與dplyr連接功能相似。
在這種情況下,其中一項(xiàng)fuzzy_*_join功能將為您服務(wù)。dplyr::left_join和之間的主要區(qū)別在于fuzzyjoin::fuzzy_left_join,您提供了在match.fun參數(shù)匹配過(guò)程中使用的函數(shù)列表。請(qǐng)注意,該by參數(shù)的寫(xiě)法仍然與相同left_join。
下面是一個(gè)例子。我使用的功能來(lái)匹配頃>=并<為fyear到byear和fyear到eyear的比較,分別。的
library(fuzzyjoin)
fuzzy_left_join(fdata, sdata,
by = c("fyear" = "byear", "fyear" = "eyear"),
match_fun = list(`>=`, `<`))
Source: local data frame [27 x 5]
id fyear byear eyear val
(dbl) (dbl) (dbl) (dbl) (dbl)
1 1 1998 1995 2000 1
2 1 1999 1995 2000 1
3 1 2000 2000 2005 5
4 1 2001 2000 2005 5
5 2 1998 1995 2000 1
6 2 1999 1995 2000 1
7 2 2000 2000 2005 5
8 2 2001 2000 2005 5
9 2 2002 2000 2005 5
10 2 2003 2000 2005 5
.. ... ... ... ... ...
添加回答
舉報(bào)