3 Answers

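Contributed 1854 experience points, received more than 8 likes
From ?top_n, about the wt argument:
"The variable to use for ordering [...] defaults to the last variable in the tbl."
The last variable in your data set is "grp", which is not the variable you want to rank by, and that is why your top_n attempt "returns the whole of d". So if you want to rank by "x" in your data set, you need to specify wt = x.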
library(dplyr)

set.seed(123)
d <- data.frame(
  x = runif(90),
  grp = gl(3, 30))

# Rank by x explicitly; otherwise top_n falls back to the last column (grp)
d %>%
  group_by(grp) %>%
  top_n(n = 5, wt = x)
#            x grp
# 1  0.9404673   1
# 2  0.9568333   1
# 3  0.8998250   1
# 4  0.9545036   1
# 5  0.9942698   1
# 6  0.9630242   2
# 7  0.9022990   2
# 8  0.8578277   2
# 9  0.7989248   2
# 10 0.8950454   2
# 11 0.8146400   3
# 12 0.8123895   3
# 13 0.9849570   3
# 14 0.8930511   3
# 15 0.8864691   3
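To make the default-wt behaviour concrete, here is a minimal sketch on the same toy data. It also shows slice_max(), which is not part of the original answer: it assumes dplyr 1.0.0 or later, where top_n() is documented as superseded by slice_max()/slice_min(). The names all_rows and top5 are only illustrative.

```r
library(dplyr)

set.seed(123)
d <- data.frame(x = runif(90), grp = gl(3, 30))

# Without wt, top_n() ranks by the last column (grp). Every row in a group
# ties on grp, and ties are kept, so all 90 rows come back.
all_rows <- d %>%
  group_by(grp) %>%
  top_n(n = 5)

# Assumption: dplyr >= 1.0.0. slice_max() takes the ordering variable
# explicitly; with_ties = FALSE caps each group at exactly n rows.
top5 <- d %>%
  group_by(grp) %>%
  slice_max(order_by = x, n = 5, with_ties = FALSE)

nrow(all_rows)
nrow(top5)
```

Unlike top_n(), slice_max() returns the rows of each group sorted by the ordering variable, so the row order differs from the output above even though the selected rows are the same.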

Contributed 2065 experience points, received more than 14 likes
This is pretty easy with data.table too...
library(data.table)
setorder(setDT(d), -x)[, head(.SD, 5), keyby = grp]
Or
setorder(setDT(d), grp, -x)[, head(.SD, 5), by = grp]
Or (this should be faster for a big data set, because it avoids calling .SD for each group)
setorder(setDT(d), grp, -x)[, indx := seq_len(.N), by = grp][indx <= 5]
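The variants above are easier to compare on a small table. The following is a rough sketch, not the answer's exact code: it uses order() inside i instead of setorder()/setDT(), so the input table is not modified by reference, and the names dt, via_sd and via_indx are made up for illustration.

```r
library(data.table)

set.seed(123)
dt <- data.table(x = runif(90), grp = gl(3, 30))

# Variant A: sort, then take the first 5 rows of each group through .SD
via_sd <- dt[order(grp, -x), head(.SD, 5), by = grp]

# Variant B: sort, number the rows within each group, then filter on that
# counter. No per-group .SD subset is materialised.
via_indx <- dt[order(grp, -x)][, indx := seq_len(.N), by = grp][indx <= 5]

# Both variants select the same x values in the same order
all.equal(via_sd$x, via_indx$x)
```

Variant B leaves the helper column indx in its result; drop it with via_indx[, indx := NULL] if it is not wanted.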
Edit: here is how dplyr compares with data.table (in case anyone is interested)
set.seed(123)
d <- data.frame(
  x = runif(1e6),
  grp = sample(1e4, 1e6, TRUE))
library(dplyr)
library(microbenchmark)
library(data.table)
dd <- copy(d)
microbenchmark(
  top_n = {d %>%
    group_by(grp) %>%
    top_n(n = 5, wt = x)},
  dohead = {d %>%
    arrange_(~ desc(x)) %>%
    group_by_(~ grp) %>%
    do(head(., n = 5))},
  slice = {d %>%
    arrange_(~ desc(x)) %>%
    group_by_(~ grp) %>%
    slice(1:5)},
  filter = {d %>%
    arrange(desc(x)) %>%
    group_by(grp) %>%
    filter(row_number() <= 5L)},
  data.table1 = setorder(setDT(dd), -x)[, head(.SD, 5L), keyby = grp],
  data.table2 = setorder(setDT(dd), grp, -x)[, head(.SD, 5L), grp],
  data.table3 = setorder(setDT(dd), grp, -x)[, indx := seq_len(.N), grp][indx <= 5L],
  times = 10,
  unit = "relative"
)
#        expr        min         lq      mean     median        uq       max neval
#       top_n  24.246401  24.492972 16.300391  24.441351 11.749050  7.644748    10
#      dohead 122.891381 120.329722 77.763843 115.621635 54.996588 34.114738    10
#       slice  27.365711  26.839443 17.714303  26.433924 12.628934  7.899619    10
#      filter  27.755171  27.225461 17.936295  26.363739 12.935709  7.969806    10
# data.table1  13.753046  16.631143 10.775278  16.330942  8.359951  5.077140    10
# data.table2  12.047111  11.944557  7.862302  11.653385  5.509432  3.642733    10
# data.table3   1.000000   1.000000  1.000000   1.000000  1.000000  1.000000    10
Adding a marginally faster data.table solution:
set.seed(123L)
d <- data.frame(
  x = runif(1e8),
  grp = sample(1e4, 1e8, TRUE))
setDT(d)
setorder(d, grp, -x)
dd <- copy(d)
library(microbenchmark)
microbenchmark(
  data.table3 = d[, indx := seq_len(.N), grp][indx <= 5L],
  data.table4 = dd[dd[, .I[seq_len(.N) <= 5L], grp]$V1],
  times = 10L
)
Timing output:
Unit: milliseconds
        expr      min       lq     mean   median        uq      max neval
 data.table3 826.2148 865.6334 950.1380 902.1689 1006.1237 1260.129    10
 data.table4 729.3229 783.7000 859.2084 823.1635  966.8239 1014.397    10
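The data.table4 expression relies on the .I idiom, which can be unpacked on a small table. This is a minimal sketch on toy data, not the benchmark itself; dt, idx and top5 are illustrative names.

```r
library(data.table)

set.seed(123)
dt <- data.table(x = runif(90), grp = gl(3, 30))
setorder(dt, grp, -x)  # sort by group, then by x descending (by reference)

# .I holds the row numbers of the current group within the whole table.
# Keeping the first 5 per group yields one integer vector, returned as V1.
idx <- dt[, .I[seq_len(.N) <= 5L], by = grp]$V1
idx
# 1:5, 31:35 and 61:65, because each group has 30 rows after sorting

# A single subset by row number then extracts the top 5 rows of every group
top5 <- dt[idx]
```

The inner call only computes integer row positions, and the outer call does one subset of the table, so no extra column is added and no per-group copy of the data is built.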

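Contributed 1898 experience points, received more than 8 likes
You need to wrap head in a call to do. In the following code, . represents the current group (see the description of ... in the do help page).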
d %>%
  arrange_(~ desc(x)) %>%
  group_by_(~ grp) %>%
  do(head(., n = 5))
As mentioned by akrun, slice is an alternative.
d %>%
  arrange_(~ desc(x)) %>%
  group_by_(~ grp) %>%
  slice(1:5)
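The underscore verbs arrange_() and group_by_() used above are deprecated in later dplyr releases. As a hedged sketch of the same two pipelines with the standard verbs, assuming dplyr 1.0.0 or later for slice_head() (by_slice and by_head are illustrative names):

```r
library(dplyr)

set.seed(123)
d <- data.frame(x = runif(90), grp = gl(3, 30))

# Same idea as the slice(1:5) pipeline, written with arrange()/group_by()
by_slice <- d %>%
  arrange(desc(x)) %>%
  group_by(grp) %>%
  slice(1:5)

# Assumption: dplyr >= 1.0.0. slice_head() takes the first n rows of each
# group, which covers the do(head(., n = 5)) use case.
by_head <- d %>%
  arrange(desc(x)) %>%
  group_by(grp) %>%
  slice_head(n = 5)

all.equal(by_slice$x, by_head$x)
```

Both versions depend on the data being sorted by x before grouping, exactly as in the answer's code.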