A plyr solution (tmp is your data frame):
library("plyr")
ddply(tmp, .(id), function(x) x[c(1, nrow(x)), ])
# id d gr mm area
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
Or with dplyr:
library("dplyr")
tmp %>%
  group_by(id) %>%
  slice(c(1, n())) %>%
  ungroup()
# # A tibble: 10 × 5
# id d gr mm area
# <int> <int> <int> <dbl> <int>
# 1 15 1 2 3.4 1
# 2 15 1 1 5.5 2
# 3 21 1 1 4.0 2
# 4 21 1 2 3.8 2
# 5 22 1 1 4.0 2
# 6 22 1 2 4.6 2
# 7 23 1 1 2.7 2
# 8 23 1 2 3.0 2
# 9 24 1 1 3.0 2
# 10 24 1 2 2.0 3
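
Both snippets above index each group with c(1, number of rows), so a group that has only one row would be returned twice. The following is a minimal base R sketch of the same idea that wraps the index in unique() to avoid that. The small tmp data frame and the helper name first_last are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical sample data; the values are made up for illustration.
tmp <- data.frame(
  id = c(15, 15, 15, 21, 21, 22),
  mm = c(3.4, 4.9, 5.5, 4.0, 3.8, 4.0)
)

# Keep the first and last row of one group. unique() prevents the
# same row from being returned twice when the group has a single row.
first_last <- function(x) x[unique(c(1, nrow(x))), , drop = FALSE]

# Split by id, apply the helper to each piece, and stack the pieces again.
res <- do.call(rbind, lapply(split(tmp, tmp$id), first_last))
rownames(res) <- NULL
res
```

Here id 22 has a single row and appears once in res. Whether a single-row group should appear once or twice depends on what the downstream code expects.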

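Here is a solution in base R. If several separate groups share the same id, the rle-based code further down returns the first and last row of each of those individual groups.
This solution may be more intuitive than my other answer further below: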
lmy.df = read.table(text = '
id d gr mm area
15 1 2 3.40 1
15 1 1 4.90 2
15 1 1 4.40 1
15 1 1 5.50 2
21 1 1 4.00 2
21 1 2 3.80 2
22 1 1 4.00 2
23 1 1 2.70 2
23 1 1 4.00 2
23 1 2 3.00 2
24 1 1 3.00 2
24 1 1 2.00 3
24 1 1 4.00 2
24 1 2 2.00 3
', header = TRUE)
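# aggregate() applies FUN to every column within each id, so
# head(x, 1) / tail(x, 1) pick the first / last value per group.
# The results get their own names so that head() and tail() are not masked.
first.rows <- aggregate(lmy.df, by = list(lmy.df$id), FUN = function(x) head(x, 1))
last.rows <- aggregate(lmy.df, by = list(lmy.df$id), FUN = function(x) tail(x, 1))
first.rows$order <- 'first'
last.rows$order <- 'last'
my.output <- rbind(first.rows, last.rows)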
my.output
# Group.1 id d gr mm area order
#1 15 15 1 2 3.4 1 first
#2 21 21 1 1 4.0 2 first
#3 22 22 1 1 4.0 2 first
#4 23 23 1 1 2.7 2 first
#5 24 24 1 1 3.0 2 first
#6 15 15 1 1 5.5 2 last
#7 21 21 1 2 3.8 2 last
#8 22 22 1 1 4.0 2 last
#9 23 23 1 2 3.0 2 last
#10 24 24 1 2 2.0 3 last
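
The output above lists all first rows and then all last rows. If the two rows of each group should sit next to each other, as in the other answers, the combined result can be sorted afterwards. This is only a sketch on a small made-up data frame; the names df, firsts, lasts and out are assumptions for illustration.

```r
# Hypothetical sample data, made up for illustration.
df <- data.frame(id = c(1, 1, 2, 2, 2), v = c(10, 20, 30, 40, 50))

# First and last value of every column within each id.
firsts <- aggregate(df, by = list(df$id), FUN = function(x) head(x, 1))
lasts  <- aggregate(df, by = list(df$id), FUN = function(x) tail(x, 1))
firsts$order <- "first"
lasts$order  <- "last"

# Stack both results, then sort by id; "first" sorts before "last".
out <- rbind(firsts, lasts)
out <- out[order(out$id, out$order), ]
out
```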
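Since posting my original answer I have learned that it is better to use lapply than apply, because apply does not work when every group has the same number of rows: it then simplifies its result to a matrix instead of a list. See the question "Error when numbering rows by group".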
lmy.df = read.table(text = '
id d gr mm area
15 1 2 3.40 1
15 1 1 4.90 2
15 1 1 4.40 1
15 1 1 5.50 2
21 1 1 4.00 2
21 1 2 3.80 2
22 1 1 4.00 2
23 1 1 2.70 2
23 1 1 4.00 2
23 1 2 3.00 2
24 1 1 3.00 2
24 1 1 2.00 3
24 1 1 4.00 2
24 1 2 2.00 3
', header = TRUE)
lmy.seq <- rle(lmy.df$id)$lengths
lmy.df$first <- unlist(lapply(lmy.seq, function(x) seq(1,x)))
lmy.df$last <- unlist(lapply(lmy.seq, function(x) seq(x,1,-1)))
lmy.df
lmy.df2 <- lmy.df[lmy.df$first==1 | lmy.df$last == 1,]
lmy.df2
# id d gr mm area first last
#1 15 1 2 3.4 1 1 4
#4 15 1 1 5.5 2 4 1
#5 21 1 1 4.0 2 1 2
#6 21 1 2 3.8 2 2 1
#7 22 1 1 4.0 2 1 1
#8 23 1 1 2.7 2 1 3
#10 23 1 2 3.0 2 3 1
#11 24 1 1 3.0 2 1 4
#14 24 1 2 2.0 3 4 1
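
Because rle() works on consecutive runs, an id that shows up again later in the data is treated as a separate group. The sketch below illustrates this on a made-up id vector (the values are an assumption, not the data from the answer).

```r
# Hypothetical ids in which 15 appears in two separate runs.
ids <- c(15, 15, 15, 21, 21, 15, 15)

runs <- rle(ids)
runs$values   # 15 21 15
runs$lengths  # 3 2 2

# Position of each row counted from the start and from the end of its run.
first <- unlist(lapply(runs$lengths, function(n) seq_len(n)))
last  <- unlist(lapply(runs$lengths, function(n) rev(seq_len(n))))

# Row numbers kept: the first and last row of every run.
keep <- which(first == 1 | last == 1)
keep  # 1 3 4 5 6 7
```

Grouping by value instead (for example with aggregate) would merge the two runs of id 15 into one group, so the data should be sorted by id first if that is what you want.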
Here is an example in which every group has two rows:
lmy.df = read.table(text = '
id d gr mm area
15 1 2 3.40 1
15 1 1 4.90 2
21 1 1 4.00 2
21 1 2 3.80 2
22 1 1 4.00 2
22 1 1 6.00 2
23 1 1 2.70 2
23 1 2 3.00 2
24 1 1 3.00 2
24 1 2 2.00 3
', header = TRUE)
lmy.seq <- rle(lmy.df$id)$lengths
lmy.df$first <- unlist(lapply(lmy.seq, function(x) seq(1,x)))
lmy.df$last <- unlist(lapply(lmy.seq, function(x) seq(x,1,-1)))
lmy.df
lmy.df2 <- lmy.df[lmy.df$first==1 | lmy.df$last == 1,]
lmy.df2
# id d gr mm area first last
#1 15 1 2 3.4 1 1 2
#2 15 1 1 4.9 2 2 1
#3 21 1 1 4.0 2 1 2
#4 21 1 2 3.8 2 2 1
#5 22 1 1 4.0 2 1 2
#6 22 1 1 6.0 2 2 1
#7 23 1 1 2.7 2 1 2
#8 23 1 2 3.0 2 2 1
#9 24 1 1 3.0 2 1 2
#10 24 1 2 2.0 3 2 1
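
To see why apply breaks in this equal-size case, it can help to compare what apply and lapply return. The run lengths below are a hypothetical stand-in for rle(...)$lengths with five groups of two rows; the column name len is an assumption for illustration.

```r
# Hypothetical run lengths: five groups of two rows each.
my.seq <- data.frame(len = c(2L, 2L, 2L, 2L, 2L))

# apply() simplifies equal-length results into a matrix,
# and unlist() leaves a matrix unchanged.
a <- apply(my.seq, 1, function(x) seq(1, x))
is.matrix(a)  # TRUE
dim(a)        # 2 5

# lapply() always returns a list with one element per group,
# so unlist() gives one value per row of the data.
l <- lapply(my.seq$len, function(x) seq(1, x))
is.list(l)         # TRUE
length(unlist(l))  # 10
```

Assigning the 2-row matrix to a column of a 10-row data frame is what triggers the error, whereas the unlisted lapply result has the right length.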
Original answer:
my.seq <- data.frame(rle(my.df$id)$lengths)
my.df$first <- unlist(apply(my.seq, 1, function(x) seq(1,x)))
my.df$last <- unlist(apply(my.seq, 1, function(x) seq(x,1,-1)))
my.df2 <- my.df[my.df$first==1 | my.df$last == 1,]
my.df2
id d gr mm area first last
1 15 1 2 3.4 1 1 4
4 15 1 1 5.5 2 4 1
5 21 1 1 4.0 2 1 2
6 21 1 2 3.8 2 2 1
7 22 1 1 4.0 2 1 3
9 22 1 2 4.6 2 3 1
10 23 1 1 2.7 2 1 3
12 23 1 2 3.0 2 3 1
13 24 1 1 3.0 2 1 4
16 24 1 2 2.0 3 4 1