Your structure looks slightly different from the data frame you posted:
> df
   Subject                   Recipient Length Folder Message                Date Edit
 1                                         80    out      NA 1/2/2020 1:00:01 AM TRUE
 2                                         80    out      NA 1/2/2020 1:00:05 AM TRUE
 3     hey sarah@mail.com,gee@mail.com     80    out      NA 1/2/2020 1:00:10 AM TRUE
 4     hey sarah@mail.com,gee@mail.com     80    out      NA 1/2/2020 1:00:15 AM TRUE
 5     hey sarah@mail.com,gee@mail.com     80    out      NA 1/2/2020 1:00:30 AM TRUE
 6                                         NA             NA                       NA
 7                                         NA             NA                       NA
 8     hey sarah@mail.com,gee@mail.com     80  draft      NA 1/2/2020 1:02:00 AM TRUE
 9     hey sarah@mail.com,gee@mail.com     80  draft      NA 1/2/2020 1:02:05 AM TRUE
10                                         NA             NA                       NA
11                                         NA             NA                       NA
12     hey sarah@mail.com,gee@mail.com    100  draft      NA 1/2/2020 1:03:00 AM TRUE
13     hey sarah@mail.com,gee@mail.com    100  draft      NA 1/2/2020 1:03:20 AM TRUE
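Also, your desired output suggests that you want to split the groups by Folder as well, but that is not what your description says, so I have not grouped by Folder. That is easy to change if you want it, though.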
You can use run-length encoding (rle) to tell apart groups of identical consecutive values in sorted data, but in R it is a bit tricky to turn the result into a data frame column. I used the approach from another answer to do this.
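To show the rle step on its own before the full pipeline, here is a minimal base R sketch. The vector key is made up for illustration and is not taken from the question's data; the idea is simply to repeat each run's index as many times as the run is long, which yields one group id per row.

```r
# Hypothetical keys, already sorted in the order we care about
key <- c("a", "a", "b", "b", "b", "a")

# rle() returns the run lengths and the value of each run
runs <- rle(key)

# Repeat each run's index by its length to get a per-row group id
run_id <- rep(seq_along(runs$lengths), runs$lengths)

print(run_id)
```

Note that the last "a" gets its own id rather than rejoining the first run, because rle only merges values that are adjacent.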
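library(lubridate)
library(dplyr)

df %>%
  # Parse the timestamps and build a key that identifies each message
  mutate(Date = mdy_hms(Date),
         Key  = paste(Subject, Recipient, Length, sep = "_")) %>%
  arrange(Date) %>%
  # & binds tighter than | in R; the parentheses only make that explicit
  filter(Folder == "out" | (Folder == "draft" & Edit == TRUE)) %>%
  # Number each run of consecutive identical keys
  mutate(RLE = {runs <- rle(Key); rep(seq_along(runs$lengths), runs$lengths)}) %>%
  group_by(RLE) %>%
  # Duration is in seconds
  summarize(Start    = first(Date),
            End      = last(Date),
            Duration = as.numeric(End) - as.numeric(Start))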
This creates groups from rows 1:2, 3:5 + 8:9, and 12:13. Those groups give the following durations (in seconds):
# A tibble: 3 x 4
    RLE Start               End                 Duration
  <int> <dttm>              <dttm>                 <dbl>
1     1 2020-01-02 01:00:01 2020-01-02 01:00:05        4
2     2 2020-01-02 01:00:10 2020-01-02 01:02:05      115
3     3 2020-01-02 01:03:00 2020-01-02 01:03:20       20
If you want to include Folder in the grouping, add it to what goes into creating Key. That makes the groups 1:2, 3:5, 8:9, and 12:13, and gives this result:
# A tibble: 4 x 4
    RLE Start               End                 Duration
  <int> <dttm>              <dttm>                 <dbl>
1     1 2020-01-02 01:00:01 2020-01-02 01:00:05        4
2     2 2020-01-02 01:00:10 2020-01-02 01:00:30       20
3     3 2020-01-02 01:02:00 2020-01-02 01:02:05        5
4     4 2020-01-02 01:03:00 2020-01-02 01:03:20       20
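For reference, the Folder variant could look roughly like the sketch below. The data frame here is my own reconstruction of the non-empty rows from the printout above (the Message column and the blank rows are left out), so treat it as an assumption rather than your actual data. The only change to the pipeline is that Folder is added to the paste() call that builds Key.

```r
library(lubridate)
library(dplyr)

# Assumed reconstruction of the non-empty rows of df
df <- data.frame(
  Subject   = c("", "", rep("hey", 7)),
  Recipient = c("", "", rep("sarah@mail.com,gee@mail.com", 7)),
  Length    = c(rep(80, 7), 100, 100),
  Folder    = c(rep("out", 5), rep("draft", 4)),
  Date      = c("1/2/2020 1:00:01 AM", "1/2/2020 1:00:05 AM",
                "1/2/2020 1:00:10 AM", "1/2/2020 1:00:15 AM",
                "1/2/2020 1:00:30 AM", "1/2/2020 1:02:00 AM",
                "1/2/2020 1:02:05 AM", "1/2/2020 1:03:00 AM",
                "1/2/2020 1:03:20 AM"),
  Edit      = TRUE,
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(Date = mdy_hms(Date),
         # Folder is now part of the key, so a change of folder starts a new run
         Key  = paste(Subject, Recipient, Length, Folder, sep = "_")) %>%
  arrange(Date) %>%
  filter(Folder == "out" | (Folder == "draft" & Edit)) %>%
  mutate(RLE = {runs <- rle(Key); rep(seq_along(runs$lengths), runs$lengths)}) %>%
  group_by(RLE) %>%
  summarize(Start    = first(Date),
            End      = last(Date),
            Duration = as.numeric(End) - as.numeric(Start))

print(result)
```

With Folder in the key, rows 3:5 ("out") and 8:9 ("draft") no longer share a key, which is why the 115-second group splits into a 20-second and a 5-second group.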