2 Answers

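Contributor with 1856 experience points, 17+ upvotes
Here is a base R answer.
First, match the Commenid column against Parentid. Then create a data set with the Author column and a Reply column holding the authors found by that match. Keep only the rows without NA values and join (merge) the result with the original data to get the other columns.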
i <- with(df1, match(Commenid, Parentid))
res <- data.frame(Author = df1$Author, Reply = df1$Author[i])
res <- res[complete.cases(res), ]
merge(res, df1)
# Author Reply Commenid Parentid Submissionid
#1 User1 User2 333c 222b 111b
#2 User3 User1 222b 555d 23er
#3 User4 User3 555d 666f 111b
A dplyr solution could be:
library(dplyr)
df1 %>%
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  # put Author and Reply first, keep the remaining columns, drop the helper index i
  select(Author, Reply, everything(), -i)
Data
df1 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid
User1 , 333c , 222b , 111b
User2 , 444c , 333c , 5hdc
User3 , 222b , 555d , 23er
User4 , 555d , 666f , 111b
")
df1[] <- lapply(df1, trimws)
Edit
With the new data and the problem described in the comments, here is a dplyr solution. After doing essentially the same as above, it joins the result with the original data set and reorders the columns.
library(dplyr)
df2 %>%
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  select(-i) %>%
  select(Author, Score, Stance, Reply, everything()) %>%
  left_join(df2 %>% select(Author, Score, Stance), by = c("Reply" = "Author")) %>%
  select(-matches("id$"), everything(), matches("id$"))
New data
df2 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid, Score, Stance
User1 , 333c , 222b , 111b , 10 , Positive
User2 , 444c , 333c , 5hdc , 15 , Neutral
User3 , 222b , 555d , 23er , 20 , Negative
User4 , 555d , 666f , 111b , 11 , Positive
")
names(df2) <- trimws(names(df2))
df2[] <- lapply(df2, trimws)

Contributor with 1719 experience points, 6+ upvotes
You can compare each user with every other user, and if one row's parentid equals another row's commentid, print the pair. Here is how you would do it in Python:
# dataset is assumed to be an iterable of dict-like rows
for u1 in dataset:
    for u2 in dataset:
        # u1's parent is u2's comment, so u1 replied to u2
        if u1['parentid'] == u2['commentid']:
            print(u1['Author'], 'replied to', u2['Author'])
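The same comparison can be sketched without the nested loop by indexing the rows by parent id first. This is only a minimal illustration: the sample rows copy df1 from the first answer (with its column names Commenid and Parentid), the function name find_replies is hypothetical, and keeping only the first matching reply is an assumption made to mirror what R's match does.

```python
# Sample rows copied from df1 in the first answer (assumed layout: list of dicts).
dataset = [
    {"Author": "User1", "Commenid": "333c", "Parentid": "222b"},
    {"Author": "User2", "Commenid": "444c", "Parentid": "333c"},
    {"Author": "User3", "Commenid": "222b", "Parentid": "555d"},
    {"Author": "User4", "Commenid": "555d", "Parentid": "666f"},
]


def find_replies(rows):
    """Return (Author, Reply) pairs: each comment's author and who replied to it.

    Hypothetical helper; keeps only the first reply per comment, like R's match().
    """
    # Map each parent id to the author of the first row that points at it.
    replier_by_parent = {}
    for row in rows:
        replier_by_parent.setdefault(row["Parentid"], row["Author"])

    # A comment has a reply if some row names its id as the parent.
    pairs = []
    for row in rows:
        replier = replier_by_parent.get(row["Commenid"])
        if replier is not None:
            pairs.append((row["Author"], replier))
    return pairs


for author, reply in find_replies(dataset):
    print(author, "got a reply from", reply)
```

With the sample rows this gives the same three Author/Reply pairs as the R output in the first answer, and it touches each row twice instead of comparing every pair of rows.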