I have a DF with many entries. An excerpt of the DF looks like this:

    DF_OLD =
    ...
    sID  tID  NER     token       Prediction
    274  79   U-Peop  khrushchev  Live_In-ARG2+B
    274  79   O       's          Live_IN-ARG2+L
    807  53   U-Loc   louisiana   Live_IN-ARG2+U
    807  56   B-Peop  earl        Live_IN-ARG1+B
    807  57   L-Peop  long        Live_IN-ARG1+L
    807  13   B-Peop  dwight      Live_IN-ARG1+B
    807  13   I-Peop  d.          Live_IN-ARG1+I
    807  13   L-Peop  eisenhower  Live_IN-ARG1+L
    ...

The column sID separates the different sentences. The column Prediction shows the output of a machine-learning classifier, and these predictions can be nonsensical. My goal is to group all predicted labels according to the following scheme:

    DF_Expected =
    ...
    sID  entity1               tID1   entity2        tID2  Relation
    274  NaN                   NaN    khrushchev 's  79    Live_In
    807  earl long             56 57  louisiana      53    Live_In
    807  dwight d. eisenhower  13     louisiana      53    Live_In
    ...

The "-ARGX-" part shows the entity's position in the table (ARG1 goes to entity1, ARG2 to entity2), while the part before the first "-" gives the relation. If one of the argument parts is missing, the corresponding cells should be empty.

This is what I have tried:

    DF["Live_In_Predict_Split"] = DF["Prediction"].str.split("+").str[0]
    DF["token2"] = DF["token"]
    DF["tokenID2"] = DF["tokenID"]
    DF["Live_In_Predict2"] = DF["Live_In_Predict"]
    data_tokeni_map = (
        DF.groupby(["Live_In_Predict_Split", "sentenceID"], as_index=True, sort=False)
        .agg(" ".join)
        .reset_index()
    )
    s = data_tokeni_map.loc[:, ['sentenceID', 'token2', "tokenID2", "Live_In_Predict2"]].merge(
        data_tokeni_map.loc[:, ['sentenceID', 'token', "tokenID", "Live_In_Predict"]],
        on='sentenceID',
    )
    s = s.loc[s.token2 != s.token].drop_duplicates()

I am missing some kind of counter to tell the different "-ARGX-" parts apart, as well as a suitable GroupBy (grouping by tokenID is not a good idea, because it produces wrong results). As a result, my new DF is wrong:

    DF_EDITED =
    ...
    sID  entity1                        tID1      entity2                        tID2  ...
    807  dwight d eisenhower earl long  13 56 57  louisiana                      53
    807  louisiana                      13 56 57  dwight d eisenhower earl long  53
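One possible approach, sketched in pandas under several assumptions: the suffix after "+" is read as a BILOU span tag (B = begin, I = inside, L = last, U = unit), the same scheme the NER column uses; consecutive rows with the same sentence, relation and argument slot form one entity; relation names are compared case-insensitively, because the excerpt mixes Live_In and Live_IN; and only ARG1 and ARG2 are paired. The function name group_relations is hypothetical, and the columns follow the excerpt (sID, tID) rather than sentenceID/tokenID from the attempt. This is a sketch, not a tested solution for the full data set.

```python
import pandas as pd

# Excerpt of DF_OLD from the question (sID = sentence ID, tID = token ID).
df = pd.DataFrame(
    {
        "sID": [274, 274, 807, 807, 807, 807, 807, 807],
        "tID": [79, 79, 53, 56, 57, 13, 13, 13],
        "NER": ["U-Peop", "O", "U-Loc", "B-Peop", "L-Peop", "B-Peop", "I-Peop", "L-Peop"],
        "token": ["khrushchev", "'s", "louisiana", "earl", "long", "dwight", "d.", "eisenhower"],
        "Prediction": [
            "Live_In-ARG2+B", "Live_IN-ARG2+L", "Live_IN-ARG2+U", "Live_IN-ARG1+B",
            "Live_IN-ARG1+L", "Live_IN-ARG1+B", "Live_IN-ARG1+I", "Live_IN-ARG1+L",
        ],
    }
)


def group_relations(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: split "Live_IN-ARG1+B" into relation, argument slot and span tag.
    # Predictions that do not match this pattern become NaN and are dropped.
    parts = df["Prediction"].str.extract(r"^(?P<relation>[^-]+)-(?P<arg>ARG\d+)\+(?P<tag>[BILU])$")
    d = pd.concat([df, parts], axis=1).dropna(subset=["relation"]).reset_index(drop=True)
    # Assumption: relation names are compared case-insensitively (Live_In vs Live_IN).
    d["rel_key"] = d["relation"].str.lower()

    # This is the "counter": a new entity starts on a B/U tag, right after an L/U tag,
    # or whenever sentence, relation or argument slot differs from the previous row.
    prev = d.shift()
    starts = (
        d["tag"].isin(["B", "U"])
        | prev["tag"].isin(["L", "U"])
        | (d["sID"] != prev["sID"])
        | (d["rel_key"] != prev["rel_key"])
        | (d["arg"] != prev["arg"])
    )
    d["entity_id"] = starts.cumsum()

    # Collapse each entity to one row: joined tokens plus its distinct token IDs in order.
    entities = d.groupby("entity_id", sort=False).agg(
        sID=("sID", "first"),
        rel_key=("rel_key", "first"),
        Relation=("relation", "first"),
        arg=("arg", "first"),
        entity=("token", " ".join),
        tID=("tID", lambda s: " ".join(dict.fromkeys(s.astype(str)))),
    )

    keys = ["sID", "rel_key"]
    arg1 = entities[entities["arg"] == "ARG1"].rename(
        columns={"entity": "entity1", "tID": "tID1", "Relation": "Relation1"})
    arg2 = entities[entities["arg"] == "ARG2"].rename(
        columns={"entity": "entity2", "tID": "tID2", "Relation": "Relation2"})

    # Outer merge: every ARG1 is paired with every ARG2 of the same sentence and
    # relation; if one side is missing, its cells stay NaN.
    out = arg1[keys + ["entity1", "tID1", "Relation1"]].merge(
        arg2[keys + ["entity2", "tID2", "Relation2"]], on=keys, how="outer")
    out["Relation"] = out["Relation1"].combine_first(out["Relation2"])
    return out[["sID", "entity1", "tID1", "entity2", "tID2", "Relation"]]


print(group_relations(df))
```

The pairing relies on the many-to-many behaviour of an outer merge on (sID, relation): sentence 807 has two ARG1 entities and one ARG2 entity, so it yields two rows, while sentence 274 has no ARG1 and gets NaN in entity1/tID1. The Relation column keeps the spelling of each entity's first token, so sentence 807 shows Live_IN rather than Live_In.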