The regex pattern is a good start, but you then have to join the extracted matches back to the original dataframe:
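import re

# name the index so it can be matched against the first level of the extractall result.
df.index.name = "inx"
pattern = re.compile(r'(\[[\w ]+\]\.\[[\w ]+\])')
# extract every [X].[Y] attribute reference; the result is indexed by (inx, match).
extracts = df.MDX_TEXT.str.extractall(pattern).rename(columns={0: "attrname"})
# join the result with the original dataframe on the shared "inx" level.
res = df.join(extracts).reset_index()[["ID", "USER", "attrname"]]
# take just the last part of each attribute name.
res["attrname"] = res["attrname"].str.split(".", expand=True).iloc[:, -1]
# drop duplicates after shortening, so [A].[X] and [B].[X] collapse into one row.
res = res.drop_duplicates().reset_index(drop=True)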
The result is:
   ID USER attrname
0   1  JOE  [ATTR1]
1   1  JOE  [ATTR2]
2   1  JOE  [ATTR3]
3   2  JAY  [ATTR1]
4   2  JAY  [ATTR3]
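For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample `MDX_TEXT` strings (and the `[DIM1]`, `[DIM2]`, `[DIM3]`, `[CUBE]` names in them) are made up for illustration, since the question's real data is not shown here. It is also a slight variant of the code above: it drops the `match` level before joining so the join is a plain index-to-index join, and it uses `str.rsplit` to keep the last part of each name.

```python
import pandas as pd

# Hypothetical sample data; the real MDX_TEXT values are an assumption.
df = pd.DataFrame({
    "ID": [1, 2],
    "USER": ["JOE", "JAY"],
    "MDX_TEXT": [
        "SELECT [DIM1].[ATTR1], [DIM2].[ATTR2], [DIM1].[ATTR3] FROM [CUBE]",
        "SELECT [DIM1].[ATTR1], [DIM3].[ATTR3], [DIM1].[ATTR1] FROM [CUBE]",
    ],
})
df.index.name = "inx"

# One capture group matching "[something].[something]".
pattern = r"(\[[\w ]+\]\.\[[\w ]+\])"

# extractall returns one row per match, indexed by (inx, match).
extracts = df["MDX_TEXT"].str.extractall(pattern).rename(columns={0: "attrname"})
# Drop the "match" level so both frames share the same single index.
extracts = extracts.reset_index(level="match", drop=True)

# Left join: each original row is repeated once per extracted match.
res = df.join(extracts)[["ID", "USER", "attrname"]]
# Keep only the part after the last dot, e.g. "[DIM1].[ATTR1]" -> "[ATTR1]".
res["attrname"] = res["attrname"].str.rsplit(".", n=1).str[-1]
# Remove repeated (ID, USER, attrname) rows and renumber the index.
res = res.drop_duplicates().reset_index(drop=True)

print(res)
```

With this sample data, `res` should contain the same five rows as the result shown above. Rows whose `MDX_TEXT` has no match would stay in the output with a missing `attrname`, because the join is a left join; add `.dropna(subset=["attrname"])` if those rows are not wanted.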