You can collect the data in a dictionary, with each PMID as a key and its list of AUTHORs as the value.
Suppose you start from a file (simulated here with StringIO):
from io import StringIO
fo = StringIO(
'''PMID- 12345678
xyz - text (might be multiple lines)
xyz- text (might be multiple lines)
AUTHOR- author1
AUTHOR- author2
PMID- 12345679
xyz - text (might be multiple lines)
xyz- text (might be multiple lines)
AUTHOR- author3
AUTHOR- author4''')
# With a real file, use instead: with open(filename, 'r') as fo:
Then iterate over the lines and fill the dictionary:
records = dict()
pmid = None
for line in fo:
    if line.startswith('PMID-'):
        # Start a new record; split on the first hyphen only
        pmid = line.split('-', 1)[1].strip()
        records[pmid] = []
    elif line.startswith('AUTHOR'):
        # Append the author to the most recently seen PMID
        records[pmid].append(line.split('-', 1)[1].strip())
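The loop above can also be wrapped in a reusable function. The following is only a minimal sketch: the name `parse_records` is hypothetical, and it assumes every tagged line has the form `TAG- value`. It uses `str.partition` so that values containing a hyphen (such as a double-barrelled surname) are kept whole, and it skips AUTHOR lines that appear before any PMID.

```python
from io import StringIO

def parse_records(lines):
    """Map each PMID to the list of authors that follow it (hypothetical helper)."""
    records = {}
    pmid = None
    for line in lines:
        # Split on the first hyphen only, so hyphens inside the value survive.
        tag, sep, value = line.partition('-')
        if not sep:
            continue  # no tag separator on this line
        tag, value = tag.strip(), value.strip()
        if tag == 'PMID':
            pmid = value
            records[pmid] = []
        elif tag == 'AUTHOR' and pmid is not None:
            records[pmid].append(value)
    return records

# Made-up sample data in the same layout as the example file
sample = StringIO('''PMID- 12345678
xyz - text
AUTHOR- Smith-Jones A
AUTHOR- author2
PMID- 12345679
AUTHOR- author3''')
parsed = parse_records(sample)
print(parsed)
```

Because the function accepts any iterable of lines, it should work equally with an open file object or a list of strings.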
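When creating the DataFrame, you can either pass the dictionary directly, df = pd.DataFrame(records), which gives one column per PMID with the authors as rows (this requires every PMID to have the same number of authors), or join each author list into a single string before passing it to the DataFrame constructor: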
import pandas as pd

df = pd.DataFrame(
    [', '.join(authors) for authors in records.values()],
    index=list(records)
)
Output:
                         0
12345678  author1, author2
12345679  author3, author4
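If you would rather have one row per PMID and one column per author, `pd.DataFrame.from_dict` with `orient='index'` is one option. This is a sketch rather than part of the original answer: the `records` dictionary below is made-up sample data, and the second PMID deliberately has fewer authors to show that shorter rows are padded with missing values.

```python
import pandas as pd

# Made-up sample data: PMID -> list of authors (lists of unequal length)
records = {
    '12345678': ['author1', 'author2'],
    '12345679': ['author3'],
}

# Each key becomes a row label; each list position becomes a column (0, 1, ...)
wide = pd.DataFrame.from_dict(records, orient='index')
print(wide)
```

Whether the wide or the joined layout is preferable depends on how the authors are used afterwards; the joined string is simpler to display, while separate columns are easier to filter on.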