2 Answers

Contributor with 1851 experience points, 4+ upvotes
You wrote:
if (parent_go_id in parent_list):
    go_dict[parent_go_id][go_id] = generate_go_tree([go_term], all_go_terms, True)
The correct version is:
if (parent_go_id in parent_list):
    go_dict[parent_go_id][go_id] = generate_go_tree([go_term], all_go_terms, True)[go_id]
After this change, it produces:
{
    'GO:0003674': {
        'GO:0003824': {},
        'GO:0005198': {},
        'GO:0005488': {
            'GO:0005515': {},
            'GO:0005549': {
                'GO:0005550': {}
            }
        }
    }
}
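To see why the extra `[go_id]` matters: a recursive builder that returns a dict keyed by the term's own ID already wraps the subtree once, so assigning the whole return value under `go_id` nests that ID a second time. The sketch below uses a hypothetical `build` function and a made-up `CHILDREN` table, not the asker's `generate_go_tree`, whose full body isn't shown here; it only assumes that function returns a dict of the form `{go_id: subtree}`.

```python
# Hypothetical stand-in data: parent ID -> list of child IDs.
CHILDREN = {
    'GO:0005488': ['GO:0005515', 'GO:0005549'],
    'GO:0005549': ['GO:0005550'],
}

def build(go_id, unwrap):
    """Return {go_id: subtree}, like a builder keyed by the term's own ID."""
    subtree = {}
    for child in CHILDREN.get(go_id, []):
        result = build(child, unwrap)  # result looks like {child: {...}}
        # Without unwrapping, the child ID appears twice: {child: {child: {...}}}
        subtree[child] = result[child] if unwrap else result
    return {go_id: subtree}

wrong = build('GO:0005488', unwrap=False)
right = build('GO:0005488', unwrap=True)
print(wrong['GO:0005488']['GO:0005515'])  # {'GO:0005515': {}}
print(right['GO:0005488']['GO:0005515'])  # {}
```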
But I would suggest a completely different approach: create a class that parses the terms and builds the dependency tree as it goes.
For convenience, I derive it from dict, so you can write term.id instead of term['id']:
class Term(dict):
    # Let term.id read and write term['id'].
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    registry = {}  # maps every id (and alt_id) to its Term
    # 'is_obsolete' is listed so that is_valid() can actually see it.
    single_valued = 'id name namespace alt_id def comment synonym is_a is_obsolete'.split()
    multi_valued = 'subset xref'.split()

    def __init__(self, text):
        self.children = []
        self.parent = None
        for line in text.splitlines():
            if ': ' not in line:
                continue
            key, val = line.split(': ', 1)
            if key in Term.single_valued:
                self[key] = val
            elif key in Term.multi_valued:
                if key not in self:
                    self[key] = [val]
                else:
                    self[key].append(val)
            else:
                print('unclear property: %s' % line)
        if 'id' in self:
            Term.registry[self.id] = self
        if 'alt_id' in self:
            Term.registry[self.alt_id] = self
        if 'is_a' in self:
            # 'is_a' looks like 'GO:0003674 ! molecular_function'; keep the ID.
            key = self.is_a.split(' ! ', 1)[0]
            # The parent is only linked if it was parsed before this term.
            if key in Term.registry:
                Term.registry[key].children.append(self)
                self.parent = Term.registry[key]

    def is_top(self):
        return self.parent is None

    def is_valid(self):
        # Use get() for 'id': attribute access raises KeyError when it is missing.
        return self.get('is_obsolete') != 'true' and self.get('id') is not None
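The three `__getattr__`/`__setattr__`/`__delattr__` assignments are what make `term.id` work. A minimal sketch of the same trick on a throwaway class (the name `AttrDict` is made up for illustration) shows one consequence worth knowing: a missing key raises `KeyError` rather than `AttributeError`, so `dict.get` is the safer way to probe optional fields.

```python
class AttrDict(dict):
    # Route attribute access to the dict's item access.
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

t = AttrDict(id='GO:0005488')
t.name = 'binding'           # stored as t['name']
print(t['name'], t.id)       # binding GO:0005488

try:
    t.namespace              # no such key
except KeyError as exc:
    missing = exc.args[0]
print(missing)               # namespace
print(t.get('namespace'))    # None
```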
Now you can read the file in one go:
with open('tiny_go.obo', 'rt') as f:
    contents = f.read()
terms = [Term(text) for text in contents.split('\n\n')]
Recursing over the tree then becomes easy. For example, a simple "print" function that outputs only non-obsolete nodes:
def print_tree(terms, indent=''):
    valid_terms = [term for term in terms if term.is_valid()]
    for term in valid_terms:
        print(indent + 'Term %s - %s' % (term.id, term.name))
        print_tree(term.children, indent + '  ')

top_terms = [term for term in terms if term.is_top()]
print_tree(top_terms)
This prints:
Term GO:0003674 - molecular_function
  Term GO:0003824 - catalytic activity
  Term GO:0005198 - structural molecule activity
  Term GO:0005488 - binding
    Term GO:0005515 - protein binding
    Term GO:0005549 - odorant binding
      Term GO:0005550 - pheromone binding
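You can also do things like Term.registry['GO:0005549'].parent.name, which gives "binding".
I leave generating nested dicts of GO-IDs (as in your own example) as an exercise, but you might not even need it, because Term.registry is already quite similar to that.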
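One possible take on that exercise, as a sketch rather than the answerer's own solution: walk the `children` lists recursively and key each level by `id`. To keep the example runnable on its own, it uses `types.SimpleNamespace` objects as stand-ins for `Term` instances; the helper name `nested_ids` and the sample nodes are assumptions, and only the `id` and `children` attributes are relied on.

```python
from types import SimpleNamespace

def nested_ids(terms):
    """Map each term's id to the nested dict of its descendants' ids."""
    return {term.id: nested_ids(term.children) for term in terms}

# Stand-in nodes with the same 'id' and 'children' attributes as Term.
pheromone = SimpleNamespace(id='GO:0005550', children=[])
odorant = SimpleNamespace(id='GO:0005549', children=[pheromone])
protein = SimpleNamespace(id='GO:0005515', children=[])
binding = SimpleNamespace(id='GO:0005488', children=[protein, odorant])

print(nested_ids([binding]))
# {'GO:0005488': {'GO:0005515': {}, 'GO:0005549': {'GO:0005550': {}}}}
```

With the real class, the analogous call would pass the top-level terms, assuming each of them has an `id`. This sketch does not filter out obsolete terms.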

Contributor with 2051 experience points, 10+ upvotes
You can use recursion for a shorter solution:
import itertools, json, re

# Read the non-blank lines of the file.
with open('filename.txt') as f:
    content = list(filter(None, [i.strip('\n') for i in f]))
# Group consecutive lines, splitting at each '[Term]' marker.
entries = [[a, list(b)] for a, b in itertools.groupby(content, key=lambda x: x == '[Term]')]
# Turn each stanza into a dict and reduce 'is_a' to the bare parent ID.
terms = [(lambda x: x if 'is_a' not in x else {**x, 'is_a': re.findall(r'^GO:\d+', x['is_a'])[0]})(dict(i.split(': ', 1) for i in b)) for a, b in entries if not a]
# Put the root (the term without 'is_a') first.
terms = sorted(terms, key=lambda x: 'is_a' in x)

def tree(d, _start):
    t = [i for i in d if i.get('is_a') == _start]
    return {} if not t else {i['id']: tree(d, i['id']) for i in t}

print(json.dumps({terms[0]['id']: tree(terms, terms[0]['id'])}, indent=4))
Output:
{
    "GO:0003674": {
        "GO:0003824": {},
        "GO:0005198": {},
        "GO:0005488": {
            "GO:0005515": {},
            "GO:0005549": {
                "GO:0005550": {}
            }
        }
    }
}
This also works if a parent term is not defined before its children. For example, when the parent is moved three positions below its original place, the same result is still produced (see the file):
print(json.dumps({terms[0]['id']:tree(terms, terms[0]['id'])}, indent=4))
Output:
{
    "GO:0003674": {
        "GO:0003824": {},
        "GO:0005198": {},
        "GO:0005488": {
            "GO:0005515": {},
            "GO:0005549": {
                "GO:0005550": {}
            }
        }
    }
}
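The order independence follows from `tree` scanning the whole list for children on every call, rather than relying on parents having been seen already. A small check of that claim, using hand-written term dicts (the sample data below is made up to mirror some of the IDs above, with `is_a` already reduced to the bare parent ID):

```python
import random

def tree(d, _start):
    # Children are found by scanning every entry, so list order is irrelevant.
    t = [i for i in d if i.get('is_a') == _start]
    return {} if not t else {i['id']: tree(d, i['id']) for i in t}

terms = [
    {'id': 'GO:0003674'},
    {'id': 'GO:0005488', 'is_a': 'GO:0003674'},
    {'id': 'GO:0005549', 'is_a': 'GO:0005488'},
    {'id': 'GO:0005550', 'is_a': 'GO:0005549'},
]

expected = tree(terms, 'GO:0003674')
shuffled = terms[:]
random.Random(0).shuffle(shuffled)  # fixed seed; any order works
print(tree(shuffled, 'GO:0003674') == expected)  # True
```

Because every call rescans the full list, the build is quadratic in the number of terms. For a complete ontology file, indexing children by parent ID first would avoid that.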