
How do I generate a recursive tree-like dictionary from a flat file (Gene Ontology OBO file)?

Python

慕無忌1623718 2021-12-08 10:32:33

I am trying to write code that parses a Gene Ontology (GO) OBO file and pushes the GO term IDs (e.g. GO:0003824) into a tree-like nested dictionary. The hierarchy in the OBO file is expressed with the "is_a" tag, which marks each parent of a GO term. A GO term can have several parents, and the topmost GO terms in the hierarchy have none. A small example of a GO OBO file is shown below:

    [Term]
    id: GO:0003674
    name: molecular_function
    namespace: molecular_function
    alt_id: GO:0005554
    def: "A molecular process that can be carried out by the action of a single macromolecular machine, usually via direct physical interactions with other molecular entities. Function in this sense denotes an action, or activity, that a gene product (or a complex) performs. These actions are described from two distinct but related perspectives: (1) biochemical activity, and (2) role as a component in a larger system/process." [GOC:pdt]
    comment: Note that, in addition to forming the root of the molecular function ontology, this term is recommended for use for the annotation of gene products whose molecular function is unknown. When this term is used for annotation, it indicates that no information was available about the molecular function of the gene product annotated as of the date the annotation was made; the evidence code "no data" (ND), is used to indicate this. Despite its name, this is not a type of 'function' in the sense typically defined by upper ontologies such as Basic Formal Ontology (BFO). It is instead a BFO:process carried out by a single gene product or complex.
    subset: goslim_aspergillus
    subset: goslim_candida
    subset: goslim_chembl
    subset: goslim_generic
    subset: goslim_metagenomics
    subset: goslim_pir
    subset: goslim_plant
    subset: goslim_yeast
    synonym: "molecular function" EXACT []


2 answers

繁花不似錦

Contributed 1851 experience points, received more than 4 upvotes

You wrote

    if (parent_go_id in parent_list):
        go_dict[parent_go_id][go_id] = generate_go_tree([go_term], all_go_terms, True)

The correct version is

    if (parent_go_id in parent_list):
        go_dict[parent_go_id][go_id] = generate_go_tree([go_term], all_go_terms, True)[go_id]

After this change it produces:

    {
        'GO:0003674': {
            'GO:0003824': {},
            'GO:0005198': {},
            'GO:0005488': {
                'GO:0005515': {},
                'GO:0005549': {
                    'GO:0005550': {}
                }
            }
        }
    }
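The asker's generate_go_tree is not reproduced here, so the following is only a minimal sketch of why the extra [go_id] matters. It assumes a hypothetical stand-in, build_tree, which (like the recursive call above appears to) returns a one-key dict keyed by the term's own ID. The CHILDREN table is also made up for illustration, using the IDs from the output above.

```python
# Hypothetical child lookup table; not taken from the asker's code.
CHILDREN = {
    'GO:0005488': ['GO:0005515', 'GO:0005549'],
    'GO:0005549': ['GO:0005550'],
}

def build_tree(term_id):
    """Return a one-key dict mapping term_id to its nested children."""
    subtree = {}
    for child in CHILDREN.get(term_id, []):
        # build_tree(child) is {child: {...}}, so unwrap it with [child]
        subtree[child] = build_tree(child)[child]
    return {term_id: subtree}

def build_tree_buggy(term_id):
    """Same recursion without the unwrap: every child ID is nested twice."""
    subtree = {}
    for child in CHILDREN.get(term_id, []):
        subtree[child] = build_tree_buggy(child)
    return {term_id: subtree}

fixed = build_tree('GO:0005488')
buggy = build_tree_buggy('GO:0005488')
print(fixed)
print(buggy)
```

In the buggy variant each child key maps to a dict that repeats the same key one level down, which is the doubled nesting that the [go_id] index removes.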

But I would suggest a completely different approach: create a class that parses the terms and builds the dependency tree as it goes.

For convenience I derive it from dict, so you can write term.id instead of term['id']:

    class Term(dict):
        # Expose dict keys as attributes: term.id is term['id']
        __getattr__ = dict.__getitem__
        __setattr__ = dict.__setitem__
        __delattr__ = dict.__delitem__

        registry = {}  # every parsed term, keyed by id and alt_id
        # is_obsolete is listed so that is_valid() can actually see it
        single_valued = 'id name namespace alt_id def comment synonym is_a is_obsolete'.split()
        multi_valued = 'subset xref'.split()

        def __init__(self, text):
            self.children = []
            self.parent = None
            for line in text.splitlines():
                if ': ' not in line:
                    continue
                key, val = line.split(': ', 1)
                if key in Term.single_valued:
                    self[key] = val
                elif key in Term.multi_valued:
                    if key not in self:
                        self[key] = [val]
                    else:
                        self[key].append(val)
                else:
                    print('unclear property: %s' % line)
            if 'id' in self:
                Term.registry[self.id] = self
            if 'alt_id' in self:
                Term.registry[self.alt_id] = self
            if 'is_a' in self:
                # 'GO:0005488 ! binding' -> 'GO:0005488'
                key = self.is_a.split(' ! ', 1)[0]
                # The parent is linked only if it was parsed before this term
                if key in Term.registry:
                    Term.registry[key].children.append(self)
                    self.parent = Term.registry[key]

        def is_top(self):
            return self.parent is None

        def is_valid(self):
            # .get() avoids a KeyError for stanzas that have no id
            return self.get('is_obsolete') != 'true' and self.get('id') is not None

Now you can read the file in one go:

    with open('tiny_go.obo', 'rt') as f:
        contents = f.read()

    terms = [Term(text) for text in contents.split('\n\n')]
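Splitting on '\n\n' assumes the file holds nothing but [Term] stanzas separated by exactly one blank line. Full OBO files also carry a header and other stanza types such as [Typedef], so a slightly more defensive splitter may help. The helper name term_stanzas and the sample text below are assumptions for illustration, not part of the answer:

```python
import re

def term_stanzas(text):
    """Yield the body (without the '[Term]' line) of every [Term] stanza."""
    # Split on one or more blank lines, tolerating stray whitespace
    for block in re.split(r'\n\s*\n', text.strip()):
        lines = block.strip().splitlines()
        if lines and lines[0].strip() == '[Term]':
            yield '\n'.join(lines[1:])

# Made-up miniature file: a header, two terms and one [Typedef] stanza
sample = (
    "format-version: 1.2\n"
    "\n"
    "[Term]\n"
    "id: GO:0003674\n"
    "name: molecular_function\n"
    "\n"
    "[Typedef]\n"
    "id: part_of\n"
    "\n"
    "[Term]\n"
    "id: GO:0003824\n"
    "is_a: GO:0003674 ! molecular_function\n"
)

bodies = list(term_stanzas(sample))
print(len(bodies))
```

Each yielded body could then be passed to Term(...) in place of the raw chunks from contents.split('\n\n').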

Recursing over the tree then becomes easy. For example, a simple "print" function that outputs only the non-obsolete nodes:

    def print_tree(terms, indent=''):
        valid_terms = [term for term in terms if term.is_valid()]
        for term in valid_terms:
            print(indent + 'Term %s - %s' % (term.id, term.name))
            print_tree(term.children, indent + ' ')

    top_terms = [term for term in terms if term.is_top()]
    print_tree(top_terms)

This prints:

    Term GO:0003674 - molecular_function
     Term GO:0003824 - catalytic activity
     Term GO:0005198 - structural molecule activity
     Term GO:0005488 - binding
      Term GO:0005515 - protein binding
      Term GO:0005549 - odorant binding
       Term GO:0005550 - pheromone binding

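You can also do things like Term.registry['GO:0005549'].parent.name, which gives "binding".

I leave generating the nested dicts of GO IDs (as in your own example) as an exercise, but you may not even need it, because Term.registry is already very similar to that.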
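One possible take on that exercise, as a minimal sketch: the function name to_nested is an assumption, and it relies only on each node having .id and .children attributes, as Term objects do. To keep the example self-contained it uses SimpleNamespace stand-ins rather than real Term instances.

```python
from types import SimpleNamespace

def to_nested(term):
    """Return {child_id: {...}} for everything below term."""
    return {child.id: to_nested(child) for child in term.children}

# Stand-in objects with the same .id / .children attributes as Term
leaf = SimpleNamespace(id='GO:0005550', children=[])
mid = SimpleNamespace(id='GO:0005549', children=[leaf])
root = SimpleNamespace(id='GO:0005488', children=[mid])

nested = {root.id: to_nested(root)}
print(nested)
```

This sketch does not filter obsolete terms; adding an `if child.is_valid()` clause to the comprehension would mirror what print_tree does.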

Replied 2021-12-08

侃侃無極

Contributed 2051 experience points, received more than 10 upvotes

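You can use recursion for a shorter solution:

    import itertools, json, re

    # Read the file, dropping blank lines
    with open('filename.txt') as f:
        content = list(filter(None, (line.strip('\n') for line in f)))
    # Group consecutive lines; groups with a == True are the '[Term]' markers
    entries = [[a, list(b)] for a, b in itertools.groupby(content, key=lambda x: x == '[Term]')]
    # One dict per stanza; 'is_a' is reduced to the bare parent ID
    terms = [(lambda x: x if 'is_a' not in x else {**x, 'is_a': re.findall(r'^GO:\d+', x['is_a'])[0]})(dict(i.split(': ', 1) for i in b)) for a, b in entries if not a]
    # Stable sort: terms without 'is_a' (the root) come first
    terms = sorted(terms, key=lambda x: 'is_a' in x)

    def tree(d, _start):
        # Recursively collect the terms whose parent is _start
        t = [i for i in d if i.get('is_a') == _start]
        return {} if not t else {i['id']: tree(d, i['id']) for i in t}

    print(json.dumps({terms[0]['id']: tree(terms, terms[0]['id'])}, indent=4))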

Output:

    {
        "GO:0003674": {
            "GO:0003824": {},
            "GO:0005198": {},
            "GO:0005488": {
                "GO:0005515": {},
                "GO:0005549": {
                    "GO:0005550": {}
                }
            }
        }
    }

This also works if a parent term is not defined before its children. For example, when the parent is moved three positions below its original place, the same result is still produced (see the file):

    print(json.dumps({terms[0]['id']: tree(terms, terms[0]['id'])}, indent=4))

Output:

    {
        "GO:0003674": {
            "GO:0003824": {},
            "GO:0005198": {},
            "GO:0005488": {
                "GO:0005515": {},
                "GO:0005549": {
                    "GO:0005550": {}
                }
            }
        }
    }
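The order independence comes from looking parents up by ID rather than by position in the file. The same idea can be sketched with an explicit parent-to-children index, which also keeps every is_a line, so a term with several parents appears under each of them. The names parse_parents and nested_tree are assumptions for illustration, and the sketch assumes the is_a links contain no cycles:

```python
from collections import defaultdict

def parse_parents(text):
    """Return {term_id: [parent_ids]} for each [Term] stanza in OBO text."""
    parents = {}
    current = None
    in_term = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('['):
            # A new stanza begins; only [Term] stanzas are of interest
            in_term = line == '[Term]'
            current = None
        elif in_term and line.startswith('id: '):
            current = line[len('id: '):]
            parents[current] = []
        elif in_term and current and line.startswith('is_a: '):
            # 'is_a: GO:0005488 ! binding' -> 'GO:0005488'
            parent_id = line[len('is_a: '):].split(' ! ', 1)[0].strip()
            parents[current].append(parent_id)
    return parents

def nested_tree(parents):
    """Build {root_id: {child_id: {...}}} from a term-to-parents mapping."""
    children = defaultdict(list)
    for term_id, parent_ids in parents.items():
        for parent_id in parent_ids:
            children[parent_id].append(term_id)

    def subtree(term_id):
        return {child: subtree(child) for child in children[term_id]}

    # Roots are the terms that list no parent at all
    return {t: subtree(t) for t, ps in parents.items() if not ps}

# Children deliberately listed before their parents
sample = """[Term]
id: GO:0005550
is_a: GO:0005549 ! odorant binding

[Term]
id: GO:0005549
is_a: GO:0005488 ! binding

[Term]
id: GO:0005488
is_a: GO:0003674 ! molecular_function

[Term]
id: GO:0003674
"""

result = nested_tree(parse_parents(sample))
print(result)
```

Building the index once avoids rescanning the whole term list at every level of the recursion, which may matter for the full ontology.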

Replied 2021-12-08
