1 Answer

Your code is wrong. Consider what happens when the inner loop reaches the line ID2_1234_CAT_ANIMAL_GOOD_3 itself:
    subject_id = cat.split('_')[0]  # ID2
    num_id = cat.split('_')[1]  # 1234
    subject_num = subject_id + '_' + num_id  # ID2_1234
    for j, dog in enumerate(files_list):
        # when dog is the line ID2_1234_CAT_ANIMAL_GOOD_3
        if subject_num in dog and 'GOOD' in dog:  # this is true
            if 'GOOD' in dog and 'DOG' in dog:  # this is false
                continue
            else:
                file_filter.append(cat)  # then it outputs it
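The problem is that every line containing both GOOD and CAT "matches itself" in the inner loop, so it is appended whether or not a GOOD DOG line exists for the same subject.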
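A minimal reproduction of the self-match, as a sketch only. The outer loop and the DOG line below are assumptions (neither is shown above); only the ID2_1234_CAT_ANIMAL_GOOD_3 name and the inner-loop logic come from the question.

```python
# Hypothetical input: one GOOD CAT line and one GOOD DOG line for the same subject.
files_list = [
    "ID2_1234_CAT_ANIMAL_GOOD_3",
    "ID2_1234_DOG_ANIMAL_GOOD_1",  # assumed name, for illustration
]
file_filter = []

# Assumed outer loop: visit every GOOD CAT line.
for cat in files_list:
    if 'CAT' not in cat or 'GOOD' not in cat:
        continue
    subject_num = '_'.join(cat.split('_')[:2])  # ID2_1234
    for dog in files_list:
        if subject_num in dog and 'GOOD' in dog:
            if 'DOG' in dog:
                continue
            # Reached when dog is the CAT line itself.
            file_filter.append(cat)

print(file_filter)
```

The CAT line ends up in file_filter even though a GOOD DOG line exists for ID2_1234, because the inner loop compares the line against itself.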
IMHO, I would use itertools.groupby instead. Something like:
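    from itertools import groupby

    def subject_key(line):
        # Group by the first two fields, e.g. ('ID2', '1234')
        return tuple(line.split('_')[:2])

    # groupby only groups adjacent items, so sort by the same key first
    for _, lines in groupby(sorted(files_list, key=subject_key), key=subject_key):
        good_lines = [line for line in lines if 'GOOD' in line]
        # Keep the subject only if its single GOOD line is a CAT line
        if len(good_lines) == 1 and 'CAT' in good_lines[0]:
            file_filter.append(good_lines[0])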
This should also be more efficient, O(n log n) instead of O(n^2), although it requires the whole contents of the file to be in RAM.
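For reference, here is the groupby approach as a self-contained sketch that can be run directly. The function name lone_good_cats and every sample line other than ID2_1234_CAT_ANIMAL_GOOD_3 (including the BAD label) are made up for illustration.

```python
from itertools import groupby


def subject_key(line):
    # First two underscore-separated fields, e.g. ('ID2', '1234').
    return tuple(line.split('_')[:2])


def lone_good_cats(files_list):
    """Return the GOOD line of each subject whose only GOOD line is a CAT line."""
    file_filter = []
    # groupby only merges adjacent items, hence the sort on the same key.
    ordered = sorted(files_list, key=subject_key)
    for _, lines in groupby(ordered, key=subject_key):
        good_lines = [line for line in lines if 'GOOD' in line]
        if len(good_lines) == 1 and 'CAT' in good_lines[0]:
            file_filter.append(good_lines[0])
    return file_filter


# Hypothetical sample data.
sample = [
    "ID2_1234_CAT_ANIMAL_GOOD_3",
    "ID2_1234_DOG_ANIMAL_GOOD_1",
    "ID3_5678_CAT_ANIMAL_GOOD_2",
    "ID3_5678_DOG_ANIMAL_BAD_4",
]
result = lone_good_cats(sample)
print(result)
```

With this sample, ID2_1234 is dropped because it has two GOOD lines, and only the ID3_5678 CAT line is kept.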
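If you have other "classes" besides CAT and DOG, and you want to output every GOOD CAT line except those whose subject also has a GOOD DOG line, you can modify the code above like this: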
    # Inside the groupby loop, in place of the len(good_lines) == 1 check
    is_good_cat = any('CAT' in line for line in good_lines)
    is_good_dog = any('DOG' in line for line in good_lines)
    if is_good_cat and not is_good_dog:
        # Test each line, not the whole list
        file_filter.extend(line for line in good_lines if 'CAT' in line)
(You need .extend with a filtering loop here because we no longer know which single line to write, so the GOOD lines have to be filtered for CAT.)
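Put together, the multi-class variant might look like the sketch below. The function name good_cats_without_good_dog, the BIRD class, and every sample line other than ID2_1234_CAT_ANIMAL_GOOD_3 are assumptions for illustration.

```python
from itertools import groupby


def subject_key(line):
    # First two underscore-separated fields, e.g. ('ID2', '1234').
    return tuple(line.split('_')[:2])


def good_cats_without_good_dog(files_list):
    """Return all GOOD CAT lines of subjects that have no GOOD DOG line."""
    file_filter = []
    ordered = sorted(files_list, key=subject_key)
    for _, lines in groupby(ordered, key=subject_key):
        good_lines = [line for line in lines if 'GOOD' in line]
        is_good_cat = any('CAT' in line for line in good_lines)
        is_good_dog = any('DOG' in line for line in good_lines)
        if is_good_cat and not is_good_dog:
            # Other GOOD classes may be present, so keep only the CAT lines.
            file_filter.extend(line for line in good_lines if 'CAT' in line)
    return file_filter


# Hypothetical sample data with a third class.
sample = [
    "ID2_1234_CAT_ANIMAL_GOOD_3",
    "ID2_1234_DOG_ANIMAL_GOOD_1",
    "ID3_5678_CAT_ANIMAL_GOOD_2",
    "ID3_5678_BIRD_ANIMAL_GOOD_4",
    "ID3_5678_CAT_ANIMAL_GOOD_5",
]
result = good_cats_without_good_dog(sample)
print(result)
```

Because sorted is stable, lines within a subject keep their original relative order, so the two ID3_5678 CAT lines come out in input order.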