2 Answers

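I think you can do what you need with a simple check. Let me explain, assuming I have understood you correctly. You can keep a flag (a True/False value) that tracks whether you are currently inside an interesting block. Whenever you find "###PERFORMANCE", you flip this flag. You can then save the lines from the two kinds of blocks in two lists, or in whatever structure you prefer.
Here is the code snippet: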
```python
logFile = "logfile.txt"
with open(logFile) as f:
    content = f.readlines()
# you may also want to remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content]
# flag: True while we are inside an interesting block
are_we_in_the_interesting_block = False
# two lists to save the lines
interesting_block = []
non_interesting_block = []
for line in content:
    # check if the line contains the text ###PERFORMANCE
    # (str.find returns -1 when it is absent but 0 when the marker starts the line,
    # so a membership test is safer than `line.find(...) > 0`)
    if '###PERFORMANCE' in line:
        are_we_in_the_interesting_block = not are_we_in_the_interesting_block
    elif are_we_in_the_interesting_block:
        # here I append to a list, but you can do your own processing
        interesting_block.append(line)
    else:
        # here goes the processing of the non-interesting parts
        non_interesting_block.append(line)
print('Interesting blocks')
print(interesting_block)
print('\n')
print('Non interesting blocks')
print(non_interesting_block)
```
The resulting output will be:
```
Interesting blocks
['20190122 09:10,500 number1 string1 string2 string3', '20190122 09:24,670 number2 string1 string2 string3', '20190122 10:05,000 number3 string1 string2 string3', '20190122 10:33,960 number4 string1 string2 string3', '20190122 11:00,321 number5 string1 string2 string3', '20190123 08:10,500 number1 string1 string2 string3', '20190123 08:24,670 number2 string1 string2 string3', '20190123 09:05,000 number3 string1 string2 string3', '20190123 10:33,960 number4 string1 string2 string3', '20190123 10:00,321 number5 string1 string2 string3', '20190124 10:10,500 number1 string1 string2 string3', '20190124 10:24,670 number2 string1 string2 string3', '20190124 11:05,000 number3 string1 string2 string3', '20190124 12:33,960 number4 string1 string2 string3', '20190124 13:00,321 number5 string1 string2 string3']
Non interesting blocks
['20190123 10:24,670 number1 string1 string2 string3 string4 date1 number2', '20190123 10:32,130 number1 string1 string2 string3 string4 date1 number2']
```
Then, if needed, you can access interesting_block[n] to get the line at index n.
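If you need each interesting block kept separately, rather than all matching lines merged into one flat list, the same flag idea can collect a list of lists. This is a minimal sketch under a few assumptions: the function name `split_blocks` and the sample lines are made up for illustration (they are not the real log format), and each block is assumed to be opened and closed by a line containing `###PERFORMANCE`.

```python
from typing import Iterable, List, Tuple

# hypothetical default: the delimiter is assumed to open and close each block
MARKER = "###PERFORMANCE"


def split_blocks(lines: Iterable[str], marker: str = MARKER) -> Tuple[List[List[str]], List[str]]:
    """Return (blocks, others).

    blocks holds one list of stripped lines per marker-delimited block;
    others holds every stripped line that falls outside those blocks.
    """
    blocks: List[List[str]] = []
    others: List[str] = []
    inside = False
    for raw in lines:
        line = raw.strip()
        if marker in line:
            inside = not inside
            if inside:
                # an opening marker starts a new, empty block
                blocks.append([])
            # the marker line itself is not stored anywhere
            continue
        if inside:
            blocks[-1].append(line)
        else:
            others.append(line)
    return blocks, others


# made-up sample lines standing in for the contents of logfile.txt
sample = [
    "header line",
    "20190122 ###PERFORMANCE",
    "20190122 09:10,500 number1",
    "20190122 09:24,670 number2",
    "20190122 ###PERFORMANCE",
    "20190123 10:24,670 other",
    "20190123 ###PERFORMANCE",
    "20190123 08:10,500 number1",
    "20190123 ###PERFORMANCE",
]

blocks, others = split_blocks(sample)
print(len(blocks))  # 2
print(blocks[1])    # ['20190123 08:10,500 number1']
print(others)       # ['header line', '20190123 10:24,670 other']
```

With a real file you would pass the open file object instead, e.g. `with open("logfile.txt") as f: blocks, others = split_blocks(f)`; then `blocks[n]` is the whole n-th block (zero-based) rather than a single line.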

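Since you are only matching on the PERFORMANCE delimiter, using NLTK seems like overkill. A simpler approach is a plain substring match (is the expected string in the line?) that toggles your capture mode on and off. For example: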
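```python
logfile = "logfile.txt"  # path to your log file
in_block = False
IDENTIFIER = 'PERFORMANCE'
with open(logfile) as f:
    for line in f:
        if IDENTIFIER in line:
            # Toggle the boolean and skip the delimiter line itself
            in_block = not in_block
            continue
        if in_block:
            # lines keep their trailing newline, so strip it before printing
            print(line.rstrip('\n'))
```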
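The same toggle can be wrapped in a generator, so the file is still read lazily line by line while the caller decides what to do with each captured line. A minimal sketch: the function name `lines_in_blocks` is hypothetical, and `io.StringIO` only stands in for a real log file so the example runs on its own.

```python
import io
from typing import Iterable, Iterator

IDENTIFIER = 'PERFORMANCE'


def lines_in_blocks(lines: Iterable[str], identifier: str = IDENTIFIER) -> Iterator[str]:
    """Yield the lines (without trailing newline) that sit between delimiter lines."""
    in_block = False
    for line in lines:
        if identifier in line:
            # toggle capture mode; the delimiter line itself is not yielded
            in_block = not in_block
            continue
        if in_block:
            yield line.rstrip('\n')


# stand-in for open(logfile); any iterable of lines works
fake_log = io.StringIO(
    "before\n"
    "### PERFORMANCE ###\n"
    "inside 1\n"
    "inside 2\n"
    "### PERFORMANCE ###\n"
    "after\n"
)

for captured in lines_in_blocks(fake_log):
    print(captured)
# inside 1
# inside 2
```

With a real file this becomes `with open(logfile) as f:` followed by `for line in lines_in_blocks(f): ...`, which avoids loading the whole log into memory the way `readlines()` does.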