3 Answers

Contributor with 1829 experience points, 6+ upvotes
You can use next or readline (in Python 3 and later) to retrieve the next line in the file:
    elif line.strip().startswith(first_end):
        copy = False
        outfile.write(line)
        outfile.write(next(infile))
        outfile.write(next(infile))
Or:
    #note: not compatible with Python 2.7 and below
    elif line.strip().startswith(first_end):
        copy = False
        outfile.write(line)
        outfile.write(infile.readline())
        outfile.write(infile.readline())
Either way, the file pointer advances by two extra lines, so the next iteration of for line in infile: skips the two lines you already read with next or readline.
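To make the skipping behavior concrete, here is a minimal sketch that uses io.StringIO as a stand-in for the open file. The sample data and the two list names are made up for illustration; the point is only that next() and the for loop draw from the same iterator.

```python
import io

# In-memory stand-in for an open text file (hypothetical sample data).
infile = io.StringIO("a\n---\nb\nc\nd\n")

seen_by_loop = []   # lines the for loop itself received
taken_by_next = []  # lines pulled manually with next()

for line in infile:
    seen_by_loop.append(line.strip())
    if line.startswith("---"):
        # Pull two more lines from the same iterator the loop is using.
        taken_by_next.append(next(infile).strip())
        taken_by_next.append(next(infile).strip())

print(seen_by_loop)   # ['a', '---', 'd']
print(taken_by_next)  # ['b', 'c']
```

The loop never sees "b" or "c", because next() already consumed them. Note that next() raises StopIteration if fewer than two lines remain after the end marker.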
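An additional terminology nitpick: a file object is not a list, so a technique for accessing element x+1 of a list may not work for accessing the next line of a file, and vice versa. If you really do want to access the next item of a proper list object, you can use enumerate so that you can do arithmetic on the list's index. For example:
    seq = ["foo", "bar", "baz", "qux", "troz", "zort"]
    #find all instances of "baz" and also the first two elements after "baz"
    for idx, item in enumerate(seq):
        if item == "baz":
            print(item)
            print(seq[idx+1])
            print(seq[idx+2])
Note that, unlike readline, indexing does not advance the iterator, so for idx, item in enumerate(seq): will still iterate over "qux" and "troz". Also note that seq[idx+1] and seq[idx+2] raise IndexError if "baz" is one of the last two elements.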
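If "baz" might sit near the end of the list, one possible variation is to slice instead of indexing. This is only a sketch: the extended seq and the found list are hypothetical, chosen so the second "baz" has just one element after it.

```python
seq = ["foo", "bar", "baz", "qux", "troz", "zort", "baz", "end"]

found = []
for idx, item in enumerate(seq):
    if item == "baz":
        # Slicing never raises IndexError; it simply returns
        # fewer items when the list runs out.
        found.append(seq[idx:idx + 3])

print(found)  # [['baz', 'qux', 'troz'], ['baz', 'end']]
```

As with plain indexing, slicing does not advance the loop, so the elements after each "baz" are still visited by later iterations.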
An approach that works for any iterable is to use an additional variable to track state across iterations. The advantage is that you do not need to know how to advance the iterable manually; the disadvantage is that the logic inside the loop is harder to reason about, because it now depends on extra state that changes as a side effect.
    first = "Start"
    first_end = "---"
    with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
        copy = False
        num_items_to_write = 0
        for line in infile:
            if num_items_to_write > 0:
                outfile.write(line)
                num_items_to_write -= 1
            elif line.strip().startswith(first):
                copy = True
                outfile.write(line)
            elif line.strip().startswith(first_end):
                copy = False
                outfile.write(line)
                num_items_to_write = 2
            elif copy:
                outfile.write(line)
In the specific case of extracting repeated groups of data from a delimited file, it may be appropriate to skip iteration entirely and use a regular expression. For data like yours, that might look like this:
    import re

    with open("testlog.log") as file:
        data = file.read()

    pattern = re.compile(r"""
    ^Start$         #"Start" by itself on a line
    (?:\n.*$)*?     #zero or more lines, matched non-greedily
                    #use (?:) for all groups so `findall` doesn't capture them later
    \n---$          #"---" by itself on a line
    (?:\n.*$){2}    #exactly two lines
    """, re.MULTILINE | re.VERBOSE)

    #equivalent one-line regex:
    #pattern = re.compile("^Start$(?:\n.*$)*?\n---$(?:\n.*$){2}", re.MULTILINE)

    for group in pattern.findall(data):
        print("Found group:")
        print(group)
        print("End of group.\n\n")
When run on a log that looks like this:
Start
foo
bar
baz
qux
---
troz
zort
alice
bob
carol
dave
Start
Fred
Barney
---
Wilma
Betty
Pebbles
...it produces this output:
Found group:
Start
foo
bar
baz
qux
---
troz
zort
End of group.


Found group:
Start
Fred
Barney
---
Wilma
Betty
End of group.

Contributor with 1880 experience points, 4+ upvotes
The simplest way is to write a generator function that parses infile:
    def read_file(file_handle, start_line, end_line, extra_lines=2):
        start = False
        while True:
            try:
                line = next(file_handle)
            except StopIteration:
                return

            if not start and line.strip().startswith(start_line):
                start = True
                yield line
            elif not start:
                continue
            elif line.strip().startswith(end_line):
                yield line
                try:
                    for _ in range(extra_lines):
                        yield next(file_handle)
                except StopIteration:
                    return
                # Stop copying until the next start line; without this reset,
                # every line after the block would also be yielded
                start = False
            else:
                yield line
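The inner try-except clause (around the extra lines) is not needed if you know every file is well-formed. The outer one should stay: it handles the normal end of the file, and since Python 3.7 (PEP 479) a StopIteration that escapes a generator body is converted into a RuntimeError.
You can use this generator like this:
    if __name__ == "__main__":
        first = "Start"
        first_end = "---"
        with open("testlog.log") as infile, open("parsed.txt", "a") as outfile:
            output = read_file(
                file_handle=infile,
                start_line=first,
                end_line=first_end,
                extra_lines=1,
            )
            outfile.writelines(output)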
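For comparison, here is a rough sketch of how the same idea could be written without any try-except, by letting a for loop handle the end of the file and itertools.islice handle a truncated tail. The name iter_blocks and the sample data are made up for this example; it is not the answer's own code, and it resets the copy flag after each block.

```python
import io
from itertools import islice

def iter_blocks(lines, start_marker, end_marker, extra_lines=2):
    """Yield each block from a start marker through the end marker,
    plus up to extra_lines lines that follow the end marker."""
    lines = iter(lines)  # the for loop and islice must share one iterator
    copying = False
    for line in lines:
        stripped = line.strip()
        if not copying and stripped.startswith(start_marker):
            copying = True
            yield line
        elif copying and stripped.startswith(end_marker):
            copying = False
            yield line
            # islice stops quietly if fewer than extra_lines lines remain
            yield from islice(lines, extra_lines)
        elif copying:
            yield line

# Hypothetical sample: the second block has only one line after "---".
sample = io.StringIO(
    "x\nStart\nfoo\n---\ntroz\nzort\nalice\nStart\nFred\n---\nWilma\n"
)
result = list(iter_blocks(sample, "Start", "---"))
print("".join(result), end="")
```

Because a for loop ends cleanly when the file is exhausted, and islice simply yields fewer items when the input runs short, no StopIteration can leak out of the generator.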

Contributor with 1842 experience points, 13+ upvotes
A version with a tri-state variable and less code duplication:
    first = "Start"
    first_end = "---"
    # Lines to read after end flag
    extra_count = 2

    with open('testlog.log') as infile, open('parsed.txt', 'a') as outfile:
        # Do not copy by default
        copy = 0

        for line in infile:
            # Strip once only
            clean_line = line.strip()

            # Enter "infinite copy" state
            if clean_line.startswith(first):
                copy = -1
            # Copy this line and the extra amount
            elif clean_line.startswith(first_end):
                copy = extra_count + 1

            # If in a "must-copy" state
            if copy != 0:
                # One less line to copy if end flag passed
                if copy > 0:
                    copy -= 1
                # Copy current line
                outfile.write(line)
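To see how the three states (-1 for "copy indefinitely", 0 for "do not copy", a positive number for "copy this many more lines") evolve, here is a small tracing sketch. The function name trace_copy_state and the sample lines are hypothetical; the state logic is meant to mirror the loop above, but it records what would happen instead of writing to a file.

```python
def trace_copy_state(lines, first="Start", first_end="---", extra_count=2):
    """Return (line, copy state after the line, was it written) per line."""
    copy = 0
    trace = []
    for line in lines:
        clean_line = line.strip()
        if clean_line.startswith(first):
            copy = -1                 # copy until an end flag shows up
        elif clean_line.startswith(first_end):
            copy = extra_count + 1    # this line plus the extra lines
        written = copy != 0
        if copy > 0:
            copy -= 1                 # count down after the end flag
        trace.append((clean_line, copy, written))
    return trace

for row in trace_copy_state(["x", "Start", "foo", "---", "troz", "zort", "alice"]):
    print(row)
```

The counter reaches 0 right after the second extra line, so "alice" is not written. One thing the trace also makes easy to check: an end flag that appears outside any block would still start a countdown, which may or may not be what you want for your logs.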