3 Answers
Contributor with 1921 experience points · more than 9 upvotes
A solution using re:
import re
input = [
"[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
"[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
"[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
]
def extract(s):
    # Group 1: "X=<number>"; group 2: the text after the closing bracket.
    match = re.search(r"(X=\d+(?:\.\d*)?).*?\](.*?)$", s)
    return match.groups()
output = [extract(item) for item in input]
print(output)
Output:
[
('X=250.44', 'DECEMBER 31,'),
('X=307.5', 'respectively. The net decrease in the revenue'),
('X=49.5', '(US$ in millions)'),
]
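Explanation:
\d ... a digit
\d+ ... one or more digits
(?:...) ... non-capturing ("normal") parentheses
\.\d* ... a dot followed by zero or more digits
(?:\.\d*)? ... an optional (zero or one) "fractional part"
(X=\d+(?:\.\d*)?) ... first group, X=number
.*? ... zero or more of any character (non-greedy)
\] ... the ] symbol
$ ... end of string
\](.*?)$ ... second group, everything between the ] and the end of the string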
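The breakdown above can also be written into the pattern itself. The following is a minimal sketch, assuming the same pattern as in this answer; the re.VERBOSE layout and the helper name extract_or_none are illustrative additions, not part of the original answer. It also returns None for a line that does not match instead of raising an AttributeError.

```python
import re

# Same pattern as in the answer, laid out with re.VERBOSE so that
# whitespace is ignored and each piece can carry a comment.
PATTERN = re.compile(
    r"""
    (X=\d+(?:\.\d*)?)   # group 1: X= followed by an integer or decimal number
    .*?                 # any characters, as few as possible
    \]                  # a literal closing bracket
    (.*?)$              # group 2: everything from there to the end of the string
    """,
    re.VERBOSE,
)


def extract_or_none(s):
    """Hypothetical helper: return (x, text), or None if the line does not match."""
    match = PATTERN.search(s)
    return match.groups() if match else None


sample = (
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] "
    "[(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
)
result = extract_or_none(sample)
missing = extract_or_none("a line without coordinates")
print(result)   # ('X=250.44', 'DECEMBER 31,')
print(missing)  # None
```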
Contributor with 1827 experience points · more than 8 upvotes
Try this:
(X=[^,]*)(?:.*])(.*)
import re
source = """[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,
[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue
[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)""".split('\n')
pattern = r"(X=[^,]*)(?:.*])(.*)"
for line in source:
    print(re.search(pattern, line).groups())
Output:
('X=250.44', 'DECEMBER 31,')
('X=307.5', 'respectively. The net decrease in the revenue')
('X=49.5', '(US$ in millions)')
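You had X= in front of every capture, so I simply included it in the capture group; if that matters, feel free to move it out into a non-capturing group.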
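If only the number is wanted, one possible variation is to match X= outside the capture group. This is a sketch under that assumption; the name number_only and the float conversion are illustrative and not part of the original answer.

```python
import re

# Variation of the pattern above: "X=" is still matched but no longer captured.
number_only = r"X=([^,]*)(?:.*])(.*)"

line = (
    "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] "
    "[(X=307.5,Y=240.48499) height=3.876 width=2.9970093]"
    "respectively. The net decrease in the revenue"
)

x, text = re.search(number_only, line).groups()
x_value = float(x)  # the captured string can now be converted directly
print(x_value, text)
```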
Contributor with 1868 experience points · more than 4 upvotes
Use a regular expression with named groups to capture the relevant parts:
>>> import re
>>> line = "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
>>> m = re.search(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$', line)
>>> m.groups()
('250.44', 'DECEMBER 31,')
>>> m['x_coord']
'250.44'
>>> m['text']
'DECEMBER 31,'
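The same named-group pattern can be applied to several lines at once. The sketch below assumes the input is a list of strings in the format shown in the question; the names NAMED, lines, and records are illustrative. Lines that do not match are skipped.

```python
import re

# The named-group pattern from this answer, compiled once for reuse.
NAMED = re.compile(r'(?:\(X=)(?P<x_coord>.*?)(?:,.*])(?P<text>.*)$')

lines = [
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] "
    "[(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] "
    "[(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
    "a line without coordinates",  # no match, so it is dropped below
]

# groupdict() returns one dict per match, keyed by the group names.
records = [m.groupdict() for m in map(NAMED.search, lines) if m]

for record in records:
    print(record["x_coord"], record["text"])
```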
