import re
s = '18year old 23 year old 99 years old but not 25-year-old and 91year old cousin is 99 now and 90-year-old or 102 year old'
From s, I want to use a regular expression to extract every age of 90 or above. For example, 99 years old should be extracted, but 18year old should not. I tried the following:
reg = r'(9\d|\d{3,})(-year-old)|(9\d|\d{3,})( year old)'
r1 = re.findall(reg,s)
r1
This gives me [('90', '-year-old', '', ''), ('', '', '102', ' year old')]. Ideally, I want this output: ['99 years old', '91year old', '90-year-old', '102 year old']
How can I change my regex reg to get the desired output?
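The tuples come from how re.findall treats capturing groups: when the pattern contains groups, findall returns the groups of each match instead of the whole matched text, as a tuple when there is more than one group. A minimal sketch, reusing the question's own reg and s, that collects whole matches with re.finditer and match.group(0) instead. It still misses '99 years old' and '91year old', because the pattern itself only allows '-year-old' and ' year old':

```python
import re

s = ('18year old 23 year old 99 years old but not 25-year-old and 91year old '
     'cousin is 99 now and 90-year-old or 102 year old')
reg = r'(9\d|\d{3,})(-year-old)|(9\d|\d{3,})( year old)'

# With 4 capturing groups, findall returns a 4-tuple of group values per match
print(re.findall(reg, s))
# [('90', '-year-old', '', ''), ('', '', '102', ' year old')]

# finditer yields Match objects; group(0) is the entire matched text
whole = [m.group(0) for m in re.finditer(reg, s)]
print(whole)
# ['90-year-old', '102 year old']
```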

皈依舞
This regular expression will do what you want:
(?:9\d|1\d{2})(?:\s|-)?years?(?:\s|-)?old
Explanation:
(?:9\d|1\d{2}) # Non-capturing group - match 90-99 or 100-199
(?:\s|-)? # Non-capturing group - optionally match whitespace or -
years? # Match year and optionally s
(?:\s|-)? # Non-capturing group - optionally match whitespace or -
old # Match old
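Because each explanation line pairs a piece of the pattern with a # comment, the breakdown can also be compiled directly using Python's re.VERBOSE flag, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. A sketch, assuming the s from the question:

```python
import re

s = ('18year old 23 year old 99 years old but not 25-year-old and 91year old '
     'cousin is 99 now and 90-year-old or 102 year old')

# re.VERBOSE: whitespace outside character classes is ignored and '#' starts
# a comment, so the annotated breakdown works as the pattern itself
pattern = re.compile(r"""
    (?:9\d|1\d{2})   # Non-capturing group - match 90-99 or 100-199
    (?:\s|-)?        # Optionally match whitespace or -
    years?           # Match year and optionally s
    (?:\s|-)?        # Optionally match whitespace or -
    old              # Match old
""", re.VERBOSE)

print(pattern.findall(s))
# ['99 years old', '91year old', '90-year-old', '102 year old']
```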
Code snippet:
import re

# s is the string defined in the question
reg = r'(?:9\d|1\d{2})(?:\s|-)?years?(?:\s|-)?old'
r1 = re.findall(reg, s)
print(r1)
# ['99 years old', '91year old', '90-year-old', '102 year old']
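The pattern above stops at 199 (1\d{2}) and has no leading word boundary, so on an input such as '299 years old' it would return the substring '99 years old'. If inputs like that matter, one possible variant (an assumption about the desired behavior, not part of the original answer) adds \b at both ends and accepts any number of three or more digits without a leading zero, similar to the \d{3,} in the question:

```python
import re

s = ('18year old 23 year old 99 years old but not 25-year-old and 91year old '
     'cousin is 99 now and 90-year-old or 102 year old')

# Hypothetical variant: \b on both ends prevents a match from starting or
# ending inside a longer token; [1-9]\d{2,} accepts 100 and above
reg = r'\b(?:9\d|[1-9]\d{2,})(?:\s|-)?years?(?:\s|-)?old\b'

print(re.findall(reg, s))
# ['99 years old', '91year old', '90-year-old', '102 year old']

print(re.findall(reg, '299 years old and 1000-year-old'))
# ['299 years old', '1000-year-old']
```

On the question's string it gives the same result as the original answer; it only differs on numbers of 200 or more, or numbers embedded in longer digit runs.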