How to ignore unwanted patterns in a regular expression

Python

守著一只汪 2023-05-09 10:33:55

I have the following Python code:

    import re
    from io import BytesIO

    import pdfplumber
    import requests

    # PDF URL -> index of the page that contains the auditor's signature
    test_case = {
        'https://www1.hkexnews.hk/listedco/listconews/sehk/2020/0514/2020051400555.pdf': 59,
        'https://www1.hkexnews.hk/listedco/listconews/gem/2020/0529/2020052902118.pdf': 55,
        'https://www1.hkexnews.hk/listedco/listconews/sehk/2020/0618/2020061800366.pdf': 47,
        'https://www1.hkexnews.hk/listedco/listconews/gem/2020/0630/2020063002674.pdf': 30,
    }

    for url, page in test_case.items():
        rq = requests.get(url)
        pdf = pdfplumber.open(BytesIO(rq.content))
        txt = pdf.pages[page].extract_text()
        txt = re.sub(r"[^\x00-\x7F]+", "", txt)  # strip non-ASCII (e.g. Chinese) characters
        pattern = r'.*\n.*?(?P<auditor>[A-Z].+?\n?)(?:LLP\s*)?\s*((PRC.*?|Chinese.*?)?[Cc]ertified [Pp]ublic|[Cc]hartered) [Aa]ccountants'
        try:
            auditor = re.search(pattern, txt, flags=re.MULTILINE).group('auditor').strip()
            print(repr(auditor))
        except AttributeError:  # re.search returned None: no match on this page
            print(txt)
            print('============')
            print(url)

It produces the following output:

    'ShineWing'
    'ShineWing'
    'Hong Kong Standards on Auditing (HKSAs) issued by the Hong Kong Institute of'
    'Hong Kong Financial Reporting Standards issued by the Hong Kong Institute of'

The expected output is:

    'ShineWing'
    'ShineWing'
    'Ernst & Young'
    'Elite Partners CPA Limited'

What I have tried:

    pattern = r'.*\n.*?(?P<auditor>[A-Z].+?\n?)$(?!Institute)(?:LLP\s*)?\s*((PRC.*?|Chinese.*?)?[Cc]ertified [Pp]ublic|[Cc]hartered) [Aa]ccountants'

This pattern captures the last two cases, but not the first two.

    pattern = r'.*\n.*?(?P<auditor>^(?!Hong|Kong)[A-Z].+?\n?)(?:LLP\s*)?\s*((PRC.*?|Chinese.*?)?[Cc]ertified [Pp]ublic|[Cc]hartered) [Aa]ccountants'

This one produces the desired results, but ^(?!Hong|Kong) is risky: it may exclude other valid results in the future, so it is not a good candidate. $(?!Institute) is more general and better suited, but I don't understand why it fails to match the first two cases.

Ideally, there would be a way to ignore any match that contains

    issued by the Hong Kong Institute of

Any suggestions would be appreciated. Thanks.
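The $(?!Institute) attempt may not check what it appears to. Below is a minimal sketch, on a hypothetical two-line sample (not the real text extracted from the PDFs), that compares a lookahead placed after $ with one placed at the start of the line.

```python
import re

# Hypothetical two-line sample; NOT taken from the PDFs in the question.
text = "Hong Kong Institute of\nCertified Public Accountants"

# With re.MULTILINE, `$` matches at the position just before "\n", so a
# lookahead placed after it inspects "\nCertified...", never the text of the
# line already consumed. The line containing "Institute" is NOT rejected.
after_dollar = re.search(r"^[A-Z].+$(?!Institute)", text, flags=re.MULTILINE)

# Placing the lookahead at the START of the line and letting it scan forward
# with `.*` (which stops at "\n") rejects any line that contains "Institute".
line_start = re.search(r"^(?!.*Institute)[A-Z].+$", text, flags=re.MULTILINE)

print(repr(after_dollar.group()))  # 'Hong Kong Institute of'
print(repr(line_start.group()))    # 'Certified Public Accountants'
```

Because $ only matches just before a newline or at the very end of the string, the character after it is always "\n" or nothing. A lookahead for Institute placed there therefore always succeeds, so $(?!Institute) behaves exactly like a bare $.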

1 Answer

德瑪西亞99

pattern = r'\n.*?(?P<auditor>(?!.*Institute)[A-Z].+?)(?:LLP\s*)?\s*((PRC.*?|Chinese.*?)?[Cc]ertified [Pp]ublic|[Cc]hartered) [Aa]ccountants'

This works.

2023-05-09
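A minimal sketch comparing the question's pattern with the answer's. It runs on hypothetical sample text shaped like an audit report page, not on the real PDF text. The helper find_auditor is illustrative only. In the answer's pattern, (?!.*Institute) is checked where the auditor name would begin, and because . does not match a newline, it only scans the rest of that line.

```python
import re

# Hypothetical sample text; NOT the real text extracted from the HKEX PDFs.
text = (
    "We conducted our audit in accordance with\n"
    "Hong Kong Standards on Auditing (HKSAs) issued by the Hong Kong Institute of\n"
    "Certified Public Accountants.\n"
    "Ernst & Young\n"
    "Certified Public Accountants\n"
)

# Pattern from the question: nothing stops it from capturing the HKSA sentence.
question_pattern = (
    r'.*\n.*?(?P<auditor>[A-Z].+?\n?)(?:LLP\s*)?\s*'
    r'((PRC.*?|Chinese.*?)?[Cc]ertified [Pp]ublic|[Cc]hartered) [Aa]ccountants'
)

# Pattern from the answer: the negative lookahead rejects any candidate start
# position whose remaining line contains "Institute".
answer_pattern = (
    r'\n.*?(?P<auditor>(?!.*Institute)[A-Z].+?)(?:LLP\s*)?\s*'
    r'((PRC.*?|Chinese.*?)?[Cc]ertified [Pp]ublic|[Cc]hartered) [Aa]ccountants'
)


def find_auditor(pattern, txt):
    """Hypothetical helper: return the stripped 'auditor' group, or None."""
    m = re.search(pattern, txt, flags=re.MULTILINE)
    return m.group('auditor').strip() if m else None


print(repr(find_auditor(question_pattern, text)))
print(repr(find_auditor(answer_pattern, text)))
```

On this sample, the first call returns the "issued by the Hong Kong Institute of" sentence, which mirrors the wrong output in the question. The second returns 'Ernst & Young'. One side effect of the answer's pattern is that it requires a newline before the auditor, so a name on the very first line of the extracted text would not be found.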
