
How to search for regular expressions in all file types in a directory

Python

九州編程 2021-09-02 16:35:45

So, I want to search through my entire directory for files that match any of a list of regular expressions. That includes directories, PDFs, and CSV files. I can do this successfully when searching only text files, but searching all file types is the struggle. Below is my work so far:

    import glob
    import re
    import PyPDF2

    #-------------------------------------------------Input----------------------------------------------------------------------------------------------
    folder_path = "/home/"
    file_pattern = "/*"
    folder_contents = glob.glob(folder_path + file_pattern)

    #Search for Emails
    regex1= re.compile(r'\S+@\S+')
    #Search for Phone Numbers
    regex2 = re.compile(r'\d\d\d[-]\d\d\d[-]\d\d\d\d')
    #Search for Locations
    regex3 =re.compile("([A-Z]\w+), ([A-Z]{2})")

    for file in folder_contents:
        read_file = open(file, 'rt').read()
        if readile_file == pdf:
            pdfFileObj = open('pdf.pdf', 'rb')
            pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
            pageObj = pdfReader.getPage(0)
            content= pageObj.extractText())
        if regex1.findall(read_file) or regex2.findall(read_file) or regex3.findall(read_file):
            print ("YES, This file containts PHI")
            print(file)
        else:
            print("No, This file DOES NOT contain PHI")
            print(file)

When I run it, I get this error:

    YES, This file containts PHI
    /home/e136320/sample.txt
    No, This file DOES NOT contain PHI
    /home/e136320/medicalSample.txt
    ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-129-be0b68229c20> in <module>()
         19
         20 for file in folder_contents:
    ---> 21     read_file = open(file, 'rt').read()
         22     if readile_file == pdf:
         23         # creating a pdf file object

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 10: invalid continuation byte

Any suggestions?


1 Answer

UYOU


You can't open a PDF file like that: open(file, 'rt') expects a plain text file, and a PDF is binary. You could use something like this:

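    import os
    import PyPDF2

    def read_pdf(path):
        with open(path, 'rb') as f:  # PDFs are binary, not plain text
            return PyPDF2.PdfFileReader(f).getPage(0).extractText()

    def read_plain(path):
        with open(path, 'rt') as f:
            return f.read()

    fn, ext = os.path.splitext(file)
    if ext == '.pdf':
        open_function = read_pdf
    else:  # plain text
        open_function = read_plain
    content = open_function(file)  # Do something with content...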

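This snippet checks the file extension and then picks a function for opening the file depending on what it finds. That is a bit naive, and it could be done better using a method similar to the one shown in this answer.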
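As one possible way to make the check less naive, the sketch below looks at the first bytes of each file instead of its extension, skips sub-directories (which glob also returns and which cannot be read), and replaces undecodable bytes so that a stray byte such as 0xc7 no longer raises UnicodeDecodeError. The helper names (is_pdf, read_pdf_text, read_text, contains_phi, scan_folder) are made up for illustration, the three patterns are the ones from the question, and the PDF branch assumes the same pre-3.0 PyPDF2 API the question uses. Treat it as a starting point rather than a finished solution.

```python
import os
import re

# Patterns taken from the question: e-mails, phone numbers, "City, ST" locations.
PATTERNS = [
    re.compile(r'\S+@\S+'),
    re.compile(r'\d{3}-\d{3}-\d{4}'),
    re.compile(r'([A-Z]\w+), ([A-Z]{2})'),
]


def is_pdf(path):
    """Return True if the file starts with the PDF signature b'%PDF-'."""
    with open(path, 'rb') as f:
        return f.read(5) == b'%PDF-'


def read_pdf_text(path):
    """Join the text of every page (assumes the PyPDF2 < 3.0 API from the question)."""
    import PyPDF2  # imported lazily so plain-text scanning works without it
    with open(path, 'rb') as f:
        reader = PyPDF2.PdfFileReader(f)
        return '\n'.join(reader.getPage(i).extractText()
                         for i in range(reader.numPages))


def read_text(path):
    """Read a file as text; undecodable bytes are replaced instead of raising."""
    if is_pdf(path):
        return read_pdf_text(path)
    with open(path, 'rt', encoding='utf-8', errors='replace') as f:
        return f.read()


def contains_phi(text):
    """Return True if any of the patterns matches somewhere in the text."""
    return any(p.search(text) for p in PATTERNS)


def scan_folder(folder_path):
    """Map each regular file directly inside folder_path to a True/False flag."""
    results = {}
    for name in sorted(os.listdir(folder_path)):
        path = os.path.join(folder_path, name)
        if os.path.isfile(path):  # skip sub-directories
            results[path] = contains_phi(read_text(path))
    return results
```

Calling scan_folder("/home/e136320") would then return a dictionary that can be looped over to print the YES/No messages from the question. On PyPDF2 3.0 and later the PdfFileReader, getPage, extractText and numPages names are, as far as I know, no longer available, so read_pdf_text would need to be adapted to the newer PdfReader interface.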

Answered 2021-09-02
