1 Answer

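The good news is that you can use the PDFMiner library to reproduce any option or command you would otherwise run on the command line with pdf2txt.py. Here is the basic example I use: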
from io import BytesIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage


def pdf_to_text(path):
    manager = PDFResourceManager()
    retstr = BytesIO()
    # all_texts=True also runs layout analysis on text inside figures.
    layout = LAParams(all_texts=True)
    device = TextConverter(manager, retstr, laparams=layout)
    interpreter = PDFPageInterpreter(manager, device)
    # The with block closes the PDF even if a page fails to parse.
    with open(path, 'rb') as fp:
        for page in PDFPage.get_pages(fp, check_extractable=True):
            interpreter.process_page(page)
    # With a binary buffer, TextConverter writes bytes in its default
    # codec (UTF-8), so decode to get a str.
    text = retstr.getvalue().decode('utf-8')
    device.close()
    retstr.close()
    return text


if __name__ == "__main__":
    text = pdf_to_text("yourfile.pdf")
    print(text)
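If you need to restrict the output to certain page numbers or supply a password, those are optional parameters of PDFPage.get_pages. Likewise, if you need to change layout settings such as all_texts or the margins (char_margin, line_margin, word_margin), the LAParams initializer takes them as optional arguments.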
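As a rough sketch of those optional parameters, the variant below passes `pagenos` and `password` to PDFPage.get_pages and a non-default `char_margin` to LAParams. It assumes the pdfminer.six fork on Python 3. The names `to_pagenos` and `pdf_pages_to_text` are made up for illustration and are not part of the library. One thing to watch: `pagenos` is 0-based, whereas the `-p` option of pdf2txt.py takes 1-based page numbers, hence the small conversion helper.

```python
from io import BytesIO


def to_pagenos(pages):
    """Convert 1-based page numbers to the 0-based set used by `pagenos`.

    Hypothetical helper, not part of pdfminer.
    """
    return {p - 1 for p in pages}


def pdf_pages_to_text(path, pages=None, password="", char_margin=2.0):
    """Extract text from selected pages of a possibly encrypted PDF.

    Hypothetical wrapper; assumes pdfminer.six is installed.
    """
    # Imported here so to_pagenos stays usable without pdfminer installed.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage

    manager = PDFResourceManager()
    buffer = BytesIO()
    # char_margin=2.0 is the library default; change it to tune how
    # aggressively nearby characters are grouped into one line.
    layout = LAParams(all_texts=True, char_margin=char_margin)
    device = TextConverter(manager, buffer, laparams=layout)
    interpreter = PDFPageInterpreter(manager, device)
    with open(path, "rb") as fp:
        for page in PDFPage.get_pages(
            fp,
            pagenos=to_pagenos(pages) if pages else None,  # None = all pages
            password=password,
            check_extractable=True,
        ):
            interpreter.process_page(page)
    text = buffer.getvalue().decode("utf-8")
    device.close()
    return text
```

For example, `pdf_pages_to_text("yourfile.pdf", pages=[1, 3], password="secret")` would be expected to return the text of the first and third pages only. The file name and password here are placeholders.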