Why is the output abnormal when, in the last step, I use urlopen to read a PDF from an online URL?
The output is as follows:
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2096
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 3237
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 884
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1528
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 703
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 3344
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 4177
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1492
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 990
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2082
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 686
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 801
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 703
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2096
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 3237
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 5196
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 933
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 884
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1528
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1492
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 990
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2082
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 686
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 801
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 4033
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 841
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 686
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1107
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1625
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 683
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2201
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 3647
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 660
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2059
WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2986
...
...
2016-11-16
WARNING:pdfminer.converter:undefined:
I tried this, and it works.
However, I don't know why!
-------------------------------------------------------------------------------------------------------------------------------------------
It sets the root logger to level ERROR. This stops PDFMiner's warning output, since PDFMiner logs to the root logger, but it does not stop your own logging.
I needed to set propagation to False because, after using PDFMiner, I had duplicate log entries. This was caused by the root logger.
From: http://stackoverflow.com/questions/29762706/warnings-on-pdfminer
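The quoted explanation can be sketched roughly as follows, using only the standard `logging` module. The logger name `my_app` is a made-up placeholder, and the snippet the poster actually ran is not shown in this thread, so treat this as an illustration of the idea rather than the original code.

```python
import logging

# Raise the root logger's threshold. Loggers without a level of their own
# (such as "pdfminer.converter") inherit it, so their WARNING records are dropped.
logging.getLogger().setLevel(logging.ERROR)

# Hypothetical application logger with its own level and handler.
app_log = logging.getLogger("my_app")
app_log.setLevel(logging.INFO)
app_log.addHandler(logging.StreamHandler())

# Keep records from app_log away from the root logger's handlers,
# otherwise each message could be printed twice.
app_log.propagate = False

app_log.info("own logging still works")
logging.getLogger("pdfminer.converter").warning("this record is dropped")
```

A narrower alternative, judging by the `pdfminer.converter` prefix in the log lines above, would be `logging.getLogger("pdfminer").setLevel(logging.ERROR)`, which leaves warnings from other libraries untouched. Either way this only hides the messages; it does not change which text gets extracted.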
2018-12-03
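Emmm, right. Getting rid of the warnings is not the goal; the goal is to display the Chinese text. The warnings are gone, but the Chinese still does not show up, so what is the point?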
2016-11-17
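Reply to 原來我叫小土慕課網給我改了名字:
Later, as I kept working on this, I found that there are two kinds of PDF:
1. PDFs generated from text => process with pdfminer3k to convert back to txt
2. PDFs generated from images => process with Tesseract (an OCR library) to convert back to txt
So if the approach above still produces no text,
you can try Tesseract (an OCR library).
In the end I used the following libraries to handle PDFs whose pages are images:
tesseract (OCR library; the command runs outside Python)
pyocr (Python interface to tesseract)
pillow (fork of the Python Imaging Library, PIL, for Python 3)
imagemagick
wand (Python interface to imagemagick)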
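Here is a minimal sketch of how the libraries listed above might fit together. It is not the poster's actual code. The function names, the `chi_tra` language code (Tesseract's Traditional Chinese data, which must be installed separately) and the 300 dpi resolution are all assumptions. ImageMagick also typically needs Ghostscript to rasterize PDF pages.

```python
import io


def looks_image_only(extracted_text, min_chars=1):
    """Return True when text extraction produced (almost) nothing."""
    return len(extracted_text.strip()) < min_chars


def ocr_pdf(pdf_path, lang="chi_tra", resolution=300):
    """OCR every page of an image-only PDF and return the joined text."""
    # Imported here so looks_image_only() works without the OCR stack installed.
    import pyocr
    import pyocr.builders
    from PIL import Image as PILImage
    from wand.image import Image as WandImage

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("no OCR tool found; is tesseract installed?")
    tool = tools[0]

    pages = []
    # wand (ImageMagick) rasterizes the PDF; each page becomes one image.
    with WandImage(filename=pdf_path, resolution=resolution) as pdf:
        for page in pdf.sequence:
            with WandImage(image=page) as page_img:
                png_bytes = page_img.make_blob("png")
            # pillow opens the PNG bytes; pyocr hands the image to tesseract.
            pil_img = PILImage.open(io.BytesIO(png_bytes))
            pages.append(tool.image_to_string(
                pil_img, lang=lang, builder=pyocr.builders.TextBuilder()))
    return "\n".join(pages)
```

One possible use: run the pdfminer3k extraction first, and call `ocr_pdf(path)` only when `looks_image_only(text)` is true, since OCR is much slower than reading embedded text.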