2 Answers

Contributed 1828 experience points, received more than 3 likes
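Based on the sparse description of what you want to do, I suggest:
1. Read the text from the image
2. Replace all spaces with semicolons
String csvContent = imgData.replaceAll(" ",";");
3. Save the text to a csv file
4. Open the csv file with Excel
The following example assumes that you have managed to retrieve the data, and then post-processes that data into csv format. The content is written to a file that you can simply double-click to see whether the data is split into columns the way you want.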
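import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

// The statements below belong inside a method that declares "throws IOException".
String[] data = new String[] {
    "BOWLING O M R W ECON 0s 45 6", // notice that your OCR software does not properly recognise the string here
    "TABoult 4 0 3 0 925 M 2 3",
    "JETED 6 0 = 4 O 0 0"
};
// data.csv is created in the user's home directory.
String path = System.getProperty( "user.home" ) + System.getProperty( "file.separator" ) + "data.csv";
// try-with-resources closes the writer even if a write fails.
try ( BufferedWriter writer = new BufferedWriter( new FileWriter( path ) ) ) {
    for ( String record : data ) {
        writer.write( record.replaceAll( " ", ";" ) ); // every space becomes a column separator
        writer.write( "\n" );
    }
}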
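The loop above turns every single space into a separator, so two spaces in a row in the OCR output would produce an empty column. A minimal sketch of a more tolerant conversion follows; the class name `OcrCsvSketch` and the method `toCsvLine` are made up for illustration, and the input with repeated spaces is an assumed case, not taken from the real OCR result.

```java
final class OcrCsvSketch {

    // Trims the line, splits it on runs of whitespace, and joins the tokens with ';'.
    static String toCsvLine(String ocrLine) {
        return String.join(";", ocrLine.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Hypothetical OCR line containing repeated spaces.
        System.out.println(toCsvLine("TABoult   4 0  3 0 925 M 2 3"));
    }
}
```

Inside the writer loop, `writer.write( OcrCsvSketch.toCsvLine( record ) )` could then stand in for the `replaceAll` call.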
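As I said in the comment above, your OCR is not working properly. I suggest you look at the jsoup HTML parser to get the information and continue from there. Otherwise you will not be happy with the results.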
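One cheap way to see how unreliable the OCR output is: compare the number of columns in each record with the number in the header row. The sketch below does only that; the class `ColumnCheckSketch` and the method `suspiciousRows` are hypothetical names, and the check assumes the first record is the header.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnCheckSketch {

    // Returns the 1-based numbers of records whose token count differs from the header's.
    static List<Integer> suspiciousRows(String[] records) {
        List<Integer> bad = new ArrayList<>();
        if (records.length == 0) {
            return bad;
        }
        int expected = records[0].trim().split("\\s+").length;
        for (int i = 1; i < records.length; i++) {
            if (records[i].trim().split("\\s+").length != expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String[] data = {
            "BOWLING O M R W ECON 0s 45 6",
            "TABoult 4 0 3 0 925 M 2 3",
            "JETED 6 0 = 4 O 0 0"
        };
        System.out.println(suspiciousRows(data)); // prints [3]
    }
}
```

This only catches rows with missing or extra tokens. The second record has the right number of columns, yet values such as `925` and `M` are still misread, so a passing count does not mean the row is correct.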

Contributed 1725 experience points, received more than 8 likes
driver.get("https://www.espncricinfo.com/series/8048/scorecard/1178425/chennai-super-kings-vs-delhi-capitals-50th-match-indian-premier-league-2019");
WebElement element = driver.findElement(By.xpath("//article[@class='sub-module scorecard'][1]"));
JavascriptExecutor js = (JavascriptExecutor) driver;
js.executeScript("arguments[0].scrollIntoView(true);", element);
File screen = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
File file = new File("C:\\Users\\user\\Desktop\\screenshot1\\screenshotOfElement2.png");
FileHandler.copy(screen, file);
ITesseract instance = new Tesseract();
instance.setDatapath("C:\\selenium_work\\ScrapingText.PDF\\tessdata");
String result = instance.doOCR(file);
//System.out.println(result);
String[] lines = result.split("\\n");
This is what I am trying.
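Before the split lines are processed further, it may help to normalise them. The sketch below is one possible approach, assuming the OCR text can contain blank lines or Windows line endings (not confirmed for this setup); `OcrLinesSketch` and `nonBlankLines` are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class OcrLinesSketch {

    // Splits on any line terminator (\R covers \n, \r\n and \r) and drops blank lines.
    static List<String> nonBlankLines(String ocrResult) {
        List<String> lines = new ArrayList<>();
        for (String line : ocrResult.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical OCR output mixing line endings and a blank line.
        String result = "BOWLING O M\r\n\r\nTABoult 4 0\n";
        System.out.println(nonBlankLines(result));
    }
}
```

In the snippet above, `nonBlankLines(result)` could be used in place of `result.split("\\n")`.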