
Counting the occurrences of each word in a PDF file in Java

Java

阿晨1998 2021-12-22 17:56:39

I'm writing a Java program with PDFBox that reads any PDF file and counts how many times each word appears in it, but for some reason nothing is printed when I run it. I want it to print each word with its number of occurrences next to it. Thanks in advance. Here is my code:

package lab8;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import java.util.Scanner;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class Extractor {
    public static void main(String[] args) throws FileNotFoundException {
        Map<String, Integer> frequencies = new TreeMap<String, Integer>();
        PDDocument pd;
        File input = new File("C:\\Users\\Ammar\\Desktop\\Application.pdf");
        Scanner in = new Scanner(input);
        try {
            pd = PDDocument.load(input);
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setEndPage(20);
            String text = stripper.getText(pd);
            while (in.hasNext()) {
                String word = clean(in.next());
                if (word != "") {
                    Integer count = frequencies.get(word);
                    if (count == null) {
                        count = 1;
                    } else {
                        count = count + 1;
                    }
                    frequencies.put(word, count);
                }
            }
            for (String key : frequencies.keySet()) {
                System.out.println(key + ": " + frequencies.get(key));
            }
            if (pd != null) {
                pd.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static String clean(String s) {
        String r = "";
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c)) {
                r = r + c;
            }
        }
        return r.toLowerCase();
    }
}
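The most likely reason nothing is printed: the Scanner is built on the PDF File itself, so it reads the file's raw bytes, while the extracted text stored in `text` is computed and never used. A Scanner also swallows read errors (they are only reported through in.ioException()), so on binary input it may stop early without any visible error. Below is a minimal sketch of the counting step run on the extracted string instead. PdfWordCount and countWords are hypothetical names, and the sample string stands in for stripper.getText(pd) so the example runs without PDFBox. It keeps the question's clean() rules but replaces word != "" (a reference comparison) with isEmpty().

```java
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// Hypothetical helper class. In the original program you would call
// countWords(stripper.getText(pd)) instead of scanning the PDF File.
final class PdfWordCount {

    // Counts words in already-extracted text; TreeMap keeps them sorted.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> frequencies = new TreeMap<>();
        try (Scanner in = new Scanner(text)) {   // scan the extracted String, not the PDF file
            while (in.hasNext()) {
                String word = clean(in.next());
                if (!word.isEmpty()) {            // compare contents, not references
                    frequencies.merge(word, 1, Integer::sum);
                }
            }
        }
        return frequencies;
    }

    // Same rule as the question's clean(): keep letters only, then lower-case.
    private static String clean(String s) {
        StringBuilder r = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c)) {
                r.append(c);
            }
        }
        return r.toString().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Invented stand-in for stripper.getText(pd)
        String text = "The cat sat.\nThe dog sat, too!";
        // Prints cat: 1, dog: 1, sat: 2, the: 2, too: 1 (one per line)
        for (Map.Entry<String, Integer> e : countWords(text).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

In the real program the only required change is passing the extracted text to the Scanner (new Scanner(text)) rather than the File.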


2 Answers

子衿沉夜


I tried to work out the logic.

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class Extractor {

    public static void main(String[] args) throws FileNotFoundException {
        Map<String, Integer> wordFrequencies = new TreeMap<String, Integer>();
        Map<Character, Integer> charFrequencies = new TreeMap<Character, Integer>();
        PDDocument pd;
        File input = new File("C:\\Users\\Ammar\\Desktop\\Application.pdf");
        try {
            pd = PDDocument.load(input);
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setEndPage(20);
            // Work on the extracted text, not on the PDF file's raw bytes
            String text = stripper.getText(pd);

            // Count every character of the extracted text
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                int count = charFrequencies.get(c) != null ? charFrequencies.get(c) + 1 : 1;
                charFrequencies.put(c, count);
            }

            // Split on any run of whitespace so line breaks also separate words
            String[] texts = text.split("\\s+");
            for (String txt : texts) {
                int count = wordFrequencies.get(txt) != null ? wordFrequencies.get(txt) + 1 : 1;
                wordFrequencies.put(txt, count);
            }

            System.out.println("Printing the number of words");
            for (String key : wordFrequencies.keySet()) {
                System.out.println(key + ": " + wordFrequencies.get(key));
            }

            System.out.println("Printing the number of characters");
            for (char charKey : charFrequencies.keySet()) {
                System.out.println(charKey + ": " + charFrequencies.get(charKey));
            }

            if (pd != null) {
                pd.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Try this code. If there are still problems you can't solve, let me know and I can try to help.
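One caveat when tokenizing extracted PDF text: PDFTextStripper output normally has line breaks between lines of text, so splitting on a single space (split(" ")) leaves words from adjacent lines glued together, and a double space yields an empty token. Splitting on the regular expression "\\s+" separates on any run of whitespace. A small sketch comparing the two on an invented sample string; SplitComparison and countTokens are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class: compares two ways of splitting extracted PDF text.
final class SplitComparison {

    // Counts the tokens produced by text.split(regex), skipping empty tokens
    // (a double space under " ", or leading whitespace under "\\s+", yields one).
    static Map<String, Integer> countTokens(String text, String regex) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split(regex)) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented stand-in for PDFTextStripper output: two lines, one double space.
        String text = "one two\ntwo  three\n";

        // Prints null: splitting on " " leaves "two\ntwo" as a single token.
        System.out.println(countTokens(text, " ").get("two"));
        // Prints 2: "\\s+" splits on the newline and on the double space.
        System.out.println(countTokens(text, "\\s+").get("two"));
    }
}
```

Neither split removes punctuation ("sat," and "sat" stay distinct), so a letters-only cleanup like the question's clean() is still useful before counting.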

Answered 2021-12-22

慕標5832272


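Within your code you can also use a StringTokenizer (java.util.StringTokenizer) by passing it the extracted string:

StringTokenizer st = new StringTokenizer(stripper.getText(pd));

and then, in a while (st.hasMoreTokens()) loop, read each word with String word = clean(st.nextToken()); This also works well.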
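A self-contained sketch of the StringTokenizer approach. By default StringTokenizer splits on space, tab, newline, carriage return and form feed, so line breaks in the extracted text also separate words. TokenizerWordCount is a hypothetical name, and the sample string stands in for stripper.getText(pd) so the example runs without PDFBox.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical class; `text` stands in for stripper.getText(pd).
final class TokenizerWordCount {

    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> frequencies = new TreeMap<>();
        // Default delimiters: space, tab, newline, carriage return, form feed.
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String word = clean(st.nextToken());
            if (!word.isEmpty()) {                // skip tokens that were pure punctuation
                frequencies.merge(word, 1, Integer::sum);
            }
        }
        return frequencies;
    }

    // Letters only, lower-cased, mirroring the question's clean().
    private static String clean(String s) {
        StringBuilder r = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c)) {
                r.append(c);
            }
        }
        return r.toString().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Prints {again=1, hello=2, world=1}
        System.out.println(countWords("Hello, world!\nHello again."));
    }
}
```

The Javadoc describes StringTokenizer as a legacy class and recommends String.split or java.util.regex for new code, so splitting on a whitespace regex is an equally reasonable choice.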

Answered 2021-12-22
