You are already selecting all the tables, but then you explicitly take only the first one:
Element tableElement = doc.select("table").first();
Instead, you can simply iterate over all of them:
Elements tableElements = doc.select("table");
for (Element tableElement : tableElements) {
    // for each of the selected tables
}
So, after a few modifications to give each file a unique name, the code looks like this:
import java.io.FileWriter;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void getData(String url) throws IOException {
    String html = Jsoup.connect(url).execute().body();
    // This page is tricky: it ships its tables as commented-out HTML and shows them using JavaScript,
    // so I'm using a dirty replace to strip the comment markers before parsing, which makes the tables visible to Jsoup.
    html = html.replace("<!--", "").replace("-->", "");
    Document doc = Jsoup.parse(html);
    Elements tableElements = doc.select("table");
    int number = 1;
    for (Element tableElement : tableElements) {
        String tableId = tableElement.id();
        if (tableId.isEmpty()) {
            // skip tables without an id
            continue;
        }
        // unique file name built from a running number and the table id
        String fileName = "table" + number++ + " with id " + tableId + ".csv";
        // try-with-resources closes the writer even if a write fails
        try (FileWriter writer = new FileWriter(fileName)) {
            // header row
            Elements tableHeaderEles = tableElement.select("thead tr th");
            for (int i = 0; i < tableHeaderEles.size(); i++) {
                writer.append(tableHeaderEles.get(i).text());
                if (i != tableHeaderEles.size() - 1) {
                    writer.append(',');
                }
            }
            writer.append('\n');
            // data rows
            Elements tableRowElements = tableElement.select(":not(thead) tr");
            for (Element row : tableRowElements) {
                Elements rowItems = row.select("td");
                if (rowItems.isEmpty()) {
                    // rows without any <td> cells would only produce blank lines
                    continue;
                }
                for (int j = 0; j < rowItems.size(); j++) {
                    writer.append(rowItems.get(j).text());
                    if (j != rowItems.size() - 1) {
                        writer.append(',');
                    }
                }
                writer.append('\n');
            }
        }
    }
}
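One thing the code above does not handle is cell text that itself contains a comma, a double quote, or a line break; written as-is, such a cell would shift the columns of the CSV file. Below is a minimal sketch of a quoting helper in the usual RFC 4180 style. The class and method names (CsvEscape, escapeCsv) are hypothetical and not part of Jsoup; the idea would be to wrap each text() value with it before calling writer.append.

```java
public class CsvEscape {
    /**
     * Returns the field unchanged if it is safe, otherwise wraps it in
     * double quotes and doubles any embedded double quotes.
     * (Hypothetical helper, not part of Jsoup.)
     */
    public static String escapeCsv(String field) {
        boolean needsQuoting = field.contains(",")
                || field.contains("\"")
                || field.contains("\n")
                || field.contains("\r");
        if (!needsQuoting) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(escapeCsv("plain"));            // prints: plain
        System.out.println(escapeCsv("Los Angeles, CA"));  // prints: "Los Angeles, CA"
        System.out.println(escapeCsv("6\" tall"));         // prints: "6"" tall"
    }
}
```

In the loop above this would look like writer.append(CsvEscape.escapeCsv(rowItems.get(j).text())), assuming the helper is on the classpath.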
To answer your second question:
Scraping a table by its ID
Instead of selecting the first of all the tables:
Element tableElement = doc.select("table").first();
select the first table whose id is advanced:
Element tableElement = doc.select("table#advanced").first();
An additional tip: what you pass as the argument to select(...) is a CSS selector.
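To make that concrete, here is a small sketch comparing a few selectors. It assumes Jsoup is on the classpath; the sample markup and the names SelectorDemo and count are made up purely for illustration and are not taken from the page in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Hypothetical sample markup: two tables with an id, one without.
    static final String HTML =
            "<table id=\"basic\"><tbody><tr><td>1</td></tr></tbody></table>"
          + "<table id=\"advanced\"><tbody><tr><td>2</td></tr></tbody></table>"
          + "<table><tbody><tr><td>3</td></tr></tbody></table>";

    /** Counts how many elements of the sample markup match the CSS query. */
    public static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    /** Returns the combined text of all elements matching the CSS query. */
    public static String text(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).text();
    }

    public static void main(String[] args) {
        System.out.println(count("table"));          // every table
        System.out.println(count("table[id]"));      // only tables that have an id attribute
        System.out.println(count("table#advanced")); // only the table with id "advanced"
        System.out.println(text("table#advanced td"));
    }
}
```

A selector such as table[id] could also stand in for the tableId.isEmpty() check in the loop above, since it only matches tables that carry an id attribute.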