I am parsing HTML with jsoup and want to extract the inner HTML of the body tag. So far I have tried document.body().children().outerHtml(), but it returns only the HTML elements and skips the floating text inside the body (text that is not wrapped in any HTML tag).

private String getBodyTag(final Document document) {
    return document.body().children().outerHtml();
}

Input:

<!DOCTYPE html><html lang="de">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="assets/style.css">
</head>
<body>
<div>questions to improve formatting and clarity.</div>
<h3>Guided Mode</h3>
some sample raw/floating text </body></html>

Expected:

<div>questions to improve formatting and clarity.</div><h3>Guided Mode</h3> some sample raw/floating text

Actual:

<div>questions to improve formatting and clarity.</div><h3>Guided Mode</h3>
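
One possible way to get the expected result, offered as a minimal sketch rather than a definitive answer: children() returns only child elements, so text nodes are dropped, while Element.html() serializes every child node of the element, text included. The class name BodyInnerHtml and the method name getBodyInnerHtml are hypothetical and used only for illustration; the sketch assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyInnerHtml {

    // Hypothetical helper: returns the inner HTML of <body>, including
    // text nodes that are not wrapped in any element.
    static String getBodyInnerHtml(final Document document) {
        // Element.html() serializes all child nodes (elements and text),
        // whereas children() only yields child elements.
        return document.body().html();
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        String input = "<!DOCTYPE html><html lang=\"de\"><head></head><body>"
                + "<div>questions to improve formatting and clarity.</div>"
                + "<h3>Guided Mode</h3>"
                + " some sample raw/floating text </body></html>";
        Document document = Jsoup.parse(input);
        System.out.println(getBodyInnerHtml(document));
    }
}
```

Note that jsoup pretty-prints by default, so the whitespace and line breaks in the result may differ from the input. If that matters, calling document.outputSettings().prettyPrint(false) before serializing should keep the output closer to the original markup.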
Extracting innerHtml from the body tag with jsoup
拉風(fēng)的咖菲貓
2019-04-26 17:15:38