I'm not familiar with simplehtmldom beyond knowing to avoid it, so here is a solution that uses PHP's built-in DOM classes instead:
<?php
// suppress libxml warnings about malformed real-world HTML
libxml_use_internal_errors(true);
// get the HTML
$html = file_get_contents("https://benthamopen.com/browse-by-title/B/1/");
if ($html === false) {
    exit("Could not fetch the page\n");
}
// create a DOM object and load it up
$dom = new DOMDocument();
$dom->loadHTML($html);
// create an XPath object and query it
$xpath = new DOMXPath($dom);
$elements = $xpath->query("//div[@style='padding:10px;']");
// loop through the matches
foreach ($elements as $el) {
    // skip elements whose text doesn't start with "ISSN"
    $text = trim($el->textContent);
    if (strpos($text, "ISSN") !== 0) {
        continue;
    }
    // get the first div inside this element (skip if there isn't one)
    $div = $el->getElementsByTagName("div")->item(0);
    if ($div === null) {
        continue;
    }
    // print the ISSN, title and URL
    printf("%s %s %s<br/>\n", str_replace("ISSN: ", "", $text), $div->getAttribute("data-title"), $div->getAttribute("data-url"));
}
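The XPath part may look a bit overwhelming, but for a simple search like this it isn't much different from a CSS selector: the XPath //div[@style='padding:10px;'] matches the same elements as the CSS selector div[style="padding:10px;"]. Hopefully the comments explain the rest; if not, let me know!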
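If you want to test the parsing without hitting the live site, the same logic can be wrapped in a function that takes an HTML string. This sketch also moves the "starts with ISSN" check into the XPath query itself using the XPath 1.0 functions starts-with() and normalize-space(), which is one place where XPath goes beyond plain CSS selectors. The function name extractJournals and the sample markup are hypothetical: the markup is only modelled on what the code above expects (a padded div holding an ISSN line and an inner div with data-title/data-url attributes), and the real page may be structured differently.

```php
<?php
// Sketch: same extraction as above, but filtering in XPath and returning rows.
// extractJournals() is a hypothetical helper name, not part of any library.
function extractJournals(string $html): array
{
    $dom = new DOMDocument();
    // Silence libxml warnings for imperfect HTML, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // First predicate ~ CSS div[style="padding:10px;"].
    // Second predicate keeps only divs whose whitespace-normalised text
    // starts with "ISSN" (replaces the strpos() check in the loop).
    $query = "//div[@style='padding:10px;'][starts-with(normalize-space(.), 'ISSN')]";

    $journals = [];
    foreach ($xpath->query($query) as $el) {
        // First descendant div, which carries the data-* attributes.
        $div = $el->getElementsByTagName('div')->item(0);
        if ($div === null) {
            continue;
        }
        $journals[] = [
            'issn'  => str_replace('ISSN: ', '', trim($el->textContent)),
            'title' => $div->getAttribute('data-title'),
            'url'   => $div->getAttribute('data-url'),
        ];
    }
    return $journals;
}

// Hypothetical sample markup (assumed structure, not copied from the real page).
$sample = <<<HTML
<html><body>
<div style="padding:10px;">ISSN: 1874-1207<div data-title="The Open Biomedical Engineering Journal" data-url="https://benthamopen.com/TOBEJ/home/"></div></div>
<div style="padding:10px;">No ISSN here<div data-title="Ignored" data-url="#"></div></div>
</body></html>
HTML;

foreach (extractJournals($sample) as $j) {
    printf("%s %s %s\n", $j['issn'], $j['title'], $j['url']);
}
// prints: 1874-1207 The Open Biomedical Engineering Journal https://benthamopen.com/TOBEJ/home/
```

Returning an array rather than printing inside the loop makes it easy to output the rows however you like (HTML, CSV, JSON) and to check the parsing against a saved copy of the page.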
Output:
1874-1207 The Open Biomedical Engineering Journal https://benthamopen.com/TOBEJ/home/<br/>
1874-1967 The Open Biology Journal https://benthamopen.com/TOBIOJ/home/<br/>
1874-091X The Open Biochemistry Journal https://benthamopen.com/TOBIOCJ/home/<br/>
1875-0362 The Open Bioinformatics Journal https://benthamopen.com/TOBIOIJ/home/<br/>
1875-3183 The Open Biomarkers Journal https://benthamopen.com/TOBIOMJ/home/<br/>
2665-9956 The Open Biomaterials Science Journal https://benthamopen.com/TOBMSJ/home/<br/>
1874-0707 The Open Biotechnology Journal https://benthamopen.com/TOBIOTJ/home/<br/>