
Node crawler throws an error partway through crawling

JavaScript Node.js

root丶Dima 2016-04-29 13:09:22

As the title says: after part of the HTML has been fetched, node reports an error on req with the message HPE_INVALID_CONSTANT. I searched online a lot, and every result says the cause is that the Content-Type in the res header is set to application/json. But I printed res.headers, and its Content-Type is not that; it is the correct text/html. I am implementing the crawler with the http.get(url, callback) method from the http module. The problem code is as follows:

var http = require('http');
var req = http.get(url, function(res) {
    var html = '';
    res.on('data', function(data){
        // console.log('response.headers: '+res.headers);
        html += data;
    });

    res.on('end', function(){
        // html should contain the complete page here, but only the first part is printed; even the closing body and head tags are missing
        console.log(html);
    });
}).on('error', function(err){
    // console.log('request.headers: '+res.headers);
    console.log('Failed to fetch the file. '+JSON.stringify(err));
});


3 answers

_瀟瀟暮雨


Could you post the complete code?

2016-04-29

root丶Dima


The problematic code is as follows:

var http = require('http');
var req = http.get(url, function(res) {
    var html = '';
    res.on('data', function(data){
        // console.log('response.headers: '+res.headers);
        html += data;
    });

    res.on('end', function(){
        // html should contain the complete page here, but only the first part is printed; even the closing body and head tags are missing
        console.log(html);
    });
}).on('error', function(err){
    // console.log('request.headers: '+res.headers);
    console.log('Failed to fetch the file. '+JSON.stringify(err));
});

2016-04-29
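
HPE_INVALID_CONSTANT is reported by Node's HTTP parser when the bytes it receives do not parse as a valid HTTP response, so the Content-Type header may be unrelated. One thing worth ruling out is that the client module matches the URL scheme. The sketch below is not a confirmed fix. It only shows one way to pick the module from the URL, collect the body as Buffer chunks, and print the error code. The helper names pickClient, describeError and fetchHtml are hypothetical, and the global URL class assumes a reasonably recent Node version.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: choose the client module that matches the URL scheme.
function pickClient(url) {
  return new URL(url).protocol === 'https:' ? https : http;
}

// Hypothetical helper: format an error so that err.code is visible in the log.
function describeError(err) {
  return (err.code || 'UNKNOWN') + ': ' + err.message;
}

// Hypothetical helper: download a page and resolve with the full body.
function fetchHtml(url) {
  return new Promise(function (resolve, reject) {
    const req = pickClient(url).get(url, function (res) {
      const chunks = [];
      // Keep raw Buffers so multi-byte characters split across chunks survive.
      res.on('data', function (chunk) { chunks.push(chunk); });
      res.on('end', function () {
        resolve(Buffer.concat(chunks).toString('utf8'));
      });
      res.on('error', reject);
    });
    req.on('error', reject);
  });
}

// Example call (url is whatever page the crawler is fetching):
// fetchHtml(url).then(console.log, function (err) {
//   console.log('Failed to fetch the file. ' + describeError(err));
// });
```

If the error persists with the matching module, the response itself is probably malformed, and inspecting the raw response outside Node would be the next step.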