
How do I stop search engine crawlers from fetching my site's pages?

Crawlers

萬千封印 2018-10-17 19:15:22

How can I stop search engine crawlers from fetching the pages of my website?


1 answer

幕布斯6054654


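Below are several ways to block the major search engine crawlers (spiders) from fetching, indexing, or listing your pages. Note that these block the whole site, and aim to shut out as many of the mainstream search engine crawlers (spiders) as possible.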

1. Block via the robots.txt file
robots.txt is arguably the most important channel, because it lets you talk to the search engines directly. The following rules are suggested:
User-agent: Baiduspider
Disallow: /
User-agent: Googlebot
Disallow: /
User-agent: Googlebot-Mobile
Disallow: /
User-agent: Googlebot-Image
Disallow: /
User-agent: Mediapartners-Google
Disallow: /
User-agent: Adsbot-Google
Disallow: /
User-agent: Feedfetcher-Google
Disallow: /
User-agent: Yahoo! Slurp
Disallow: /
User-agent: Yahoo! Slurp China
Disallow: /
User-agent: Yahoo!-AdCrawler
Disallow: /
User-agent: YoudaoBot
Disallow: /
User-agent: Sosospider
Disallow: /
User-agent: Sogou spider
Disallow: /
User-agent: Sogou web spider
Disallow: /
User-agent: MSNBot
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: Tomato Bot
Disallow: /
User-agent: *
Disallow: /
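To check that rules like these behave as intended before deploying them, one option is Python's standard urllib.robotparser module. The following is only a minimal sketch: ROBOTS_TXT is an abbreviated copy of the rules above, and the is_allowed helper and the example.com URL are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the rules above; the full file lists more crawlers.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not a site from the original answer.
    for agent in ("Baiduspider", "Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://example.com/any/page.html"))
```

Keep in mind that this only tells you what a well-behaved crawler would conclude from the file; it does not stop a crawler that ignores robots.txt.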
2. Block via a meta tag
Add the following tag to the head section of every page:
<meta name="robots" content="noindex, nofollow">
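Since the tag has to be present on every page, it can help to verify that a given page actually carries it. Here is a rough sketch using Python's standard html.parser module; the RobotsMetaFinder class and the blocks_indexing function are hypothetical names, not part of the original answer.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def blocks_indexing(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(blocks_indexing(page))
```

Running a check like this over the site's templates or a sample of rendered pages is one way to catch pages where the tag was left out.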
3. Block via server configuration files (e.g. Linux/nginx)
Filter out the IP ranges used by the spiders/robots directly.
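The matching logic behind this kind of IP-range filter can be sketched with Python's standard ipaddress module. This is an illustration only: the BLOCKED_NETWORKS ranges are placeholder documentation addresses (RFC 5737), not real crawler ranges, and is_blocked is a hypothetical helper. In practice the ranges would come from each search engine's published list, and the rule would be enforced in the server configuration itself.

```python
from ipaddress import ip_address, ip_network

# Placeholder ranges reserved for documentation (RFC 5737).
# These are NOT real crawler IP ranges; substitute the ranges you want to block.
BLOCKED_NETWORKS = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ip_address(client_ip)
    return any(addr in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.10", "203.0.113.5"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Unlike robots.txt and the meta tag, which rely on the crawler cooperating, this approach refuses the request on the server side, so it also works against crawlers that ignore the first two methods, as long as the list of ranges is kept up to date.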