在新標(biāo)簽Selenium + Python中打開(kāi)網(wǎng)頁(yè)所以我試圖在我的WebDriver中的新標(biāo)簽上打開(kāi)網(wǎng)站。我想這樣做,因?yàn)槭褂肞hantomJS為每個(gè)網(wǎng)站打開(kāi)一個(gè)新的WebDriver需要大約3.5秒,我想要更快的速度......我正在使用多進(jìn)程python腳本,我想從每個(gè)頁(yè)面獲取一些元素,因此工作流程如下所示:Open BrowserLoop throught my arrayFor element in array -> Open website in new tab -> do my business -> close it但我找不到任何方法來(lái)實(shí)現(xiàn)這一目標(biāo)。這是我正在使用的代碼。網(wǎng)站之間需要永遠(yuǎn),我需要快速...其他工具是允許的,但我不知道有太多工具可以刪除使用JavaScript加載的網(wǎng)站內(nèi)容(在加載時(shí)觸發(fā)某些事件時(shí)創(chuàng)建的div)這是為什么我需要Selenium ... BeautifulSoup不能用于我的一些頁(yè)面。#!/usr/bin/env pythonimport multiprocessing, time, pika, json, traceback, logging, sys, os, itertools, urllib, urllib2, cStringIO, mysql.connector, shutil, hashlib, socket, urllib2, refrom selenium import webdriverfrom selenium.webdriver.common.keys import Keysfrom PIL import Imagefrom os import listdirfrom os.path import isfile, joinfrom bs4 import BeautifulSoupfrom pprint import pprintdef getPhantomData(parameters):
try:
# We create WebDriver
browser = webdriver.Firefox()
# Navigate to URL
browser.get(parameters['target_url'])
# Find all links by Selector
links = browser.find_elements_by_css_selector(parameters['selector'])
result = []
for link in links:
# Extract link attribute and append to our list
result.append(link.get_attribute(parameters['attribute']))
browser.close()
browser.quit()
return json.dumps({'data': result})
except Exception, err:
browser.close()
browser.quit()
print errdef callback(ch, method, properties, body):
parameters = json.loads(body)
message = getPhantomData(parameters)
if message['data']:
ch.basic_ack(delivery_tag=method.delivery_tag)
else:
ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True)def consume():
credentials = pika.PlainCredentials('invitado', 'invitado')
rabbit = pika.ConnectionParameters('localhost',5672,'/',credentials)
connection = pika.BlockingConnection(rabbit)
channel = connection.channel()
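The "one browser, each site in a fresh tab" workflow described above can be sketched roughly as follows. This is a minimal sketch, not a drop-in fix: the helper name `scrape_in_new_tab` and the `'a'`/`'href'` selector/attribute in the usage comment are made up for illustration, and it assumes the Selenium 3-era API the question's code already uses (`find_elements_by_css_selector`, `switch_to.window`).

```python
def scrape_in_new_tab(driver, url, selector, attribute):
    """Open `url` in a fresh tab of an existing driver, collect one
    attribute from every element matching `selector`, close the tab,
    and switch back to the original tab."""
    original = driver.current_window_handle
    # Open a blank tab via JavaScript and locate its window handle.
    driver.execute_script("window.open('about:blank', '_blank');")
    new_handle = [h for h in driver.window_handles if h != original][-1]
    driver.switch_to.window(new_handle)
    try:
        driver.get(url)
        return [el.get_attribute(attribute)
                for el in driver.find_elements_by_css_selector(selector)]
    finally:
        driver.close()                     # closes only the current tab
        driver.switch_to.window(original)  # back to the original tab

# Usage (requires a real browser session, so only the shape is shown):
#     browser = webdriver.Firefox()
#     for url in my_array:
#         print(scrape_in_new_tab(browser, url, 'a', 'href'))
#     browser.quit()
```

Note that `driver.close()` ends only the current tab while `driver.quit()` ends the whole session, so the driver stays alive between sites and you pay the ~3.5 s startup cost once instead of once per URL.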