Python requests.get does not return the text inside one of the tags in the HTML document


眼眸繁星 2023-07-27 16:15:55
I am trying to parse job descriptions from Djinni for a personal project. I am using Python 3.6, BeautifulSoup4 and the requests library. When I use requests.get to fetch the HTML of a vacancy page, the returned HTML lacks the most important part: the description text. For example, take the URL of this page (example) and the following code I wrote:

def scrape_job_desc(self, url):
    job_desc_html = self._get_search_page_html(url)
    soup = BeautifulSoup(job_desc_html, features='html.parser')
    try:
        short_desc = str(soup.find('p', {'class': 'job-teaser svelte-a3rpl2'}).getText())
        full_desc = soup.find('div', {'class': 'job-description-wrapper svelte-a3rpl2'}).find('p').getText()
    except AttributeError:
        short_desc = None
        full_desc = None
    return short_desc, full_desc

def _get_search_page_html(self, url):
    html = requests.get(url=url, headers={'User-Agent': 'Mozilla/5.0 CK={} (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko'})
    return html.text

It returns short_desc but not full_desc. Moreover, the text of the required <p> tag is not present in the HTML at all. But when I download the page with a browser, everything is there. What causes this?

2 Answers

動漫人物

1815 contribution points, more than 10 upvotes

The full job description is stored inside the page as a JavaScript variable. You can extract it with selenium or with the re module:


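import re

import requests
from bs4 import BeautifulSoup

url = 'https://djinni.co/jobs2/144172-data-scientist'
html_data = requests.get(url).text

# The full description sits in a script as fullDescription:"...";
# its line breaks appear as the literal characters \r\n, so turn them into real newlines.
full_desc = re.search(r'fullDescription:"(.*?)",', html_data).group(1).replace(r'\r\n', '\n')
# The short teaser is part of the static HTML, so BeautifulSoup can find it directly.
short_desc = BeautifulSoup(html_data, 'html.parser').select_one('.job-teaser').get_text()

print(short_desc)
print('-' * 80)
print(full_desc)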

Prints:


Together Networks is looking for an experienced Data Scientist to join our Agile team. Together Networks is a worldwide leader in the online dating niche with millions of users across more than 45 countries. Our brands are BeNaughty, CheekyLovers, Flirt, Click&Flirt, Flirt Spielchen.
--------------------------------------------------------------------------------
What you get to deal with:

- Active collaboration with stakeholders throughout the organization;
- User experience modelling;
- Advanced segmentation;
- User behavior analytics;
- Anomaly detection, fraud detection;
- Looking for bottlenecks;
- Churn prediction.

You need to have (required):

- Master's or PHD in Statistics, Mathematics, Computer Science or another quantitative field;
- 2+ years of experience manipulating data sets and building statistical models;
- Strong knowledge in a wide range of machine learning methods and algorithms for classification, regression, clustering, and others;
- Knowledge and experience in statistical and data mining techniques;
- Experience using statistical computer languages (Python, SLQ, etc.) to manipulate data and draw insights from large data sets.
- Knowledge of a variety of machine learning techniques and their real-world advantages\u002Fdrawbacks;
- Experience visualizing\u002Fpresenting insights for stakeholders;
- Independent, creative thinking, and ability to learn fast.

Would be a great plus:

- Experience dealing with end to end machine learning projects: data exploration, feature engineering\u002Fdefinition, model building, production, maintenance;
- Experience in data visualization with Tableau;
- Experience in dating, game dev, social projects.
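The printed text still contains JavaScript escapes such as \u002F, because the regular expression returns the raw string literal. A minimal sketch of one way to decode them follows, assuming the literal only uses escapes that are also valid in JSON (\n, \r, \", \uXXXX). The helper name extract_js_string and the sample string are hypothetical, and a literal that uses JavaScript-only escapes such as \x41 would make json.loads raise an error.

```python
import json
import re


def extract_js_string(html, key):
    """Return the decoded value of key:"..." found in html, or None if absent.

    Hypothetical helper: assumes the value is a double-quoted literal whose
    escapes are all valid JSON string escapes.
    """
    # Capture everything up to the closing quote, skipping over escaped characters.
    match = re.search(re.escape(key) + r':"((?:[^"\\]|\\.)*)"', html)
    if match is None:
        return None
    # Re-wrap the raw literal in quotes and let the JSON decoder resolve the escapes.
    decoded = json.loads('"' + match.group(1) + '"', strict=False)
    # Normalise Windows line endings to plain newlines.
    return decoded.replace('\r\n', '\n')


# Made-up sample that mimics the shape of the embedded variable.
sample = r'data={fullDescription:"advantages\u002Fdrawbacks;\r\n- \"quoted\"",other:1}'
print(extract_js_string(sample, 'fullDescription'))
```

Compared with the non-greedy `(.*?)",` pattern, the capture group here does not stop early at an escaped quote followed by a comma inside the description.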


Answered 2023-07-27
富國滬深

1790 contribution points, more than 9 upvotes

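This is a classic mistake in web scraping.

You probably looked at the source of the HTML as rendered by the browser and tried to get the p element inside the job-description-wrapper div.

However, if you load only the page itself (the first request the browser makes) and look at its content, you will find that this paragraph is not there initially. Scripts fill in its content afterwards, and this happens so quickly that you as a user barely notice it.

Check this output: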

print(requests.get(url='https://djinni.co/jobs2/144172-data-scientist').text)
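Instead of reading that output by eye, the check can be automated. The sketch below uses only the standard library and reports whether a piece of text appears in the regular markup, only inside a script element, or not at all. The names locate_text and _TextLocator and the sample HTML are hypothetical, and text that spans several nested tags will not be matched.

```python
from html.parser import HTMLParser


class _TextLocator(HTMLParser):
    """Records whether a needle occurs in normal text or inside <script>."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.in_script = False
        self.found_in = set()

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.needle in data:
            self.found_in.add("script" if self.in_script else "markup")


def locate_text(html, needle):
    """Return 'markup', 'script' or 'absent' for needle in the raw HTML."""
    parser = _TextLocator(needle)
    parser.feed(html)
    parser.close()
    if "markup" in parser.found_in:
        return "markup"
    if "script" in parser.found_in:
        return "script"
    return "absent"


# Made-up page: the teaser is static, the description lives in a script.
sample_html = (
    '<html><body><p class="job-teaser">Short text</p>'
    '<div class="job-description-wrapper"></div>'
    '<script>window.__data={fullDescription:"What you get to deal with"};</script>'
    '</body></html>'
)
print(locate_text(sample_html, "Short text"))
print(locate_text(sample_html, "What you get"))
```

A result of 'script' suggests the data can be pulled out of the response text directly, while 'absent' suggests it arrives through a later request and a browser or the underlying API call is needed.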

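That is what causes the problem. How to solve it is a separate matter. One approach is to drive a headless browser from Python: it runs the page's scripts after loading, and you read the content you need only once the page has finished loading everything. Take a look at tools such as selenium.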
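A rough sketch of the headless-browser approach with selenium is shown below. It assumes Selenium 4, a locally installed Chrome recent enough to accept the --headless=new flag, and that the description is rendered into an element matching the CSS selector passed in. The function name fetch_rendered_text is hypothetical, and the selector in the usage comment is taken from the class names in the question, which the site may have changed since.

```python
def fetch_rendered_text(url, css_selector, timeout=10):
    """Return the text of the first element matching css_selector once it exists."""
    # Imported lazily so this module can be imported without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until a script has inserted the element, or raise TimeoutException.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Usage (needs Chrome and network access, so it is left as a comment):
# print(fetch_rendered_text(
#     "https://djinni.co/jobs2/144172-data-scientist",
#     ".job-description-wrapper p",
# ))
```

Waiting for the element explicitly is generally more reliable than a fixed sleep, since the time the scripts take to fill in the content can vary.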


Answered 2023-07-27