2 Answers

The full job description is stored inside the page as a JavaScript variable. You can extract it with selenium or with the re module:
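import re
import requests
from bs4 import BeautifulSoup

url = 'https://djinni.co/jobs2/144172-data-scientist'
html_data = requests.get(url).text

# The full description sits in an inline script as fullDescription:"..."
match = re.search(r'fullDescription:"(.*?)",', html_data)
# Turn the literal \r\n escape sequences into real newlines
full_desc = match.group(1).replace(r'\r\n', '\n') if match else ''

# The short teaser is already present in the static HTML
short_desc = BeautifulSoup(html_data, 'html.parser').select_one('.job-teaser').get_text()

print(short_desc)
print('-' * 80)
print(full_desc)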
Prints:
Together Networks is looking for an experienced Data Scientist to join our Agile team. Together Networks is a worldwide leader in the online dating niche with millions of users across more than 45 countries. Our brands are BeNaughty, CheekyLovers, Flirt, Click&Flirt, Flirt Spielchen.
--------------------------------------------------------------------------------
What you get to deal with:
- Active collaboration with stakeholders throughout the organization;
- User experience modelling;
- Advanced segmentation;
- User behavior analytics;
- Anomaly detection, fraud detection;
- Looking for bottlenecks;
- Churn prediction.
You need to have (required):
- Masteras or PHD in Statistics, Mathematics, Computer Science or another quantitative field;
- 2+ years of experience manipulating data sets and building statistical models;
- Strong knowledge in a wide range of machine learning methods and algorithms for classification, regression, clustering, and others;
- Knowledge and experience in statistical and data mining techniques;
- Experience using statistical computer languages (Python, SLQ, etc.) to manipulate data and draw insights from large data sets.
- Knowledge of a variety of machine learning techniques and their real-world advantages\u002Fdrawbacks;
- Experience visualizing\u002Fpresenting insights for stakeholders;
- Independent, creative thinking, and ability to learn fast.
Would be a great plus:
- Experience dealing with end to end machine learning projects: data exploration, feature engineering\u002Fdefinition, model building, production, maintenance;
- Experience in data visualization with Tableau;
- Experience in dating, game dev, social projects.
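The printed description above still contains JavaScript escape sequences such as \u002F (an escaped /). A minimal sketch of one way to decode them, assuming the captured literal only uses escapes that JSON also understands (\n, \r, \", \\, \/, \uXXXX); JS-only escapes such as \x41 or \' would make json.loads raise. The regex also accepts escaped quotes inside the literal, so an embedded \", does not cut the match short. The helper name extract_js_string and the sample string are hypothetical.

```python
import json
import re
from typing import Optional


def extract_js_string(html: str, key: str) -> Optional[str]:
    """Find key:"..." in inline JavaScript and decode the string literal's escapes."""
    # A double-quoted literal: any char except " or \, or a backslash escape
    pattern = re.escape(key) + r':"((?:[^"\\]|\\.)*)"'
    match = re.search(pattern, html)
    if match is None:
        return None
    # Assumption: only JSON-compatible escapes appear, so json.loads can decode them
    text = json.loads('"' + match.group(1) + '"')
    # Normalise Windows line endings to plain newlines
    return text.replace('\r\n', '\n')


# Hypothetical page fragment with escaped newline, slash and quotes
sample = 'window.__DATA__={fullDescription:"Line one\\r\\n- advantages\\u002Fdrawbacks \\"quoted\\"",other:1}'
print(extract_js_string(sample, 'fullDescription'))
```

With the real page you would pass requests.get(url).text as html; if the key is missing the helper returns None instead of raising AttributeError.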

This is a typical mistake when web scraping.
You probably looked at the source of the HTML as rendered in your browser and tried to get the p inside the job-description-wrapper div.
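However, if you load just the page itself (the first request the browser makes) and look at its content, you will find that this paragraph is not there initially. A script fills in its content afterwards, but this happens so quickly that you, as a user, would hardly notice it.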
Check the output of this:
print(requests.get(url='https://djinni.co/jobs2/144172-data-scientist').text)
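Instead of reading the whole dump by eye, you can test whether a selector matches anything in the HTML as served, before any script runs. A minimal sketch using BeautifulSoup; the helper name selector_in_static_html and the served fragment are hypothetical, made up to mimic a page whose paragraph is filled in later by JavaScript.

```python
from bs4 import BeautifulSoup


def selector_in_static_html(html: str, selector: str) -> bool:
    """Return True if the CSS selector matches anything in the raw (un-rendered) HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # select_one returns None when nothing matches
    return soup.select_one(selector) is not None


# Hypothetical served page: the wrapper div exists, but its <p> is added later by a script
served = (
    '<div class="job-description-wrapper"></div>'
    '<script>var data = {fullDescription:"..."};</script>'
)
print(selector_in_static_html(served, '.job-description-wrapper p'))  # False
print(selector_in_static_html(served, '.job-description-wrapper'))    # True
```

Against the live site you would pass requests.get(url).text as html; a False result for a selector that works in the browser's inspector is the symptom described here.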
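That is what causes the problem. How to solve it is another matter. One way is to drive a headless browser from Python: it runs the page's scripts after loading, and once the page has finished loading everything, you can grab the content you need. Have a look at a tool like selenium.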
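A rough sketch of the headless-browser approach with Selenium 4 and Chrome, assuming both are installed (Selenium Manager normally locates the driver). The helper name fetch_rendered_text, the 10-second timeout, and the selector '.job-description-wrapper p' are assumptions; the live page's markup may have changed since this answer was written.

```python
def fetch_rendered_text(url: str, selector: str, timeout: float = 10.0) -> str:
    """Load url in headless Chrome and return the text of the first element matching selector."""
    # Imported inside the function so this helper can be defined without selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument('--headless=new')  # no visible window (newer Chrome versions)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the page's scripts have inserted the element, or time out
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return element.text
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()
```

Usage would look like fetch_rendered_text('https://djinni.co/jobs2/144172-data-scientist', '.job-description-wrapper p'). An explicit wait is used rather than a fixed time.sleep, so the call returns as soon as the element appears. If it raises TimeoutException, the selector likely no longer matches the page.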