2 Answers

Contributed 1794 experience points · received over 8 upvotes
This is the code I wrote; it does the job for me:
import pandas as pd

url = 'http://fcf.cat/calendari/1920/futbol-11/infantil-primera-divisio/grup-{}'
hoja = 'grup-{}'

# The context manager saves and closes the workbook on exit.
with pd.ExcelWriter('Resultados.xlsx') as xlwriter:
    for i in range(1, 19):  # groups 1 to 18
        converturl = url.format(i)
        # read_html returns one DataFrame per <table> found on the page
        dfResultats = pd.read_html(converturl, header=0)
        numgrupo = hoja.format(i)
        tablelen = len(dfResultats) - 1
        strrow = 0
        strcol = 0
        # Skip the first and last tables and stack the rest on one sheet
        for x in range(1, tablelen):
            df = dfResultats[x]
            df.to_excel(xlwriter, sheet_name=numgrupo, index=False,
                        startrow=strrow, startcol=strcol)
            strrow = strrow + 9  # place the next table 9 rows further down
My idea is to enhance the code so that the href is included in the Excel sheet, because it links to the match details that I need to extract later.
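Since the code above already relies on pd.read_html, one possible way to keep the hrefs is its extract_links option, available in pandas 1.5 and later. The sketch below is only an illustration under assumptions: SAMPLE_HTML is made-up markup standing in for one calendar table (the real page layout may differ), read_tables_with_links is a hypothetical helper name, and read_html needs lxml (or bs4 with html5lib) installed.

```python
from io import StringIO

import pandas as pd

# Made-up markup standing in for one calendar table (assumption, not the real page).
SAMPLE_HTML = """
<table>
  <thead><tr><th>Home</th><th>Result</th><th>Away</th></tr></thead>
  <tbody>
    <tr>
      <td>TEAM A</td>
      <td><a href="http://example.com/acta/1">0 - 1</a></td>
      <td>TEAM B</td>
    </tr>
  </tbody>
</table>
"""


def cell_text(cell):
    # With extract_links="body" a body cell is a (text, href) tuple.
    return cell[0] if isinstance(cell, tuple) else cell


def cell_href(cell):
    return cell[1] if isinstance(cell, tuple) else None


def read_tables_with_links(html):
    """Hypothetical helper: parse all tables, adding '<column> link' columns."""
    tables = pd.read_html(StringIO(html), extract_links="body")
    for df in tables:
        for col in list(df.columns):
            cells = df[col]
            df[col] = cells.map(cell_text)
            links = cells.map(cell_href)
            if links.notna().any():  # keep link columns only where a link exists
                df[f"{col} link"] = links
    return tables


tables = read_tables_with_links(SAMPLE_HTML)
print(tables[0])
```

Each resulting DataFrame can then be passed to to_excel exactly as in the loop above, so the link ends up in its own column of the sheet.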

Contributed 1829 experience points · received over 7 upvotes
This script gets all the information from the table, including the links:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'http://fcf.cat/calendari/1920/futbol-11/infantil-primera-divisio/grup-1'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

out = []
for tr in soup.select('.calendaritable tbody tr'):
    # text of the row's cells, in document order
    t = tr.get_text(strip=True, separator='|').split('|')
    # link to the match details, if the row has one
    a = tr.select_one('.calendaritablehover a')
    # matchday name and date come from the nearest preceding header cells
    jornada = tr.find_previous('th', {'colspan': '4'}).get_text(strip=True)
    jornada_date = tr.find_previous('th', {'colspan': '3'}).get_text(strip=True)
    if len(t) == 4:  # there is a result
        out.append({'Team 1': t[0], 'Team 2': t[3], 'Goals 1': t[1], 'Goals 2': t[2], 'Jornada': jornada, 'Jornada date': jornada_date, 'Link': a['href'] if a else ''})
    else:  # no result yet, so the goals stay empty
        out.append({'Team 1': t[0], 'Team 2': t[2], 'Goals 1': '', 'Goals 2': '', 'Jornada': jornada, 'Jornada date': jornada_date, 'Link': a['href'] if a else ''})

df = pd.DataFrame(out)
print(df)
df.to_csv('data.csv')
Prints:
     Team 1                          Team 2                    Goals 1  Goals 2  Jornada     Jornada date  Link
0    CATALONIA, U.B.,A               EUROPA, C.E.,B            0        1        Jornada 1   06-10-2019    http://fcf.cat/acta/1920/futbol-11/infantil-pr...
1    FUNDACIO P. CE. JUPITER,A       ESCOLA PIA SARRIà S.E.,A  1        0        Jornada 1   06-10-2019    http://fcf.cat/acta/1920/futbol-11/infantil-pr...
2    Pa BARC. CINC COPES,A           MONTAÑESA, C.F.,A         1        0        Jornada 1   06-10-2019    http://fcf.cat/acta/1920/futbol-11/infantil-pr...
3    Pa BARC. BARCINO, CE,A          Pa BARC. ANGUERA,B        0        3        Jornada 1   06-10-2019    http://fcf.cat/acta/1920/futbol-11/infantil-pr...
4    DIAGONAL CLUB ESP.,A            SISTRELLS C.F.,A          8        2        Jornada 1   06-10-2019    http://fcf.cat/acta/1920/futbol-11/infantil-pr...
..   ...                             ...                       ...      ...      ...         ...           ...
235  DIAGONAL CLUB ESP.,A            Pa BARC. ANGUERA,B                          Jornada 30  31-05-2020
236  MONTAÑESA, C.F.,A               Pa BARC. BARCINO, CE,A                      Jornada 30  31-05-2020
237  Pa BARC. CINC COPES,A           ESCOLA PIA SARRIà S.E.,A                    Jornada 30  31-05-2020
238  FUNDACIO P. CE. JUPITER,A       EUROPA, C.E.,B                              Jornada 30  31-05-2020
239  L'HOSPITALET, CENTRE ESPORTS,B  CATALONIA, U.B.,A                           Jornada 30  31-05-2020

[240 rows x 7 columns]
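The script above handles only grup-1 and writes a CSV, while the question loops over groups 1 to 18 and writes an Excel workbook. A rough sketch of how the two could be combined follows. It reuses the same selectors; parse_matches and write_groups_to_excel are hypothetical helper names, SAMPLE_HTML is made-up markup that merely imitates the structure those selectors expect, and writing .xlsx files assumes openpyxl is installed.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

URL = "http://fcf.cat/calendari/1920/futbol-11/infantil-primera-divisio/grup-{}"

# Made-up markup imitating what the selectors expect (assumption, not the real page).
SAMPLE_HTML = """
<table class="calendaritable">
  <thead>
    <tr><th colspan="4">Jornada 1</th><th colspan="3">06-10-2019</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>TEAM A</td>
      <td class="calendaritablehover"><a href="http://example.com/acta/1">0</a></td>
      <td class="calendaritablehover"><a href="http://example.com/acta/1">1</a></td>
      <td>TEAM B</td>
    </tr>
    <tr><td>TEAM C</td><td>-</td><td>TEAM D</td></tr>
  </tbody>
</table>
"""


def parse_matches(html):
    """Return one dict per match row, using the same selectors as the script above."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select(".calendaritable tbody tr"):
        t = list(tr.stripped_strings)
        a = tr.select_one(".calendaritablehover a")
        played = len(t) == 4  # four text cells means a result is present
        rows.append({
            "Team 1": t[0],
            "Team 2": t[3] if played else t[2],
            "Goals 1": t[1] if played else "",
            "Goals 2": t[2] if played else "",
            "Jornada": tr.find_previous("th", {"colspan": "4"}).get_text(strip=True),
            "Jornada date": tr.find_previous("th", {"colspan": "3"}).get_text(strip=True),
            "Link": a["href"] if a else "",
        })
    return rows


def write_groups_to_excel(path="Resultados.xlsx", first=1, last=18):
    """Hypothetical helper: fetch each group and write it to its own sheet."""
    with pd.ExcelWriter(path) as writer:
        for i in range(first, last + 1):
            html = requests.get(URL.format(i), timeout=30).content
            df = pd.DataFrame(parse_matches(html))
            df.to_excel(writer, sheet_name=f"grup-{i}", index=False)


rows = parse_matches(SAMPLE_HTML)  # offline check against the sample markup
print(pd.DataFrame(rows))
```

Calling write_groups_to_excel() would then request all 18 group pages and produce one sheet per group, with the Link column alongside the teams and goals.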