
How do I extract HTML tables from a file with pandas?

Python

守候你守候我 2023-09-19 17:33:56

I'm new to pandas and I'm trying to extract some data from HTML files. How can I convert multiple HTML tables like these:

PS4
Game Name | Price
GoW       | 49.99
FF VII R  | 59.99

XBX
Game Name | Price
Gears 5   | 49.99
Forza 5   | 59.99

<table>
  <tr colspan="2">
    <td>PS4</td>
  </tr>
  <tr>
    <td>Game Name</td>
    <td>Price</td>
  </tr>
  <tr>
    <td>GoW</td>
    <td>49.99</td>
  </tr>
  <tr>
    <td>FF VII R</td>
    <td>59.99</td>
  </tr>
</table>
<table>
  <tr colspan="2">
    <td>XBX</td>
  </tr>
  <tr>
    <td>Game Name</td>
    <td>Price</td>
  </tr>
  <tr>
    <td>Gears 5</td>
    <td>49.99</td>
  </tr>
  <tr>
    <td>Forza 5</td>
    <td>59.99</td>
  </tr>
</table>

into a JSON object like this:

[
  { "Game Name": "GoW", "Price": "49.99", "platform": "PS4"},
  { "Game Name": "FF VII R", "Price": "59.99", "platform": "PS4"},
  { "Game Name": "Gears 5", "Price": "49.99", "platform": "XBX"},
  { "Game Name": "Forza 5", "Price": "59.99", "platform": "XBX"}
]

I tried loading the HTML file that contains the tables with pandas.read_html(path/to/file). It does return a list of DataFrames, but I don't know how to extract the data from there, especially since the platform name sits in a header row rather than in a column of its own.

I'm using pandas because I'm extracting these tables from local .htm files that also contain other kinds of tables and HTML code, so I use:

tables = pandas.read_html(file_path, match="Game Name")

The match parameter, keyed on that column name, quickly isolates the tables I need.
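To see why the platform name needs special handling, it can help to inspect what read_html actually returns for the sample markup. The sketch below is an illustration rather than part of the original post: it feeds the question's HTML through io.StringIO instead of a file, and it assumes an HTML parser backend for pandas (lxml, or bs4 with html5lib) is installed. The variable names are made up for the example.

```python
import io

import pandas as pd

# Sample markup copied from the question; a real run would pass a file path instead.
html = """
<table>
  <tr colspan="2"><td>PS4</td></tr>
  <tr><td>Game Name</td><td>Price</td></tr>
  <tr><td>GoW</td><td>49.99</td></tr>
  <tr><td>FF VII R</td><td>59.99</td></tr>
</table>
<table>
  <tr colspan="2"><td>XBX</td></tr>
  <tr><td>Game Name</td><td>Price</td></tr>
  <tr><td>Gears 5</td><td>49.99</td></tr>
  <tr><td>Forza 5</td><td>59.99</td></tr>
</table>
"""

# read_html returns one DataFrame per table whose text matches "Game Name".
tables = pd.read_html(io.StringIO(html), match="Game Name")
first = tables[0]

# The tables have no <th> cells, so pandas treats every row as data:
# row 0 carries the platform, row 1 the real column names, rows 2+ the games.
print(len(tables))
print(first)
```

Because the platform and the column names arrive as ordinary rows, any solution has to promote row 1 to the header and copy row 0 into a new column.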


1 Answer

紅顏莎娜

1842 contribution points, 13+ upvotes

import pandas as pd

# collects one cleaned DataFrame per table, across all files
df_list = []

# list of files to load
list_of_files = ['test.html']

# iterate through your files
for file in list_of_files:

    # create a list of DataFrames from the matching tables in the file
    dfl = pd.read_html(file, match='Game Name')

    # fix the headers and columns
    for d in dfl:

        # use row 1 as the column headers
        d.columns = d.iloc[1]

        # row 0, column 0 holds the platform name
        d['platform'] = d.iloc[0, 0]

        # keep row 2 and below as the data; rows 0 and 1 were headers
        d = d.iloc[2:]

        # append the cleaned DataFrame to df_list
        df_list.append(d.copy())

# create a single DataFrame
df = pd.concat(df_list).reset_index(drop=True)

# create a list of dicts from df
records = df.to_dict('records')

print(records)

[out]:
[{'Game Name': 'GoW', 'Price': '49.99', 'platform': 'PS4'},
 {'Game Name': 'FF VII R', 'Price': '59.99', 'platform': 'PS4'},
 {'Game Name': 'Gears 5', 'Price': '49.99', 'platform': 'XBX'},
 {'Game Name': 'Forza 5', 'Price': '59.99', 'platform': 'XBX'}]
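The question asks for JSON, while the list above is still a Python list of dicts. One possible last step, sketched here under the assumption that the combined DataFrame looks like the output shown (it is rebuilt by hand below so the snippet runs on its own), is to serialize it with the standard json module or with DataFrame.to_json.

```python
import json

import pandas as pd

# Stand-in for the combined DataFrame; the values are copied from the output above.
df = pd.DataFrame(
    {
        "Game Name": ["GoW", "FF VII R", "Gears 5", "Forza 5"],
        "Price": ["49.99", "59.99", "49.99", "59.99"],
        "platform": ["PS4", "PS4", "XBX", "XBX"],
    }
)

# Option 1: go through the list of dicts and the standard library.
records = df.to_dict("records")
json_text = json.dumps(records, indent=2)
print(json_text)

# Option 2: let pandas serialize the rows directly.
json_text_pd = df.to_json(orient="records")

# To save the result, write either string to a file of your choosing, e.g.:
# with open("games.json", "w", encoding="utf-8") as fh:  # hypothetical file name
#     fh.write(json_text)
```

Both options keep the prices as strings, as in the requested output. Converting the Price column with pd.to_numeric before serializing would emit numbers instead.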

2023-09-19
