2 回答

TA貢獻(xiàn)1783條經(jīng)驗(yàn) 獲得超4個(gè)贊
這個(gè)是相當(dāng)棘手的。請(qǐng)嘗試下面的代碼片段:
import pandas as pd
url = 'http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt'
df = pd.read_csv(url,
sep='\s+',
comment='%',
usecols=(0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11),
names=('Year', 'Month', 'M.Anomaly', 'M.Unc.', 'A.Anomaly',
'A.Unc.','5y.Anomaly', '5y.Unc.' ,'10y.Anomaly', '10y.Unc.',
'20y.Anomaly', '20y.Unc.'))

TA貢獻(xiàn)1772條經(jīng)驗(yàn) 獲得超8個(gè)贊
問題是該文件有 77 行注釋文本,例如
'Global Average Temperature Anomaly with Sea Ice Temperature Inferred from Air Temperatures'
其中兩行是標(biāo)題
有一堆數(shù)據(jù),然后還有兩個(gè)標(biāo)頭,以及一組新數(shù)據(jù)
'Global Average Temperature Anomaly with Sea Ice Temperature Inferred from Water Temperatures'
該解決方案將文件中的兩個(gè)表分成單獨(dú)的數(shù)據(jù)幀。
這不像其他答案那么好,但數(shù)據(jù)被正確地分成不同的數(shù)據(jù)幀。
標(biāo)題很痛苦,手動(dòng)創(chuàng)建自定義標(biāo)題并跳過將標(biāo)題與文本分開的代碼行可能會(huì)更容易。
重要的一點(diǎn)是
air
與ice
數(shù)據(jù)分離。
import requests
import pandas as pd
import math
# read the file with requests
url = 'http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt'
response = requests.get(url)
data = response.text
# convert data into a list
data = [d.strip().replace('% ', '') for d in data.split('\n')]
# specify the data from the ranges in the file
air_header1 = data[74].split() # not used
air_header2 = [v.strip() for v in data[75].split(',')]
# combine the 2 parts of the header into a single header
air_header = air_header2[:2] + [f'{air_header1[math.floor(i/2)]}_{v}' for i, v in enumerate(air_header2[2:])]
air_data = [v.split() for v in data[77:2125]]
h2o_header1 = data[2129].split() # not used
h2o_header2 = [v.strip() for v in data[2130].split(',')]
# combine the 2 parts of the header into a single header
h2o_header = h2o_header2[:2] + [f'{h2o_header1[math.floor(i/2)]}_{v}' for i, v in enumerate(h2o_header2[2:])]
h2o_data = [v.split() for v in data[2132:4180]]
# create the dataframes
air = pd.DataFrame(air_data, columns=air_header)
h2o = pd.DataFrame(h2o_data, columns=h2o_header)
沒有標(biāo)題代碼
通過使用手動(dòng)標(biāo)頭列表來簡(jiǎn)化代碼。
import pandas as pd
import requests
# read the file with requests
url = 'http://berkeleyearth.lbl.gov/auto/Global/Land_and_Ocean_complete.txt'
response = requests.get(url)
data = response.text
# convert data into a list
data = [d.strip().replace('% ', '') for d in data.split('\n')]
# manually created header
headers = ['Year', 'Month', 'Monthly_Anomaly', 'Monthly_Unc.',
'Annual_Anomaly', 'Annual_Unc.',
'Five-year_Anomaly', 'Five-year_Unc.',
'Ten-year_Anomaly', 'Ten-year_Unc.',
'Twenty-year_Anomaly', 'Twenty-year_Unc.']
# separate the air and h2o data
air_data = [v.split() for v in data[77:2125]]
h2o_data = [v.split() for v in data[2132:4180]]
# create the dataframes
air = pd.DataFrame(air_data, columns=headers)
h2o = pd.DataFrame(h2o_data, columns=headers)
air
Year Month Monthly_Anomaly Monthly_Unc. Annual_Anomaly Annual_Unc. Five-year_Anomaly Five-year_Unc. Ten-year_Anomaly Ten-year_Unc. Twenty-year_Anomaly Twenty-year_Unc.
0 1850 1 -0.777 0.412 NaN NaN NaN NaN NaN NaN NaN NaN
1 1850 2 -0.239 0.458 NaN NaN NaN NaN NaN NaN NaN NaN
2 1850 3 -0.426 0.447 NaN NaN NaN NaN NaN NaN NaN NaN
h2o
Year Month Monthly_Anomaly Monthly_Unc. Annual_Anomaly Annual_Unc. Five-year_Anomaly Five-year_Unc. Ten-year_Anomaly Ten-year_Unc. Twenty-year_Anomaly Twenty-year_Unc.
0 1850 1 -0.724 0.370 NaN NaN NaN NaN NaN NaN NaN NaN
1 1850 2 -0.221 0.430 NaN NaN NaN NaN NaN NaN NaN NaN
2 1850 3 -0.443 0.419 NaN NaN NaN NaN NaN NaN NaN NaN
添加回答
舉報(bào)