3 回答

TA貢獻(xiàn)1796條經(jīng)驗 獲得超7個贊
而不是.unstack(),另一種方法是.melt()。
您可以使用 轉(zhuǎn)置數(shù)據(jù)框,并.T使用 獲取第一行之后的所有內(nèi)容.iloc[1:]。然后,.rename列、.replace帶有.1一些正則表達(dá)式的.melt數(shù)據(jù)框和.sort_values.
df = pd.DataFrame({'Age':[np.nan, 1,2,3],'Male':['Big',2,3,4],'Female':['Small',3,4,5],'Male.1':['Small',2,3,4],'Female.1':['Big',3,4,5]})
df = (df.T.reset_index().iloc[1:]
.rename({'index' : 'Gender', 0 : 'Size'}, axis=1)
.replace(r'\.\d+$', '', regex=True)
.melt(id_vars=['Gender', 'Size'], value_name='[measure]', var_name='Age')
.sort_values(['Size', 'Gender', 'Age'], ascending=[True,False,True])
.reset_index(drop=True))
df = df[['Age', 'Gender', 'Size', '[measure]']]
df
Out[41]:
Age Gender Size [measure]
0 1 Male Big 2
1 2 Male Big 3
2 3 Male Big 4
3 1 Female Big 3
4 2 Female Big 4
5 3 Female Big 5
6 1 Male Small 2
7 2 Male Small 3
8 3 Male Small 4
9 1 Female Small 3
10 2 Female Small 4
11 3 Female Small 5

TA貢獻(xiàn)1831條經(jīng)驗 獲得超10個贊
如果可能,創(chuàng)建前 2 行MultiIndex
和第一列作為索引依據(jù)header
和index_col
參數(shù) in?read_excel
:
df = pd.read_excel('file.xlsx',header=[0,1], index_col=[0])
? ??
print (df)
Age Male Female? Male Female
? ? ?Big? Small Small? ? Big
1.0? ? 2? ? ? 3? ? ?2? ? ? 3
2.0? ? 3? ? ? 4? ? ?3? ? ? 4
3.0? ? 4? ? ? 5? ? ?4? ? ? 5
print (df.columns)
MultiIndex([(? 'Male',? ?'Big'),
? ? ? ? ? ? ('Female', 'Small'),
? ? ? ? ? ? (? 'Male', 'Small'),
? ? ? ? ? ? ('Female',? ?'Big')],
? ? ? ? ? ?names=['Age', None])
print (df.index)
Float64Index([1.0, 2.0, 3.0], dtype='float64')
所以可能使用DataFrame.unstack
:
df = (df.unstack()
? ? ? ? .rename_axis(['Gender', 'Size','Age'])
? ? ? ? .reset_index(name='measure'))
print (df)
? ? Gender? ?Size? Age? measure
0? ? ?Male? ? Big? 1.0? ? ? ? 2
1? ? ?Male? ? Big? 2.0? ? ? ? 3
2? ? ?Male? ? Big? 3.0? ? ? ? 4
3? ?Female? Small? 1.0? ? ? ? 3
4? ?Female? Small? 2.0? ? ? ? 4
5? ?Female? Small? 3.0? ? ? ? 5
6? ? ?Male? Small? 1.0? ? ? ? 2
7? ? ?Male? Small? 2.0? ? ? ? 3
8? ? ?Male? Small? 3.0? ? ? ? 4
9? ?Female? ? Big? 1.0? ? ? ? 3
10? Female? ? Big? 2.0? ? ? ? 4
11? Female? ? Big? 3.0? ? ? ? 5
如果不可能使用:
您可以創(chuàng)建MultiIndex
和MultiIndex.from_arrays
刪除最后一個.
數(shù)字 by?replace
,然后過濾掉第一行DataFrame.iloc
并按DataFrame.melt
第一列重塑形狀,最后設(shè)置新的列名稱:
df.columns = pd.MultiIndex.from_arrays([df.columns.str.replace(r'\.\d+$', ''),?
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? df.iloc[0]])
df = df.iloc[1:].melt(df.columns[:1].tolist())
df.columns=['Age','Gender','Size','measure']
print (df)
? ? Age? Gender? ?Size measure
0? ?1.0? ? Male? ? Big? ? ? ?2
1? ?2.0? ? Male? ? Big? ? ? ?3
2? ?3.0? ? Male? ? Big? ? ? ?4
3? ?1.0? Female? Small? ? ? ?3
4? ?2.0? Female? Small? ? ? ?4
5? ?3.0? Female? Small? ? ? ?5
6? ?1.0? ? Male? Small? ? ? ?2
7? ?2.0? ? Male? Small? ? ? ?3
8? ?3.0? ? Male? Small? ? ? ?4
9? ?1.0? Female? ? Big? ? ? ?3
10? 2.0? Female? ? Big? ? ? ?4
11? 3.0? Female? ? Big? ? ? ?5
或者解決方案DataFrame.unstack
是可能的,只將第一列設(shè)置為index
by并為新列名稱設(shè)置byDataFrame.set_index
的級別:MultiIndex
Series.rename_axis
df.columns = pd.MultiIndex.from_arrays([df.columns.str.replace(r'\.\d+$', ''),?
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? df.iloc[0]])
df = (df.iloc[1:].set_index(df.columns[:1].tolist())
? ? ? ? .unstack()
? ? ? ? .rename_axis(['Gender', 'Size','Age'])
? ? ? ? .reset_index(name='measure'))
print (df)
? ? Gender? ?Size? Age measure
0? ? ?Male? ? Big? 1.0? ? ? ?2
1? ? ?Male? ? Big? 2.0? ? ? ?3
2? ? ?Male? ? Big? 3.0? ? ? ?4
3? ?Female? Small? 1.0? ? ? ?3
4? ?Female? Small? 2.0? ? ? ?4
5? ?Female? Small? 3.0? ? ? ?5
6? ? ?Male? Small? 1.0? ? ? ?2
7? ? ?Male? Small? 2.0? ? ? ?3
8? ? ?Male? Small? 3.0? ? ? ?4
9? ?Female? ? Big? 1.0? ? ? ?3
10? Female? ? Big? 2.0? ? ? ?4
11? Female? ? Big? 3.0? ? ? ?5

TA貢獻(xiàn)1895條經(jīng)驗 獲得超3個贊
通過將行 0 與列組合來創(chuàng)建多索引列:
df.columns = pd.MultiIndex.from_arrays((df.columns, df.iloc[0]))
df.columns.names = ['gender', 'size']
df.columns
MultiIndex([( 'Age', nan),
( 'Male', 'Big'),
( 'Female', 'Small'),
( 'Male.1', 'Small'),
('Female.1', 'Big')],
names=['gender', 'size'])
現(xiàn)在您可以重塑和重命名:
(df
.dropna()
.melt([('Age', np.NaN)], value_name='measure')
.replace(r'\.\d+$', '', regex=True)
.rename(columns={("Age", np.NaN) : "Age"}))
Age gender size measure
0 1.0 Male Big 2
1 2.0 Male Big 3
2 3.0 Male Big 4
3 1.0 Female Small 3
4 2.0 Female Small 4
5 3.0 Female Small 5
6 1.0 Male Small 2
7 2.0 Male Small 3
8 3.0 Male Small 4
9 1.0 Female Big 3
10 2.0 Female Big 4
11 3.0 Female Big 5
添加回答
舉報