3 Answers

Contributor with 1872 experience points · over 4 upvotes
Use set_index with cumcount to build a MultiIndex, then reshape with unstack:
df1 = (df.set_index(['ID', df.groupby('ID').cumcount()])['Value']
         .unstack()
         .rename(columns=lambda x: 'Value{}'.format(x + 1))
         .reset_index())
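To see why this works, it can help to look at the intermediate counter on its own. The following is a minimal sketch; the input frame is an assumption, reconstructed from the output printed further down, since the question's original data is not shown here.

```python
import pandas as pd

# Assumed sample input, inferred from the expected output in this answer.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3],
    'Value': ['ABC', 'BCD', 'AKB', 'CAB', 'AIK', 'KIB'],
})

# cumcount numbers the rows inside each ID group, starting at 0.
counter = df.groupby('ID').cumcount()
print(counter.tolist())  # [0, 1, 2, 0, 1, 0]

# (ID, counter) becomes a two-level index; unstack moves the counter
# level into the columns, leaving one row per ID.
wide = df.set_index(['ID', counter])['Value'].unstack()
print(wide)
```

Groups with fewer rows than the largest group get missing values in the trailing columns.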
On Python 3.6+, you can use f-strings to rename the columns:
df1 = (df.set_index(['ID', df.groupby('ID').cumcount()])['Value']
         .unstack()
         .rename(columns=lambda x: f'Value{x+1}')
         .reset_index())
Another idea is to collect the values of each group into lists and build a new DataFrame with the constructor:
s = df.groupby('ID')['Value'].apply(list)
df1 = (pd.DataFrame(s.values.tolist(), index=s.index)
         .rename(columns=lambda x: 'Value{}'.format(x + 1))
         .reset_index())
print(df1)
   ID Value1 Value2 Value3
0   1    ABC    BCD    AKB
1   2    CAB    AIK    NaN
2   3    KIB    NaN    NaN
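The list-based variant can be followed step by step in the same way. This is a sketch only; the sample input is an assumption reconstructed from the printed output, and how the missing cells are represented (NaN or None) may depend on the pandas version.

```python
import pandas as pd

# Assumed sample input, inferred from the printed output above.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 3],
    'Value': ['ABC', 'BCD', 'AKB', 'CAB', 'AIK', 'KIB'],
})

# One Python list per ID; the lists have different lengths.
s = df.groupby('ID')['Value'].apply(list)
print(s)

# The DataFrame constructor pads the shorter lists with missing values,
# so every row ends up as wide as the longest list.
df1 = (pd.DataFrame(s.values.tolist(), index=s.index)
         .rename(columns=lambda x: 'Value{}'.format(x + 1))
         .reset_index())
print(df1)
```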
Performance: it depends on the number of rows and on the number of unique values in the ID column:
np.random.seed(45)
a = np.sort(np.random.randint(1000, size=10000))
b = np.random.choice(list('abcde'), size=10000)
df = pd.DataFrame({'ID':a, 'Value':b})
#print (df)
In [26]: %%timeit
    ...: (df.set_index(['ID', df.groupby('ID').cumcount()])['Value']
    ...:    .unstack()
    ...:    .rename(columns=lambda x: f'Value{x+1}')
    ...:    .reset_index())
    ...:
8.96 ms ± 628 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [27]: %%timeit
    ...: s = df.groupby('ID')['Value'].apply(list)
    ...: (pd.DataFrame(s.values.tolist(), index=s.index)
    ...:    .rename(columns=lambda x: 'Value{}'.format(x + 1))
    ...:    .reset_index())
    ...:
105 ms ± 7.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# jpp's solution
In [28]: %%timeit
    ...: def group_gen(df):
    ...:     for key, x in df.groupby('ID'):
    ...:         x = x.set_index('ID').T
    ...:         x.index = pd.Index([key], name='ID')
    ...:         x.columns = [f'Value{i}' for i in range(1, x.shape[1]+1)]
    ...:         yield x
    ...:
    ...: pd.concat(group_gen(df)).reset_index()
    ...:
3.23 s ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
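The %%timeit cells above need IPython. Outside a notebook, a rough equivalent can be sketched with the standard-library timeit module. The function names via_unstack and via_lists are hypothetical labels for the first two approaches, and the repeat count is arbitrary; absolute numbers will vary with the machine and the pandas version, so no timings are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Same benchmark data as in the answer above.
np.random.seed(45)
a = np.sort(np.random.randint(1000, size=10000))
b = np.random.choice(list('abcde'), size=10000)
df = pd.DataFrame({'ID': a, 'Value': b})


def via_unstack(df):
    # Hypothetical name: the set_index + cumcount + unstack approach.
    return (df.set_index(['ID', df.groupby('ID').cumcount()])['Value']
              .unstack()
              .rename(columns=lambda x: f'Value{x + 1}')
              .reset_index())


def via_lists(df):
    # Hypothetical name: the lists + DataFrame constructor approach.
    s = df.groupby('ID')['Value'].apply(list)
    return (pd.DataFrame(s.values.tolist(), index=s.index)
              .rename(columns=lambda x: f'Value{x + 1}')
              .reset_index())


# number=5 is an arbitrary choice that keeps the run short.
for func in (via_unstack, via_lists):
    seconds = timeit.timeit(lambda: func(df), number=5) / 5
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per call')
```

The groupby + concat generator is left out of this sketch because, per the measurement above, a single run takes seconds rather than milliseconds.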

Contributor with 1871 experience points · over 13 upvotes
groupby + concat
One approach is to iterate over a groupby object and concatenate the resulting DataFrames:
def group_gen(df):
    for key, x in df.groupby('ID'):
        x = x.set_index('ID').T
        x.index = pd.Index([key], name='ID')
        x.columns = [f'Value{i}' for i in range(1, x.shape[1]+1)]
        yield x

res = pd.concat(group_gen(df)).reset_index()
print(res)
   ID Value1 Value2 Value3
0   1    ABC    BCD    AKB
1   2    CAB    AIK    NaN
2   3    KIB    NaN    NaN