5. 分组聚合后unstacking
# 读取employee数据集,求出每个种族的平均工资
In[34]: employee = pd.read_csv('data/employee.csv')
In[35]: employee.groupby('RACE')['BASE_SALARY'].mean().astype(int)
out[35]: RACE
American Indian or Alaskan Native 60272
Asian/Pacific Islander 61660
Black or African American 50137
Hispanic/Latino 52345
Others 51278
White 64419
Name: BASE_SALARY, dtype: int64
# 对种族和性别分组,求平均工资
In[36]: agg = employee.groupby(['RACE', 'GENDER'])['BASE_SALARY'].mean().astype(int)
agg
out[36]: RACE GENDER
American Indian or Alaskan Native Female 60238
Male 60305
Asian/Pacific Islander Female 63226
Male 61033
Black or African American Female 48915
Male 51082
Hispanic/Latino Female 46503
Male 54782
Others Female 63785
Male 38771
White Female 66793
Male 63940
Name: BASE_SALARY, dtype: int64
# 对索引层GENDER做unstack
In[37]: agg.unstack('GENDER')
out[37]:
# 对索引层RACE做unstack
In[38]: agg.unstack('RACE')
out[38]:
更多
# 按RACE和GENDER分组,求工资的平均值、最大值和最小值
In[39]: agg2 = employee.groupby(['RACE', 'GENDER'])['BASE_SALARY'].agg(['mean', 'max', 'min']).astype(int)
agg2
out[39]:
# 此时unstack('GENDER')会生成多级列索引,可以用stack和unstack调整结构
agg2.unstack('GENDER')