4. Using label indexing instead of Boolean indexing

    # Select all schools in Texas with Boolean indexing
    >>> import pandas as pd
    >>> college = pd.read_csv('data/college.csv')
    >>> college[college['STABBR'] == 'TX'].head()

    # Make STABBR the row index, then select with .loc
    In[22]: college2 = college.set_index('STABBR')
            college2.loc['TX'].head()
    Out[22]:

[Figure 1: output of college2.loc['TX'].head()]
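The two selections above should return the same schools. Here is a minimal sketch that checks this on a small made-up DataFrame, since `data/college.csv` may not be at hand. The school names are hypothetical; only the `STABBR` column name comes from the recipe, and `INSTNM` is assumed to be the name column.

```python
import pandas as pd

# Hypothetical stand-in for data/college.csv (invented rows).
college = pd.DataFrame({
    'INSTNM': ['School A', 'School B', 'School C', 'School D'],
    'STABBR': ['TX', 'CA', 'TX', 'NY'],
})

# Boolean indexing: build a True/False mask, then filter the rows with it.
by_mask = college[college['STABBR'] == 'TX']

# Label indexing: move STABBR into the index, then look the label up with .loc.
college2 = college.set_index('STABBR')
by_label = college2.loc['TX']

# Both approaches pick out the same schools.
same_rows = by_mask['INSTNM'].tolist() == by_label['INSTNM'].tolist()
print(same_rows)
```

Note that after `set_index`, `STABBR` is no longer a regular column of `college2`; it lives in the index.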

    # Compare the speed of the two approaches
    In[23]: %timeit college[college['STABBR'] == 'TX']
    1.51 ms ± 51.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    In[24]: %timeit college2.loc['TX']
    604 µs ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    # Time needed to make STABBR the row index
    In[25]: %timeit college2 = college.set_index('STABBR')
    1.28 ms ± 47.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
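Taken together, the three timings above suggest that building the index only pays off when more than one lookup follows. Here is a rough back-of-the-envelope sketch using the mean values reported above. It ignores the measurement spread, and the variable names are just for illustration.

```python
import math

# Mean timings reported above, in microseconds.
boolean_us = 1510.0    # college[college['STABBR'] == 'TX']
loc_us = 604.0         # college2.loc['TX']
set_index_us = 1280.0  # college.set_index('STABBR'), a one-time cost

# Time saved by each .loc lookup compared with a Boolean selection.
saving_per_lookup = boolean_us - loc_us

# Smallest number of lookups n for which
#   set_index_us + n * loc_us < n * boolean_us
break_even = math.floor(set_index_us / saving_per_lookup) + 1
print(break_even)  # 2
```

With these numbers, a single selection is cheaper with Boolean indexing (1.51 ms against 1.28 ms + 0.604 ms), while from the second lookup onward the indexed DataFrame comes out ahead.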

There's more

    # Select schools from several states with Boolean indexing and with labels
    In[26]: states = ['TX', 'CA', 'NY']
            college[college['STABBR'].isin(states)]
            college2.loc[states].head()
    Out[26]:

[Figure 2: output of college2.loc[states].head()]
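The multi-state selection can be tried on made-up data in the same way. The sketch below uses a hypothetical five-row DataFrame (invented school names, `INSTNM` assumed as the name column) and checks that `isin` and `.loc` with a list of labels select the same schools. The row order of the two results need not match, so the names are sorted before comparing.

```python
import pandas as pd

# Hypothetical stand-in for data/college.csv (invented rows).
college = pd.DataFrame({
    'INSTNM': ['School A', 'School B', 'School C', 'School D', 'School E'],
    'STABBR': ['TX', 'CA', 'FL', 'NY', 'TX'],
})
states = ['TX', 'CA', 'NY']

# Boolean indexing: keep rows whose state is in the list.
by_mask = college[college['STABBR'].isin(states)]

# Label indexing: pass the list of labels to .loc on the indexed DataFrame.
college2 = college.set_index('STABBR')
by_label = college2.loc[states]

# Compare the selected school names regardless of row order.
same_schools = sorted(by_mask['INSTNM']) == sorted(by_label['INSTNM'])
print(same_schools)
```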