4. Using label indexing in place of boolean indexing
# Select all Texas schools with boolean indexing
>>> college = pd.read_csv('data/college.csv')
>>> college[college['STABBR'] == 'TX'].head()
# Set STABBR as the row index, then select with loc
In[22]: college2 = college.set_index('STABBR')
college2.loc['TX'].head()
Out[22]:
# Compare the speed of the two approaches
In[23]: %timeit college[college['STABBR'] == 'TX']
1.51 ms ± 51.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In[24]: %timeit college2.loc['TX']
604 µs ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Time how long it takes to set STABBR as the row index
In[25]: %timeit college2 = college.set_index('STABBR')
1.28 ms ± 47.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
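The timings above only matter if both selections return the same rows. A minimal sketch of that check follows, assuming a small made-up DataFrame in place of data/college.csv. Only the STABBR column name comes from the recipe; the INSTNM and UGDS values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for data/college.csv (values are invented)
college = pd.DataFrame({
    'INSTNM': ['A', 'B', 'C', 'D', 'E'],
    'STABBR': ['TX', 'CA', 'TX', 'NY', 'CA'],
    'UGDS': [1000, 2500, 300, 4200, 150],
})

# Boolean indexing: build a mask over the STABBR column
by_mask = college[college['STABBR'] == 'TX']

# Label indexing: move STABBR into the row index, then select with loc
college2 = college.set_index('STABBR')
by_label = college2.loc['TX']

# Both approaches should pick out the same schools
same_schools = by_mask['INSTNM'].tolist() == by_label['INSTNM'].tolist()
print(same_schools)
```

Because set_index has its own cost, as In[25] shows, switching to label selection is most likely to pay off when the same index is reused for many lookups.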
There's more
# Select several states with boolean indexing and with label indexing
In[26]: states =['TX', 'CA', 'NY']
college[college['STABBR'].isin(states)]
college2.loc[states].head()
Out[26]:
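The two multi-state selections can be compared on a toy frame. This is a sketch under the same assumptions as before: the data is invented and only STABBR is taken from the recipe. It also shows one difference to keep in mind: the boolean mask keeps the original row order, while loc with a list returns rows grouped in the order of the list.

```python
import pandas as pd

# Hypothetical stand-in for data/college.csv (values are invented)
college = pd.DataFrame({
    'INSTNM': ['A', 'B', 'C', 'D', 'E', 'F'],
    'STABBR': ['TX', 'CA', 'FL', 'TX', 'NY', 'CA'],
})
college2 = college.set_index('STABBR')

states = ['TX', 'CA', 'NY']

# Boolean indexing: isin builds the mask, rows keep their original order
by_mask = college[college['STABBR'].isin(states)]

# Label indexing: loc with a list returns rows grouped in the order of `states`
by_label = college2.loc[states]

# Same set of schools, possibly in a different order
same_set = sorted(by_mask['INSTNM']) == sorted(by_label['INSTNM'])
print(same_set)
print(by_mask['INSTNM'].tolist())
print(by_label['INSTNM'].tolist())
```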