6. 惰性行切片
# 读取college数据集;从行索引10到20,每隔一个取一行
In[50]: college = pd.read_csv('data/college.csv', index_col='INSTNM')
college[10:20:2]
Out[50]:
# Series也可以进行同样的切片
In[51]: city = college['CITY']
city[10:20:2]
Out[51]: INSTNM
Birmingham Southern College Birmingham
Concordia College Alabama Selma
Enterprise State Community College Enterprise
Faulkner University Montgomery
New Beginning College of Cosmetology Albertville
Name: CITY, dtype: object
# 查看第4002个行索引标签
In[52]: college.index[4001]
Out[52]: 'Spokane Community College'
# Series和DataFrame都可以用标签进行切片。下面是对DataFrame用标签切片
In[53]: start = 'Mesa Community College'
stop = 'Spokane Community College'
college[start:stop:1500]
Out[53]:
# 下面是对Series用标签切片
In[54]: city[start:stop:1500]
Out[54]: INSTNM
Mesa Community College Mesa
Hair Academy Inc-New Carrollton New Carrollton
National College of Natural Medicine Portland
Name: CITY, dtype: object
更多
惰性切片不能用于列,只能用于DataFrame的行和Series,也不能同时选取行和列。
# 下面尝试选取两列,导致错误
In[55]: college[:10, ['CITY', 'STABBR']]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-55-92538c61bdfa> in <module>()
----> 1 college[:10, ['CITY', 'STABBR']]
/Users/Ted/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
1962 return self._getitem_multilevel(key)
1963 else:
-> 1964 return self._getitem_column(key)
1965
1966 def _getitem_column(self, key):
/Users/Ted/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
1969 # get column
1970 if self.columns.is_unique:
-> 1971 return self._get_item_cache(key)
1972
1973 # duplicate columns & possible reduce dimensionality
/Users/Ted/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
1641 """Return the cached item, item represents a label indexer."""
1642 cache = self._item_cache
-> 1643 res = cache.get(item)
1644 if res is None:
1645 values = self._data.get(item)
TypeError: unhashable type: 'slice'
# 只能用.loc和.iloc选取
In[56]: first_ten_instnm = college.index[:10]
college.loc[first_ten_instnm, ['CITY', 'STABBR']]
Out[56]: