Expanding Windows

A common alternative to rolling statistics is to use an expanding window, which yields the value of the statistic with all the data available up to that point in time.

These follow a similar interface to .rolling, with the .expanding method returning an Expanding object.

As these calculations are a special case of rolling statistics, they are implemented in pandas such that the following two calls are equivalent:

  In [96]: df.rolling(window=len(df), min_periods=1).mean()[:5]
  Out[96]:
                     A         B         C         D
  2000-01-01  0.314226 -0.001675  0.071823  0.892566
  2000-01-02  0.654522 -0.171495  0.179278  0.853361
  2000-01-03  0.708733 -0.064489 -0.238271  1.371111
  2000-01-04  0.987613  0.163472 -0.919693  1.566485
  2000-01-05  1.426971  0.288267 -1.358877  1.808650

  In [97]: df.expanding(min_periods=1).mean()[:5]
  Out[97]:
                     A         B         C         D
  2000-01-01  0.314226 -0.001675  0.071823  0.892566
  2000-01-02  0.654522 -0.171495  0.179278  0.853361
  2000-01-03  0.708733 -0.064489 -0.238271  1.371111
  2000-01-04  0.987613  0.163472 -0.919693  1.566485
  2000-01-05  1.426971  0.288267 -1.358877  1.808650
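The equivalence can also be checked programmatically rather than by comparing printed output. The sketch below is illustrative only: the `df` it builds is a hypothetical stand-in (seeded random data), not the frame used in the transcript above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the document's `df`: cumulative sums of seeded noise.
rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=10, freq="D")
df = pd.DataFrame(
    rng.standard_normal((10, 4)), index=index, columns=list("ABCD")
).cumsum()

# A rolling window as long as the whole frame never drops an observation,
# so with min_periods=1 it behaves like an expanding window.
full_window = df.rolling(window=len(df), min_periods=1).mean()
expanding = df.expanding(min_periods=1).mean()

# Compare with a floating-point tolerance instead of exact equality.
same = bool(np.allclose(full_window, expanding))
print(same)
```

A tolerance-based comparison is used here so the check does not depend on the two code paths producing bit-identical floats.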

Expanding objects provide a similar set of methods to those of .rolling.

Method Summary

Function      Description
count()       Number of non-null observations
sum()         Sum of values
mean()        Mean of values
median()      Arithmetic median of values
min()         Minimum
max()         Maximum
std()         Unbiased standard deviation
var()         Unbiased variance
skew()        Unbiased skewness (3rd moment)
kurt()        Unbiased kurtosis (4th moment)
quantile()    Sample quantile (value at %)
apply()       Generic apply
cov()         Unbiased covariance (binary)
corr()        Correlation (binary)
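As a brief illustration of the summary above, the following sketch calls a few of the listed methods on a small Series. The data and the variable names (`s`, `other`, `running_range`) are made up for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])
other = 2 * s  # perfectly linearly related to `s`

exp = s.expanding()

running_count = exp.count()    # number of non-null observations so far
running_median = exp.median()  # median of all values seen so far

# apply() accepts any reducer; raw=True passes a NumPy array to the function.
running_range = exp.apply(lambda x: x.max() - x.min(), raw=True)

# Binary methods such as corr() take a second Series.
running_corr = s.expanding().corr(other)

print(running_median.tolist())
print(running_range.tolist())
```

Because `other` is an exact multiple of `s`, the expanding correlation is 1 wherever it is defined, which makes the binary case easy to sanity-check.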

Aside from not having a window parameter, these functions have the same interfaces as their .rolling counterparts. As above, the parameters they all accept are:

  • min_periods: threshold of non-null data points to require. Defaults to minimum needed to compute statistic. No NaNs will be output once min_periods non-null data points have been seen.
  • center: boolean, whether to set the labels at the center (default is False).
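To make the effect of min_periods concrete, here is a small sketch on made-up data. It assumes the behaviour of recent pandas versions, where `expanding()` uses `min_periods=1` unless told otherwise.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, np.nan, 3, np.nan, 4])

# With the default, a result appears as soon as one non-null value is seen.
default = values.expanding().mean()

# With min_periods=3, the output stays NaN until three non-null values
# have accumulated; NaN inputs do not count toward the threshold.
strict = values.expanding(min_periods=3).mean()

print(default.tolist())
print(strict.tolist())
```

The third non-null value arrives at position 3, so that is the first position where `strict` holds a number.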

Note: The .rolling and .expanding methods do not return NaN when the current window contains at least min_periods non-null values. For example:

  In [98]: sn = pd.Series([1, 2, np.nan, 3, np.nan, 4])

  In [99]: sn
  Out[99]:
  0    1.0
  1    2.0
  2    NaN
  3    3.0
  4    NaN
  5    4.0
  dtype: float64

  In [100]: sn.rolling(2).max()
  Out[100]:
  0    NaN
  1    2.0
  2    NaN
  3    NaN
  4    NaN
  5    NaN
  dtype: float64

  In [101]: sn.rolling(2, min_periods=1).max()
  Out[101]:
  0    1.0
  1    2.0
  2    2.0
  3    3.0
  4    3.0
  5    4.0
  dtype: float64

In the case of expanding functions, this differs from cumsum(), cumprod(), cummax(), and cummin(), which return NaN in the output wherever a NaN is encountered in the input. To make the output of cumsum match that of expanding, use fillna():

  In [102]: sn.expanding().sum()
  Out[102]:
  0     1.0
  1     3.0
  2     3.0
  3     6.0
  4     6.0
  5    10.0
  dtype: float64

  In [103]: sn.cumsum()
  Out[103]:
  0     1.0
  1     3.0
  2     NaN
  3     6.0
  4     NaN
  5    10.0
  dtype: float64

  In [104]: sn.cumsum().fillna(method='ffill')
  Out[104]:
  0     1.0
  1     3.0
  2     3.0
  3     6.0
  4     6.0
  5    10.0
  dtype: float64
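Newer pandas releases deprecate the `method` argument of fillna(), so the forward fill may need to be written with the dedicated ffill() method instead. The sketch below rebuilds the same `sn` Series and checks that the forward-filled cumulative sum agrees with the expanding sum.

```python
import numpy as np
import pandas as pd

sn = pd.Series([1, 2, np.nan, 3, np.nan, 4])

# cumsum() leaves NaN wherever the input is NaN; ffill() carries the last
# valid running total forward into those positions.
filled = sn.cumsum().ffill()

expanding_sum = sn.expanding().sum()

# Compare with a tolerance rather than relying on exact float equality.
matches = bool(np.allclose(filled, expanding_sum))
print(matches)
```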

An expanding window statistic will be more stable (and less responsive) than its rolling window counterpart as the increasing window size decreases the relative impact of an individual data point. As an example, here is the mean() output for the previous time series dataset:

  In [105]: s.plot(style='k--')
  Out[105]: <matplotlib.axes._subplots.AxesSubplot at 0x7f210fc68518>

  In [106]: s.expanding().mean().plot(style='k')
  Out[106]: <matplotlib.axes._subplots.AxesSubplot at 0x7f210fc68518>

Figure: expanding window example
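The shrinking influence of each new point can be seen from the recurrence for a running mean: the n-th observation moves the expanding mean by (x_n - m_{n-1}) / n, where m_{n-1} is the mean of the first n-1 values. The sketch below verifies that identity numerically. The series `s` here is a hypothetical seeded random walk, not the dataset plotted above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `s`: a seeded random walk.
rng = np.random.default_rng(42)
s = pd.Series(rng.standard_normal(500)).cumsum()

exp_mean = s.expanding().mean()

# n is the number of observations in the window at each position.
n = np.arange(1, len(s) + 1)

# Step predicted by the recurrence m_n = m_{n-1} + (x_n - m_{n-1}) / n.
predicted_step = (s - exp_mean.shift(1)) / n
actual_step = exp_mean.diff()

# The first position has no previous mean, so it is skipped.
recurrence_holds = bool(
    np.allclose(actual_step.iloc[1:], predicted_step.iloc[1:])
)
print(recurrence_holds)
```

The division by n is what damps later observations: the same deviation from the running mean has a progressively smaller effect as the window grows.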