where()方法和Masking
Selecting values from a Series with a boolean vector generally returns a subset of the data. To guarantee that selection output has the same shape as the original data, you can use the where
method in Series and DataFrame
.
To return only the selected rows:
In [177]: s[s > 0]
Out[177]:
3 1
2 2
1 3
0 4
dtype: int64
To return a Series of the same shape as the original:
In [178]: s.where(s > 0)
Out[178]:
4 NaN
3 1.0
2 2.0
1 3.0
0 4.0
dtype: float64
Selecting values from a DataFrame with a boolean criterion now also preserves input data shape. where
is used under the hood as the implementation. The code below is equivalent to df.where(df < 0)
.
In [179]: df[df < 0]
Out[179]:
A B C D
2000-01-01 -2.104139 -1.309525 NaN NaN
2000-01-02 -0.352480 NaN -1.192319 NaN
2000-01-03 -0.864883 NaN -0.227870 NaN
2000-01-04 NaN -1.222082 NaN -1.233203
2000-01-05 NaN -0.605656 -1.169184 NaN
2000-01-06 NaN -0.948458 NaN -0.684718
2000-01-07 -2.670153 -0.114722 NaN -0.048048
2000-01-08 NaN NaN -0.048788 -0.808838
In addition, where
takes an optional other
argument for replacement of values where the condition is False, in the returned copy.
In [180]: df.where(df < 0, -df)
Out[180]:
A B C D
2000-01-01 -2.104139 -1.309525 -0.485855 -0.245166
2000-01-02 -0.352480 -0.390389 -1.192319 -1.655824
2000-01-03 -0.864883 -0.299674 -0.227870 -0.281059
2000-01-04 -0.846958 -1.222082 -0.600705 -1.233203
2000-01-05 -0.669692 -0.605656 -1.169184 -0.342416
2000-01-06 -0.868584 -0.948458 -2.297780 -0.684718
2000-01-07 -2.670153 -0.114722 -0.168904 -0.048048
2000-01-08 -0.801196 -1.392071 -0.048788 -0.808838
You may wish to set values based on some boolean criteria. This can be done intuitively like so:
In [181]: s2 = s.copy()
In [182]: s2[s2 < 0] = 0
In [183]: s2
Out[183]:
4 0
3 1
2 2
1 3
0 4
dtype: int64
In [184]: df2 = df.copy()
In [185]: df2[df2 < 0] = 0
In [186]: df2
Out[186]:
A B C D
2000-01-01 0.000000 0.000000 0.485855 0.245166
2000-01-02 0.000000 0.390389 0.000000 1.655824
2000-01-03 0.000000 0.299674 0.000000 0.281059
2000-01-04 0.846958 0.000000 0.600705 0.000000
2000-01-05 0.669692 0.000000 0.000000 0.342416
2000-01-06 0.868584 0.000000 2.297780 0.000000
2000-01-07 0.000000 0.000000 0.168904 0.000000
2000-01-08 0.801196 1.392071 0.000000 0.000000
By default, where
returns a modified copy of the data. There is an optional parameter inplace
so that the original data can be modified without creating a copy:
In [187]: df_orig = df.copy()
In [188]: df_orig.where(df > 0, -df, inplace=True);
In [189]: df_orig
Out[189]:
A B C D
2000-01-01 2.104139 1.309525 0.485855 0.245166
2000-01-02 0.352480 0.390389 1.192319 1.655824
2000-01-03 0.864883 0.299674 0.227870 0.281059
2000-01-04 0.846958 1.222082 0.600705 1.233203
2000-01-05 0.669692 0.605656 1.169184 0.342416
2000-01-06 0.868584 0.948458 2.297780 0.684718
2000-01-07 2.670153 0.114722 0.168904 0.048048
2000-01-08 0.801196 1.392071 0.048788 0.808838
Note:The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2)
is equivalent to np.where(m, df1, df2)
.
In [190]: df.where(df < 0, -df) == np.where(df < 0, df, -df)
Out[190]:
A B C D
2000-01-01 True True True True
2000-01-02 True True True True
2000-01-03 True True True True
2000-01-04 True True True True
2000-01-05 True True True True
2000-01-06 True True True True
2000-01-07 True True True True
2000-01-08 True True True True
alignment
Furthermore, where
aligns the input boolean condition (ndarray or DataFrame), such that partial selection with setting is possible. This is analogous to partial setting via .loc
(but on the contents rather than the axis labels).
In [191]: df2 = df.copy()
In [192]: df2[ df2[1:4] > 0] = 3
In [193]: df2
Out[193]:
A B C D
2000-01-01 -2.104139 -1.309525 0.485855 0.245166
2000-01-02 -0.352480 3.000000 -1.192319 3.000000
2000-01-03 -0.864883 3.000000 -0.227870 3.000000
2000-01-04 3.000000 -1.222082 3.000000 -1.233203
2000-01-05 0.669692 -0.605656 -1.169184 0.342416
2000-01-06 0.868584 -0.948458 2.297780 -0.684718
2000-01-07 -2.670153 -0.114722 0.168904 -0.048048
2000-01-08 0.801196 1.392071 -0.048788 -0.808838
Where can also accept axis
and level
parameters to align the input when performing the where
.
In [194]: df2 = df.copy()
In [195]: df2.where(df2>0,df2['A'],axis='index')
Out[195]:
A B C D
2000-01-01 -2.104139 -2.104139 0.485855 0.245166
2000-01-02 -0.352480 0.390389 -0.352480 1.655824
2000-01-03 -0.864883 0.299674 -0.864883 0.281059
2000-01-04 0.846958 0.846958 0.600705 0.846958
2000-01-05 0.669692 0.669692 0.669692 0.342416
2000-01-06 0.868584 0.868584 2.297780 0.868584
2000-01-07 -2.670153 -2.670153 0.168904 -2.670153
2000-01-08 0.801196 1.392071 0.801196 0.801196
This is equivalent to (but faster than) the following.
In [196]: df2 = df.copy()
In [197]: df.apply(lambda x, y: x.where(x>0,y), y=df['A'])
Out[197]:
A B C D
2000-01-01 -2.104139 -2.104139 0.485855 0.245166
2000-01-02 -0.352480 0.390389 -0.352480 1.655824
2000-01-03 -0.864883 0.299674 -0.864883 0.281059
2000-01-04 0.846958 0.846958 0.600705 0.846958
2000-01-05 0.669692 0.669692 0.669692 0.342416
2000-01-06 0.868584 0.868584 2.297780 0.868584
2000-01-07 -2.670153 -2.670153 0.168904 -2.670153
2000-01-08 0.801196 1.392071 0.801196 0.801196
New in version 0.18.1.
Where can accept a callable as condition and other
arguments. The function must be with one argument (the calling Series or DataFrame) and that returns valid output as condition and other
argument.
In [198]: df3 = pd.DataFrame({'A': [1, 2, 3],
.....: 'B': [4, 5, 6],
.....: 'C': [7, 8, 9]})
.....:
In [199]: df3.where(lambda x: x > 4, lambda x: x + 10)
Out[199]:
A B C
0 11 14 7
1 12 5 8
2 13 6 9
Mask
mask() is the inverse boolean operation of where
.
In [200]: s.mask(s >= 0)
Out[200]:
4 NaN
3 NaN
2 NaN
1 NaN
0 NaN
dtype: float64
In [201]: df.mask(df >= 0)
Out[201]:
A B C D
2000-01-01 -2.104139 -1.309525 NaN NaN
2000-01-02 -0.352480 NaN -1.192319 NaN
2000-01-03 -0.864883 NaN -0.227870 NaN
2000-01-04 NaN -1.222082 NaN -1.233203
2000-01-05 NaN -0.605656 -1.169184 NaN
2000-01-06 NaN -0.948458 NaN -0.684718
2000-01-07 -2.670153 -0.114722 NaN -0.048048
2000-01-08 NaN NaN -0.048788 -0.808838