Logical operators for boolean indexing in Pandas

Posted by cyej8jka on 2022-10-30 in Python

I am using boolean indexing in Pandas.
The question is why the statement:

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

works fine, whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

fails with an error?
Example:

a = pd.DataFrame({'x':[1,1],'y':[10,20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:
   x   y
0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
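A minimal reproduction of the question, assuming pandas is installed. It shows the `&` version filtering correctly and the `and` version raising a ValueError (the exact message text differs between pandas versions and between Series and arrays):

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# Element-wise AND: each comparison gives a boolean Series; & combines them row by row.
filtered = a[(a['x'] == 1) & (a['y'] == 10)]
print(filtered)

# Python's `and` calls bool() on the left-hand Series, which pandas refuses to do.
raised = False
try:
    a[(a['x'] == 1) and (a['y'] == 10)]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```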

Source: https://stackoverflow.com/questions/21415661/logical-operators-for-boolean-indexing-in-pandas

4 Answers

Answer #1 (2nc8po8w)

When you say

(a['x']==1) and (a['y']==10)

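you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to boolean values.
NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a boolean value. In other words, they raise
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
when used as a boolean value. That's because it's unclear when it should be True or False. Some users might assume they are True if they have non-zero length, like a Python list. Others might want it to be True only if all of its elements are True. Others might want it to be True if any of its elements are True.
Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.
Instead, you must be explicit about the behavior you want: check the empty attribute, or call the all() or any() method.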
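To make the three options concrete, here is a short sketch on a small hypothetical mask. Note that in pandas `empty` is a property, while `any()` and `all()` are methods:

```python
import pandas as pd

mask = pd.Series([True, False, True])  # hypothetical example mask

print(mask.empty)   # False: the Series has elements
print(mask.any())   # True: at least one element is True
print(mask.all())   # False: not every element is True

# An empty Series is where `.empty` becomes True.
print(pd.Series([], dtype=bool).empty)  # True
```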
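In this case, however, it looks like you do not want boolean evaluation; you want element-wise logical AND. That is what the & binary operator performs: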

(a['x']==1) & (a['y']==10)

returns a boolean Series.
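By the way, as alexpmil notes, the parentheses are mandatory because & has a higher operator precedence than ==.
Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10, which in turn is equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series. Using and with two Series would again trigger the same ValueError as above. That's why the parentheses are mandatory.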

Answered 2022-10-30

Answer #2 (qnzebej0)

TLDR; The logical operators in Pandas are `&`, `|` and `~`, and parentheses `(...)` are important!

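Python's and, or and not logical operators are designed to work with scalars. So Pandas had to do one better and override the bitwise operators to achieve a *vectorized* (element-wise) version of this functionality.
So the following in Python (where exp1 and exp2 are expressions that evaluate to a boolean result)...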

exp1 and exp2              # Logical AND
exp1 or exp2               # Logical OR
not exp1                   # Logical NOT

...translates to...

exp1 & exp2                # Element-wise logical AND
exp1 | exp2                # Element-wise logical OR
~exp1                      # Element-wise logical NOT

for pandas.
If you get a ValueError while performing a logical operation, you need to use parentheses for grouping:

(exp1) op (exp2)

For example,

(df['col1'] == x) & (df['col2'] == y)

And so on.

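Boolean indexing: a common operation is to filter data with boolean masks computed from logical conditions. Pandas provides three operators: & for logical AND, | for logical OR, and ~ for logical NOT.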

Consider the following setup:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df

   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1

Logical AND

For the df above, say you'd like to return all rows where A < 5 and B > 5. This is done by computing a mask for each condition separately and ANDing them.

Overloaded Bitwise & Operator

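Before continuing, please take note of this particular excerpt from the docs, which states:
Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df.A > 2 & df.B < 3 as df.A > (2 & df.B) < 3, while the desired evaluation order is (df.A > 2) & (df.B < 3).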
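One way to check the grouping the docs describe, without any data, is to inspect the parse tree with the standard-library `ast` module. This is a sketch; `ast.unparse` needs Python 3.9 or later:

```python
import ast

# Parse (without evaluating) the unparenthesized and the parenthesized filter.
bad = ast.parse("df.A > 2 & df.B < 3", mode="eval").body
good = ast.parse("(df.A > 2) & (df.B < 3)", mode="eval").body

# Unparenthesized: one chained Compare whose middle operand is the BinOp `2 & df.B`.
print(type(bad).__name__)               # Compare
print(ast.unparse(bad.comparators[0]))  # 2 & df.B

# Parenthesized: a BinOp (&) whose operands are two separate comparisons.
print(type(good).__name__)              # BinOp
print(type(good.op).__name__)           # BitAnd
```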
So, with this in mind, element-wise logical AND can be implemented with the bitwise operator &:

df['A'] < 5

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'] > 5

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

(df['A'] < 5) & (df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

And the subsequent filtering step is simply,

df[(df['A'] < 5) & (df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

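The parentheses override the default precedence order: the bitwise operators bind more tightly than the comparison operators < and >. See the section on Operator Precedence in the Python docs.
If you do not use parentheses, the expression is evaluated incorrectly. For example, if you accidentally attempt something such as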

df['A'] < 5 & df['B'] > 5

It is parsed as

df['A'] < (5 & df['B']) > 5

Which becomes,

df['A'] < something_you_dont_want > 5

Which becomes (see the Python docs on chained comparisons),

(df['A'] < something_you_dont_want) and (something_you_dont_want > 5)

Which becomes,


# Both operands are Series...

something_else_you_dont_want1 and something_else_you_dont_want2

Which throws

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So, don't make that mistake!1

Avoiding Parentheses Grouping

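The fix is actually quite simple. Most operators have a corresponding bound method on Series and DataFrames. If the individual masks are built with these methods instead of comparison operators, you no longer need parentheses to specify the evaluation order: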

df['A'].lt(5)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'].gt(5)

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

df['A'].lt(5) & df['B'].gt(5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

See the section on Flexible Comparisons. To summarize, we have

╒════╤════════════╤════════════╕
│    │ Operator   │ Function   │
╞════╪════════════╪════════════╡
│  0 │ >          │ gt         │
├────┼────────────┼────────────┤
│  1 │ >=         │ ge         │
├────┼────────────┼────────────┤
│  2 │ <          │ lt         │
├────┼────────────┼────────────┤
│  3 │ <=         │ le         │
├────┼────────────┼────────────┤
│  4 │ ==         │ eq         │
├────┼────────────┼────────────┤
│  5 │ !=         │ ne         │
╘════╧════════════╧════════════╛
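As a quick sanity check of the table, the sketch below (rebuilding the same seeded `df`) compares each operator with its method counterpart; the threshold `4` is an arbitrary illustrative value:

```python
import operator

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))

# Each pair from the table: an operator-module function and the matching Series method name.
pairs = [
    (operator.gt, 'gt'),
    (operator.ge, 'ge'),
    (operator.lt, 'lt'),
    (operator.le, 'le'),
    (operator.eq, 'eq'),
    (operator.ne, 'ne'),
]

for op, name in pairs:
    via_operator = op(df['A'], 4)           # e.g. df['A'] > 4
    via_method = getattr(df['A'], name)(4)  # e.g. df['A'].gt(4)
    assert via_operator.equals(via_method), name

# The method forms combine with & without parentheses around each comparison.
mask = df['A'].lt(5) & df['B'].gt(5)
print(df[mask])
```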

Another option for avoiding parentheses is to use DataFrame.query (or eval):

df.query('A < 5 and B > 5')

   A  B  C
1  3  7  9
3  4  7  6

I have *extensively* documented query and eval in Dynamic Expression Evaluation in pandas using pd.eval().
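A small sketch of the `query` route, assuming the thresholds live in ordinary Python variables (`a_max` and `b_min` are illustrative names). Inside the query string, `and` is evaluated element-wise, and the `@` prefix refers to a variable in the surrounding scope:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))

a_max, b_min = 5, 5  # hypothetical thresholds held in local variables

# In query strings, and/or/not mean element-wise logic; @name reads a Python variable.
via_query = df.query('A < @a_max and B > @b_min')
via_mask = df[(df['A'] < a_max) & (df['B'] > b_min)]
print(via_query)
```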

operator.and_

This lets you perform the operation in a functional style. It internally calls Series.__and__, which corresponds to the bitwise operator.

import operator 

operator.and_(df['A'] < 5, df['B'] > 5)

# Same as,

# (df['A'] < 5).__and__(df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[operator.and_(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

You won't usually need this, but it is useful to know.
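One place the functional form is handy is folding a list of masks of arbitrary length. Here is a sketch using the standard-library `functools.reduce`, with a hypothetical third condition on column `C`:

```python
import functools
import operator

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))

# A hypothetical list of conditions, e.g. built elsewhere in a program.
masks = [df['A'] < 5, df['B'] > 5, df['C'] > 5]

# Fold the list with the functional form of &: ((m1 & m2) & m3).
combined = functools.reduce(operator.and_, masks)
print(df[combined])
```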

Generalizing: np.logical_and (and logical_and.reduce)

Another alternative is np.logical_and, which also does not need parentheses grouping:

np.logical_and(df['A'] < 5, df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
Name: A, dtype: bool

df[np.logical_and(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

np.logical_and is a ufunc (Universal Function), and most ufuncs have a reduce method. This makes logical_and easier to generalize when you have multiple masks to AND. For example, to AND masks m1, m2 and m3 with &, you would have to write

m1 & m2 & m3

However, an easier option is

np.logical_and.reduce([m1, m2, m3])

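This is powerful, because it lets you build more complex logic on top of it (for example, dynamically generating masks in a list comprehension and ANDing all of them):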

import numpy as np

cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]

m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m 

# array([False,  True, False,  True, False])

df[m]
   A  B  C
1  3  7  9
3  4  7  6
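A sketch comparing the two forms on the same `df`, with a hypothetical third mask on `C`. Both give the same truth values, but `logical_and.reduce` over a list returns a plain NumPy array, so the masks are combined by position rather than aligned on the index:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))

m1 = df['A'] < 5
m2 = df['B'] > 5
m3 = df['C'] > 5  # hypothetical third mask

chained = m1 & m2 & m3                         # pandas Series, aligned on the index
reduced = np.logical_and.reduce([m1, m2, m3])  # plain NumPy bool array, positional

# Same truth values; the reduce result simply carries no index.
assert (chained.to_numpy() == reduced).all()
print(type(chained).__name__, type(reduced).__name__)
```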

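1 - I know I am harping on this point, but please bear with me. This is a *very, very* common beginner mistake, and it needs to be explained very thoroughly.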

Logical OR

For the df above, say you'd like to return all rows where A == 3 or B == 7.

Overloaded Bitwise |

df['A'] == 3

0    False
1     True
2     True
3    False
4    False
Name: A, dtype: bool

df['B'] == 7

0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool

(df['A'] == 3) | (df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[(df['A'] == 3) | (df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

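If you haven't already, please also read the section on Logical AND above; all of its caveats apply here.
Alternatively, this operation can be written as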

df[df['A'].eq(3) | df['B'].eq(7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

operator.or_

Calls Series.__or__ under the hood.

operator.or_(df['A'] == 3, df['B'] == 7)

# Same as,

# (df['A'] == 3).__or__(df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[operator.or_(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

np.logical_or

For two conditions, use logical_or:

np.logical_or(df['A'] == 3, df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df[np.logical_or(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

For multiple masks, use logical_or.reduce:

np.logical_or.reduce([df['A'] == 3, df['B'] == 7])

# array([False,  True,  True,  True, False])

df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6


Logical NOT

Given a mask, such as

mask = pd.Series([True, True, False])

If you need to invert every boolean value (so that the end result is [False, False, True]), you can use any of the methods below.

Bitwise ~

~mask

0    False
1    False
2     True
dtype: bool

Again, expressions need to be parenthesized.

~(df['A'] == 3)

0     True
1    False
2    False
3     True
4     True
Name: A, dtype: bool

This internally calls

mask.__invert__()

0    False
1    False
2     True
dtype: bool

But don't call it directly.
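To see why the parentheses matter for `~`, here is a sketch on the same seeded `df`: unary `~` binds more tightly than `==`, so on an integer column it performs a bitwise NOT before the comparison happens:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))

# Intended: rows where A is NOT equal to 3.
right = ~(df['A'] == 3)

# Without parentheses, ~ applies to the integer column first (bitwise NOT: ~x == -x - 1),
# and only then is the result compared with 3.
wrong = ~df['A'] == 3

print(right.tolist())  # [True, False, False, True, True]
print(wrong.tolist())  # [False, False, False, False, False]
```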

operator.inv

Internally calls __invert__ on the Series.

operator.inv(mask)

0    False
1    False
2     True
dtype: bool

np.logical_not

This is the NumPy variant.

np.logical_not(mask)

0    False
1    False
2     True
dtype: bool

Note that, for boolean masks, np.logical_and can be swapped for np.bitwise_and, logical_or for bitwise_or, and logical_not for invert.
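A sketch tying the NOT variants together: on a boolean mask, `np.logical_not`, `np.invert` and `~` agree, and negating an OR (parentheses required) matches ANDing the negated conditions, as De Morgan's law predicts. The `df` and `mask` are the ones used above:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))

mask = pd.Series([True, True, False])

# On a boolean mask, the logical and bitwise NOT functions agree with ~.
assert np.logical_not(mask).equals(~mask)
assert np.invert(mask).equals(~mask)

# "Neither condition": negate the whole OR, or AND the negated conditions (De Morgan).
neither = ~((df['A'] == 3) | (df['B'] == 7))
neither_alt = (df['A'] != 3) & (df['B'] != 7)
assert neither.equals(neither_alt)
print(df[neither])
```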


Answer #3 (ioekq8ef)

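Logical operators for boolean indexing in Pandas
It is important to realize that you cannot use any of the Python *logical operators* (and, or or not) on pandas.Series or pandas.DataFrame objects (similarly, you cannot use them on a numpy.array with more than one element). The reason is that these operators implicitly call bool on their operands, and these data structures raise an exception because they consider the boolean value of an array ambiguous: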

>>> import numpy as np
>>> import pandas as pd
>>> arr = np.array([1,2,3])
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(s)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> bool(df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I covered this more extensively in my answer to the "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" Q+A.

NumPy's logical functions

However, NumPy provides element-wise equivalents of these operators as functions that can be used on numpy.array, pandas.Series, pandas.DataFrame, or any other (conforming) numpy.array subclass:

and has np.logical_and
or has np.logical_or
not has np.logical_not
numpy.logical_xor has no Python equivalent, but it is a logical "exclusive or" operation

So, essentially, one should use (assuming df1 and df2 are Pandas DataFrames):

np.logical_and(df1, df2)
np.logical_or(df1, df2)
np.logical_not(df1)
np.logical_xor(df1, df2)
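A sketch of those calls on two small, hypothetical boolean DataFrames (`df1` and `df2` with matching labels), also checking that each function agrees with the corresponding operator on boolean data:

```python
import numpy as np
import pandas as pd

# Two small hypothetical boolean DataFrames with identical index and columns.
df1 = pd.DataFrame({'p': [True, True, False, False], 'q': [True, False, True, False]})
df2 = pd.DataFrame({'p': [True, False, True, False], 'q': [True, True, False, False]})

both = np.logical_and(df1, df2)
either = np.logical_or(df1, df2)
exactly_one = np.logical_xor(df1, df2)
negated = np.logical_not(df1)

# The NumPy functions return DataFrames with the same labels, matching the
# element-wise operators &, |, ^ and ~ on boolean data.
assert both.equals(df1 & df2)
assert either.equals(df1 | df2)
assert exactly_one.equals(df1 ^ df2)
assert negated.equals(~df1)
print(exactly_one)
```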

Bitwise functions and bitwise operators for booleans

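However, if you have boolean NumPy arrays, Pandas Series, or Pandas DataFrames, you can also use the element-wise bitwise functions (for booleans they are, or at least should be, indistinguishable from the logical functions):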

bitwise and: np.bitwise_and or the & operator
bitwise or: np.bitwise_or or the | operator
bitwise not: np.invert (or its alias np.bitwise_not) or the ~ operator
bitwise xor: np.bitwise_xor or the ^ operator

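Typically the operators are used. However, when they are combined with comparison operators, you have to remember to wrap each comparison in parentheses, because the bitwise operators have a higher precedence than the comparison operators: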

(df1 < 10) | (df2 > 10)  # instead of the wrong df1 < 10 | df2 > 10

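This can be irritating, because the Python logical operators have a lower precedence than the comparison operators, so you normally write a < 10 and b > 10 (where a and b are, for example, simple integers) without parentheses.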

Differences between logical and bitwise operations (on non-booleans)

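It is really important to stress that bitwise and logical operations are only equivalent for boolean NumPy arrays (and boolean Series and DataFrames). If these don't contain booleans, the operations give different results. I'll include examples using NumPy arrays, but the results are similar for the pandas data structures: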

>>> import numpy as np
>>> a1 = np.array([0, 0, 1, 1])
>>> a2 = np.array([0, 1, 0, 1])

>>> np.logical_and(a1, a2)
array([False, False, False,  True])
>>> np.bitwise_and(a1, a2)
array([0, 0, 0, 1], dtype=int32)

And since NumPy (and similarly pandas) does different things for boolean ("mask") index arrays and integer index arrays, the results of indexing will also differ:

>>> a3 = np.array([1, 2, 3, 4])

>>> a3[np.logical_and(a1, a2)]
array([4])
>>> a3[np.bitwise_and(a1, a2)]
array([1, 1, 1, 2])
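The same distinction holds for pandas. In this sketch with two hypothetical integer Series, `2 & 1` is `0` bitwise even though both values are truthy, so the two approaches disagree at that position; casting to `bool` first restores the logical behavior:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([0, 1, 2, 3])  # hypothetical integer data
s2 = pd.Series([0, 1, 1, 2])

logical = np.logical_and(s1, s2)             # any non-zero value counts as True
bitwise = s1 & s2                            # works bit by bit: 2 & 1 == 0, 3 & 2 == 2
as_bool = s1.astype(bool) & s2.astype(bool)  # cast first, then & behaves logically

print(logical.tolist())  # [False, True, True, True]
print(bitwise.tolist())  # [0, 1, 0, 2]
print(as_bool.tolist())  # [False, True, True, True]
```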

Summary table

Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
-------------------------------------------------------------------------------------
       and       |  np.logical_and        | np.bitwise_and         |        &
-------------------------------------------------------------------------------------
       or        |  np.logical_or         | np.bitwise_or          |        |
-------------------------------------------------------------------------------------
                 |  np.logical_xor        | np.bitwise_xor         |        ^
-------------------------------------------------------------------------------------
       not       |  np.logical_not        | np.invert              |        ~

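Here, the logical operators do not work on NumPy arrays, Pandas Series, or Pandas DataFrames. The others work on these data structures (and on plain Python objects) and operate element-wise. However, be careful with the bitwise invert on plain Python bools, because in that context the bool is interpreted as an integer (for example, ~False returns -1 and ~True returns -2).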
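A short sketch of that caveat: for a plain Python bool, use `not`; NumPy booleans and boolean arrays, by contrast, invert logically with `~`. The sketch avoids applying `~` to a Python bool directly because Python 3.12 and later emit a DeprecationWarning for it; `~int(flag)` shows the integer interpretation instead:

```python
import numpy as np

flag = True

# For a plain Python bool, use `not`; ~ treats True as the integer 1 (~1 == -2).
print(not flag)    # False
print(~int(flag))  # -2

# NumPy booleans (and boolean arrays / Series) invert logically with ~.
print(~np.bool_(True))                     # False
print(np.invert(np.array([True, False])))  # [False  True]
```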


Answer #4 (3bygqnnd)

Note that you can also use * for AND:

In [12]: np.all([a > 20, a < 40], axis=0)
Out[12]:
array([[False,  True, False, False,  True],
       [False, False, False, False, False],
       [ True,  True, False, False, False],
       [False,  True, False, False, False],
       [False,  True, False, False, False]])

In [13]: (a > 20) * (a < 40)
Out[13]:
array([[False,  True, False, False,  True],
       [False, False, False, False, False],
       [ True,  True, False, False, False],
       [False,  True, False, False, False],
       [False,  True, False, False, False]])

I'm not claiming this is better than using np.all or &, but it works.
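The answer does not show how `a` was built, so this sketch uses a hypothetical 5x5 integer array. It checks that `*` on two NumPy boolean arrays stays boolean and matches both `np.all(..., axis=0)` and `&`:

```python
import numpy as np

# Hypothetical 5x5 integer array standing in for `a` in the answer above.
a = np.arange(0, 50, 2).reshape(5, 5)

via_all = np.all([a > 20, a < 40], axis=0)
via_mul = (a > 20) * (a < 40)  # bool * bool stays bool in NumPy and acts as AND
via_and = (a > 20) & (a < 40)

assert via_mul.dtype == bool
assert np.array_equal(via_all, via_mul)
assert np.array_equal(via_mul, via_and)
print(via_mul)
```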
