Pandas df with multiple operator OR statements

8ehkhllq posted on 2023-04-04 in Other

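I have a function that takes a Pandas df and returns a boolean 1 or 0 depending on whether certain conditions are met.
Can you use multiple OR statements with Python's built-in operator module? For example, I need to check whether any of 3 conditions is met in a dataframe row, but operator.or_ only accepts 2 arguments. I tested this function with Pytest and it doesn't work. Thanks for any tips or pseudocode.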

import operator
import pandas as pd

def fault_finder(df):
    df['flag'] = operator.or_( # <-- 1st operator statement
        (df['temp1'] >= df['temp2'])

        # verify operating state 2
        & (df['free_clg_signal'] > .2)
        & (df['mechanical_clg_signal'] < .1),  # OR

        operator.or_( # <-- 2nd operator statement
        # verify operating state 3
        (df['temp1'] >= df['temp2'])
        & (df['mechanical_clg_signal'] > .01)
        & (df['free_clg_signal'] == .2),  # OR

        # verify operating state 4
        (df['temp1'] >= df['temp2'])
        & (df['mechanical_clg_signal'] > .01)
        & (df['free_clg_signal'] > .9)
        )
    ).astype(int)

    return df

Answer #1 (sbtkgmzw)

You can use numpy.logical_or.reduce, which ORs together a whole list of boolean masks in one call:

import numpy as np
import pandas as pd

def fault_finder(df):
    # np.logical_or.reduce ORs every mask in the list, however many there are
    df['flag'] = np.logical_or.reduce([
        # verify operating state 2
        (df['temp1'] >= df['temp2'])
        & (df['free_clg_signal'] > .2)
        & (df['mechanical_clg_signal'] < .1),

        # verify operating state 3
        (df['temp1'] >= df['temp2'])
        & (df['mechanical_clg_signal'] > .01)
        & (df['free_clg_signal'] == .2),

        # verify operating state 4
        (df['temp1'] >= df['temp2'])
        & (df['mechanical_clg_signal'] > .01)
        & (df['free_clg_signal'] > .9),
    ]).astype(int)

    return df

Or combine the conditions with | and parentheses:

def fault_finder(df):
    df['flag'] = (
        (
            # verify operating state 2
            (df['temp1'] >= df['temp2'])
            & (df['free_clg_signal'] > .2)
            & (df['mechanical_clg_signal'] < .1)
        )
        |
        (
            # verify operating state 3
            (df['temp1'] >= df['temp2'])
            & (df['mechanical_clg_signal'] > .01)
            & (df['free_clg_signal'] == .2)
        )
        |
        (
            # verify operating state 4
            (df['temp1'] >= df['temp2'])
            & (df['mechanical_clg_signal'] > .01)
            & (df['free_clg_signal'] > .9)
        )
    ).astype(int)

    return df
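
All three operating states begin with the same temp1 >= temp2 test, so by the distributive law (A & B) | (A & C) | (A & D) can be rewritten as A & (B | C | D). The sketch below applies that refactor and should produce the same flags as fault_finder. It assumes the question's column names; the function name fault_finder_factored and the variable names warmer, state2, state3, state4 are hypothetical.

```python
import pandas as pd


def fault_finder_factored(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical variant of fault_finder with the shared condition factored out."""
    # Condition shared by operating states 2, 3 and 4
    warmer = df['temp1'] >= df['temp2']

    # verify operating state 2 (without the shared condition)
    state2 = (df['free_clg_signal'] > .2) & (df['mechanical_clg_signal'] < .1)
    # verify operating state 3 (without the shared condition)
    state3 = (df['mechanical_clg_signal'] > .01) & (df['free_clg_signal'] == .2)
    # verify operating state 4 (without the shared condition)
    state4 = (df['mechanical_clg_signal'] > .01) & (df['free_clg_signal'] > .9)

    # A & (B | C | D) is equivalent to (A & B) | (A & C) | (A & D)
    df['flag'] = (warmer & (state2 | state3 | state4)).astype(int)
    return df


example = pd.DataFrame({
    'temp1': [1, 2, 3],
    'temp2': [0, 4, 0],
    'free_clg_signal': [0.3, 0.2, 1.0],
    'mechanical_clg_signal': [0.05, 1.00, 1.00],
})
print(fault_finder_factored(example)['flag'].tolist())  # [1, 0, 1]
```

Naming each state separately also makes it easier to inspect which condition fired when a test fails.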

Example:

fault_finder(df)

   temp1  temp2  free_clg_signal  mechanical_clg_signal  flag
0      1      0              0.3                   0.05     1
1      2      4              0.2                   1.00     0
2      3      0              1.0                   1.00     1
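
The asker mentions testing with Pytest. Here is a minimal pytest-style sketch that checks the | version of fault_finder against the example table above. It assumes the input columns hold exactly the values shown in that table. The test function name test_fault_finder is hypothetical.

```python
import pandas as pd


def fault_finder(df):
    # The `|` version of fault_finder from this answer, condensed
    df['flag'] = (
        # verify operating state 2
        ((df['temp1'] >= df['temp2'])
         & (df['free_clg_signal'] > .2)
         & (df['mechanical_clg_signal'] < .1))
        # verify operating state 3
        | ((df['temp1'] >= df['temp2'])
           & (df['mechanical_clg_signal'] > .01)
           & (df['free_clg_signal'] == .2))
        # verify operating state 4
        | ((df['temp1'] >= df['temp2'])
           & (df['mechanical_clg_signal'] > .01)
           & (df['free_clg_signal'] > .9))
    ).astype(int)
    return df


def test_fault_finder():
    # Input values copied from the example table above
    df = pd.DataFrame({
        'temp1': [1, 2, 3],
        'temp2': [0, 4, 0],
        'free_clg_signal': [0.3, 0.2, 1.0],
        'mechanical_clg_signal': [0.05, 1.00, 1.00],
    })
    result = fault_finder(df)
    # Expected flags taken from the table's last column
    assert result['flag'].tolist() == [1, 0, 1]
```

If you save this in a file whose name starts with test_ (e.g. test_fault.py), running pytest should collect and run test_fault_finder.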

Answer #2 (ua4mk5z4)

Unfortunately, operator.or_ only takes two arguments, so if you have more than two boolean masks you have to nest calls to operator.or_. But why not just use |?

import operator
import pandas as pd

mask1 = pd.Series([True, False, True, True, False])
mask2 = pd.Series([False, False, False, True, True])
mask3 = pd.Series([True, False, False, True, True])

out = operator.or_(
    operator.or_(mask1, mask2),
    mask3
)

# Note that you can't do operator.or_(mask1, mask2, mask3)
# because operator.or_ only takes 2 arguments

This is equivalent to the less verbose:

out = mask1 | mask2 | mask3

You can also take a functional approach, for example:

import functools
functools.reduce(operator.or_, [mask1, mask2, mask3])
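
The following self-contained sketch reuses the three masks from this answer. It checks that the nested operator.or_ call, the chained |, and functools.reduce all give the same result, and that passing a third argument to operator.or_ fails:

```python
import functools
import operator

import pandas as pd

mask1 = pd.Series([True, False, True, True, False])
mask2 = pd.Series([False, False, False, True, True])
mask3 = pd.Series([True, False, False, True, True])

nested = operator.or_(operator.or_(mask1, mask2), mask3)
chained = mask1 | mask2 | mask3
# reduce folds operator.or_ over the list pairwise, so it scales to any number of masks
reduced = functools.reduce(operator.or_, [mask1, mask2, mask3])

# All three spellings build the same element-wise OR
print(nested.equals(chained) and chained.equals(reduced))  # True
print(reduced.tolist())  # [True, False, True, True, True]

# operator.or_ is strictly binary, so a third argument raises TypeError
try:
    operator.or_(mask1, mask2, mask3)
except TypeError as exc:
    print("TypeError:", exc)
```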

If the boolean Series are in a DataFrame, you can also use pd.DataFrame.any with an axis argument:

df = pd.concat([mask1, mask2, mask3], axis=1)
out = df.any(axis=1)
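
Applied to the question, the same idea means storing one boolean column per operating state and calling any(axis=1). The sketch below assumes the question's column names and the example values from the first answer. The names warmer and states, and the column labels state2 to state4, are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'temp1': [1, 2, 3],
    'temp2': [0, 4, 0],
    'free_clg_signal': [0.3, 0.2, 1.0],
    'mechanical_clg_signal': [0.05, 1.00, 1.00],
})

warmer = df['temp1'] >= df['temp2']
# One boolean column per operating state; the dict keys become column names
states = pd.DataFrame({
    'state2': warmer & (df['free_clg_signal'] > .2) & (df['mechanical_clg_signal'] < .1),
    'state3': warmer & (df['mechanical_clg_signal'] > .01) & (df['free_clg_signal'] == .2),
    'state4': warmer & (df['mechanical_clg_signal'] > .01) & (df['free_clg_signal'] > .9),
})

# any(axis=1) ORs across the columns, giving one value per row
df['flag'] = states.any(axis=1).astype(int)
print(df['flag'].tolist())  # [1, 0, 1]
```

Because the states stay as named columns, you can also check which state caused each flag.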

Or numpy.any:

import numpy as np
np.any([mask1, mask2, mask3], axis=0)

But the numpy approach returns an array, not a pd.Series.
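
To get a pd.Series back, you can wrap the array and reattach the original index. The sketch below uses a non-default index so that the difference is visible. The labels 'a' to 'e' are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical non-default index to show that it is lost and then restored
idx = ['a', 'b', 'c', 'd', 'e']
mask1 = pd.Series([True, False, True, True, False], index=idx)
mask2 = pd.Series([False, False, False, True, True], index=idx)
mask3 = pd.Series([True, False, False, True, True], index=idx)

arr = np.any([mask1, mask2, mask3], axis=0)
print(type(arr).__name__)  # ndarray

# Re-attach the original index to get a pd.Series back
out = pd.Series(arr, index=mask1.index)
print(out.tolist())  # [True, False, True, True, True]
```

You can also assign the bare array straight to a DataFrame column. In that case it is matched by position rather than by index label.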
