pandas: How to check for an increasing trend in each row, with a tolerance, while ignoring np.nan values in a pandas DataFrame

Posted by von4xj4u on 2023-02-11 in Other

This question is an extension of "How to check if a row is monotonically increasing by ignoring NaN values in a pandas DataFrame".
I have the following DataFrame:

import pandas as pd
import numpy as np
d = {'Cell':['A','B','C','D','E'],'D1':[5, 2, 2, 6,6], 'D2':[np.nan, 5, 6, np.nan,3], 'D3':[7,np.nan, 5, 5,np.nan], 'D6':[17, 3, np.nan,np.nan,2]}
df = pd.DataFrame(d)

  Cell  D1   D2   D3    D6
0    A   5  NaN  7.0  17.0
1    B   2  5.0  NaN   3.0
2    C   2  6.0  5.0   NaN
3    D   6  NaN  5.0   NaN
4    E   6  3.0  NaN   2.0

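I want to check whether D6 > D3 > D2 > D1 while ignoring np.nan values and allowing some tolerance (checking only consecutive pairs: (D1,D2), (D2,D3), (D3,D6)), and create two new columns, "is_increasing" (True or False) and "failing pairs".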

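**Example:** between 6 and 5 the values are, strictly speaking, not monotonically increasing. I have set the tolerance to 30%: (6 - 5)/6 = 16.7%, so 5 is 16.7% smaller than 6, which is less than 30%, and I want the result to be True.

Between 5 and 3: (5 - 3)/5 = 40%, which is greater than 30%, so the result is False.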

import pandas as pd
import numpy as np

# Expected result, written out by hand (the two new columns hold strings here)
d1 = {'Cell': ['A', 'B', 'C', 'D', 'E'],
      'D1': [5, 2, 2, 6, 6],
      'D2': [np.nan, 5, 6, np.nan, 3],
      'D3': [7, np.nan, 5, 5, np.nan],
      'D6': [17, 3, np.nan, np.nan, 2],
      'is_increasing?': ['True', 'False', 'True', 'True', 'False'],
      'failing pairs': ['-', '[D2,D6]', '-', '-', '[[D1,D2],[D2,D6]]']}

df1 = pd.DataFrame(d1)
print(df1)

  Cell  D1   D2   D3    D6  is_increasing?      failing pairs
0    A   5  NaN  7.0  17.0            True                  -
1    B   2  5.0  NaN   3.0           False            [D2,D6]
2    C   2  6.0  5.0   NaN            True                  -
3    D   6  NaN  5.0   NaN            True                  -
4    E   6  3.0  NaN   2.0           False  [[D1,D2],[D2,D6]]

The names of the columns I want to check are dynamic; they are available in the program as a list, column_names = [D1,D2,D3,D6].

**Explanation of the new columns:**
**first row** --> 5,7,17 --> it is increasing --> true
**second row** --> (2,5) --> increasing, (5,3) --> 1-(3/5)=40% greater than 30% tolerance set --> false
**third row** --> (2,6) --> increasing, (6,5) --> 1-(5/6)=16.7% less than 30% tolerance set --> true
**fourth row** --> (6,5) --> 1-(5/6)=16.7% less than 30% tolerance set --> true
**fifth row** --> (6,3) --> 1-(3/6)=50% greater than 30% tolerance set, (3,2) --> 1-(2/3)=33% greater than 30% --> false


   **More examples for the problem statement**

I want to check whether the trend is increasing, given the tolerance that has been set.

**Example:**
1,3,2,1,4,2,5 ---> I want to take all consecutive pairs and check whether the percentage drop is less than the tolerance set
(1,3) -->1-(3/1)=-200% < 30% --> passed 
(3,2)--> 1-(2/3)=33% > 30% --> failed 
(2,1) -->1-(1/2)=50% > 30% --> failed 
(1,4) --> 1-(4/1)=-400% < 30% --> passed 
(4,2) --> 1-(2/4) = 50% > 30% --> failed 
(2,5) --> 1-(5/2)=-150% < 30% --> passed
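
The pair-by-pair check above can be sketched in plain Python. This is only an illustration of the rule as stated in the question, not a solution for 500k rows. The function name `failing_pairs` and the parameter `tolerance` are made up for this sketch, and it assumes every value is positive (a zero or negative previous value would make the percentage meaningless).

```python
def failing_pairs(values, tolerance=0.30):
    """Return the consecutive pairs whose relative drop exceeds `tolerance`.

    Hypothetical helper: assumes all values are positive numbers.
    """
    failed = []
    for prev, curr in zip(values, values[1:]):
        drop = 1 - curr / prev  # negative when the value went up
        if drop > tolerance:
            failed.append((prev, curr))
    return failed


seq = [1, 3, 2, 1, 4, 2, 5]
failed = failing_pairs(seq)
print(failed)      # [(3, 2), (2, 1), (4, 2)]
print(not failed)  # False: at least one pair failed, so the trend is not increasing
```

With the 30% tolerance, `failing_pairs([6, 5])` returns an empty list (a 16.7% drop), while `failing_pairs([5, 3])` returns `[(5, 3)]` (a 40% drop).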

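If any pair fails, the increasing trend fails under the tolerance, and I need to mark the row as False together with the list of failing pairs.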

**Examples with NaN in between:**
2 5 NaN 3 --> (2,5), (5,3) are the pairs that I need to check
2 NaN NaN 3 --> (2,3) is the only pair to check
2 NaN NaN NaN --> no pairs to check
NaN NaN NaN NaN --> no need to check
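
A minimal sketch of how the pairs in these NaN examples could be formed: drop the NaN entries first, then pair each remaining value with its successor. The helper name `pairs_ignoring_nan` is hypothetical and only illustrates the pairing rule described above.

```python
import math


def pairs_ignoring_nan(values):
    """Pair each non-NaN value with the next non-NaN value (hypothetical helper)."""
    kept = [v for v in values if not math.isnan(v)]
    return list(zip(kept, kept[1:]))


nan = float("nan")
print(pairs_ignoring_nan([2, 5, nan, 3]))        # [(2, 5), (5, 3)]
print(pairs_ignoring_nan([2, nan, nan, 3]))      # [(2, 3)]
print(pairs_ignoring_nan([2, nan, nan, nan]))    # []
print(pairs_ignoring_nan([nan, nan, nan, nan]))  # []
```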

My DataFrame has 500k to 600k rows. I tried iterating over each row with a for loop, which takes a lot of time.
Please let me know of any efficient solution, as I am new to pandas.

9q78igpj (answer #1)

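This is not exactly the logic described, as I find the description ambiguous.
Here I treat a value as failing if it is lower than any previous value in the row: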

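# Running maximum of each row minus the current value:
# a positive result means the value is below an earlier one.
df2 = (df.filter(like='D')
         .cummax(axis=1)
         .sub(df)
       )

# Mask of the failing cells
m = df2.gt(0)

# True when at most 34% of the non-NaN values in the row are failing
df['is_increasing?'] = (
    m.sum(axis=1)
     .div(df2.notna().sum(axis=1))
     .le(0.34)
)

# Names of the failing columns for each row
df['falling'] = (
    df2.where(m).stack().reset_index()
       .groupby('level_0')['level_1'].agg(list)
)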

print(df)

Output:

  Cell  D1   D2   D3    D6  is_increasing?   falling
0    A   5  NaN  7.0  17.0            True       NaN
1    B   2  5.0  NaN   3.0            True      [D6]
2    C   2  6.0  5.0   NaN            True      [D3]
3    D   6  NaN  5.0   NaN           False      [D3]
4    E   6  3.0  NaN   2.0           False  [D2, D6]
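
The code above flags values that fall below the running maximum, which differs from the pairwise tolerance rule in the question. For comparison, here is a rough sketch that follows the question's rule (each non-NaN value is compared with the previous non-NaN value in the row, and the pair fails when the relative drop exceeds 30%). It is an illustration under assumptions rather than a tested solution for 500k rows: the names `tolerance`, `prev`, `drop`, `fail` and `prev_pos` are invented here, the values are assumed to be positive, and the output column names are taken from the question.

```python
import numpy as np
import pandas as pd

d = {'Cell': ['A', 'B', 'C', 'D', 'E'],
     'D1': [5, 2, 2, 6, 6],
     'D2': [np.nan, 5, 6, np.nan, 3],
     'D3': [7, np.nan, 5, 5, np.nan],
     'D6': [17, 3, np.nan, np.nan, 2]}
df = pd.DataFrame(d)

column_names = ['D1', 'D2', 'D3', 'D6']  # dynamic list, as in the question
tolerance = 0.30                         # assumed 30% tolerance

vals = df[column_names].astype(float)
valid = vals.notna()

# Previous non-NaN value strictly to the left of each cell
prev = vals.ffill(axis=1).shift(axis=1)
# Relative drop; NaN wherever there is no pair to check
drop = 1 - vals / prev
fail = drop.gt(tolerance)  # comparisons with NaN are False

df['is_increasing'] = ~fail.any(axis=1)

# Column position of the previous non-NaN value, to name the failing pair
positions = pd.DataFrame(
    np.tile(np.arange(len(column_names)), (len(vals), 1)),
    index=vals.index, columns=column_names)
prev_pos = positions.where(valid).ffill(axis=1).shift(axis=1)

# Python-level loop, but only over the failing cells
failing = {}
rows, cols = np.nonzero(fail.to_numpy())
for r, c in zip(rows, cols):
    pair = [column_names[int(prev_pos.iat[r, c])], column_names[c]]
    failing.setdefault(df.index[r], []).append(pair)

# Rows without a failing pair are left as NaN
df['failing pairs'] = pd.Series(failing, dtype=object)

print(df['is_increasing'].tolist())  # [True, False, True, True, False]
print(df.loc[4, 'failing pairs'])    # [['D1', 'D2'], ['D2', 'D6']]
```

The arithmetic is done column-wise on the whole frame, and the only explicit loop visits cells that actually failed, so the cost of building the pair lists depends on how many failures there are rather than on the total number of rows.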
