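An extension of this question: how to check whether a row is monotonically increasing while ignoring NaN values in a Pandas DataFrame
I have the following DataFrame: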
import pandas as pd
import numpy as np
d = {'Cell':['A','B','C','D','E'],'D1':[5, 2, 2, 6,6], 'D2':[np.nan, 5, 6, np.nan,3], 'D3':[7,np.nan, 5, 5,np.nan], 'D6':[17, 3, np.nan,np.nan,2]}
df = pd.DataFrame(d)
  Cell  D1   D2   D3    D6
0    A   5  NaN  7.0  17.0
1    B   2  5.0  NaN   3.0
2    C   2  6.0  5.0   NaN
3    D   6  NaN  5.0   NaN
4    E   6  3.0  NaN   2.0
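I want to check that D6 > D3 > D2 > D1, ignoring np.nan values and allowing some tolerance (checking only consecutive pairs: (D1,D2), (D2,D3), (D3,D6)), and create two new columns, "is_increasing" (True or False) and "failing pairs".
**Example:** between 6 and 5 --> strictly speaking this is not monotonically increasing --> I have set the tolerance to 30% --> (6 - 5)/6 = 16.7% --> 5 is 16.7% smaller than 6, which is less than 30% --> I want this to be True
Between 5 and 3 --> (5 - 3)/5 = 40%, which is greater than 30% --> this should be False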
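To make the tolerance rule concrete, the per-pair check could be sketched roughly as below. The function name `pair_passes` and the `tol` parameter are only illustrative, and the sketch assumes the earlier value is positive and that a drop of exactly 30% still passes (the description above does not say which way that boundary goes).

```python
def pair_passes(prev, curr, tol=0.30):
    """Return True if curr has not dropped by more than tol relative to prev."""
    # Relative drop from prev to curr; it is negative when the value increased.
    # Assumption: prev is positive, so the division is well defined.
    drop = (prev - curr) / prev
    return drop <= tol


print(pair_passes(6, 5))  # drop is about 16.7%, within the 30% tolerance
print(pair_passes(5, 3))  # drop is 40%, beyond the 30% tolerance
```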
import pandas as pd
import numpy as np
d1 = {'Cell':['A','B','C','D','E'],'D1':[5, 2, 2, 6,6], 'D2':[np.nan, 5, 6, np.nan,3], 'D3':[7,np.nan, 5, 5,np.nan], 'D6':[17, 3, np.nan,np.nan,2],'is_increasing?':['True','False','True','True','False'],'failing pairs':['-','[D2,D6]','-','-','[[D1,D2],[D2,D6]]']}
df1 = pd.DataFrame(d1)
print(df1)
  Cell  D1   D2   D3    D6 is_increasing?      failing pairs
0    A   5  NaN  7.0  17.0           True                  -
1    B   2  5.0  NaN   3.0          False            [D2,D6]
2    C   2  6.0  5.0   NaN           True                  -
3    D   6  NaN  5.0   NaN           True                  -
4    E   6  3.0  NaN   2.0          False  [[D1,D2],[D2,D6]]
The column names I want to check are dynamic; they are available in the program as a list, column_names = ['D1', 'D2', 'D3', 'D6']
**Explanation of the new columns:**
**first row** --> 5, 7, 17 --> it is increasing --> True
**second row** --> (2,5) --> increasing, (5,3) --> 1-(3/5) = 40%, greater than the 30% tolerance set --> False
**third row** --> (2,6) --> increasing, (6,5) --> 1-(5/6) = 16.7%, less than the 30% tolerance set --> True
**fourth row** --> (6,5) --> 1-(5/6) = 16.7%, less than the 30% tolerance set --> True
**fifth row** --> (6,3) --> 1-(3/6) = 50%, greater than the 30% tolerance set, (3,2) --> 1-(2/3) = 33%, greater than 30% --> False
**More examples for the problem statement**
I want to check whether the trend is increasing, given the tolerance setting.
**Example**
1,3,2,1,4,2,5 ---> I want to take all consecutive pairs and check whether the percentage difference is less than the tolerance set
(1,3) -->1-(3/1)=-200% < 30% --> passed
(3,2)--> 1-(2/3)=33% > 30% --> failed
(2,1) -->1-(1/2)=50% > 30% --> failed
(1,4) --> 1-(4/1)=-400% < 30% --> passed
(4,2) --> 1-(2/4) = 50% > 30% --> failed
(2,5) --> 1-(5/2)=-150% < 30% --> passed
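If any pair fails --> the increasing trend fails the tolerance check, and I need to mark it False together with the list of failing pairs
**Examples with NaN in between**
2 5 NaN 3 --> (2,5), (5,3) are the pairs that I need to check
2 NaN NaN 3 --> (2,3) is the only pair to check
2 NaN NaN NaN --> no pairs to check
NaN NaN NaN NaN --> no need to check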
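The pair-by-pair walk-through above can be written as a small plain-Python sketch. The helper name `failing_pairs` is hypothetical, and the code assumes positive values with no NaN in the sequence.

```python
def failing_pairs(values, tol=0.30):
    """Return the consecutive (prev, curr) pairs whose relative drop exceeds tol."""
    failed = []
    # zip(values, values[1:]) yields each consecutive pair exactly once.
    for prev, curr in zip(values, values[1:]):
        if (prev - curr) / prev > tol:
            failed.append((prev, curr))
    return failed


seq = [1, 3, 2, 1, 4, 2, 5]
bad = failing_pairs(seq)
print(bad)            # [(3, 2), (2, 1), (4, 2)]
print(len(bad) == 0)  # False: the trend is not increasing within the tolerance
```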
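The NaN handling in these examples amounts to dropping the NaN entries first and only then pairing up neighbours. A minimal sketch of that step follows; `pairs_ignoring_nan` is an illustrative name, not something from my existing code.

```python
import math


def pairs_ignoring_nan(values):
    """Drop NaN entries, then return the consecutive pairs of what is left."""
    kept = [v for v in values if not math.isnan(v)]
    return list(zip(kept, kept[1:]))


nan = float("nan")
print(pairs_ignoring_nan([2, 5, nan, 3]))        # [(2, 5), (5, 3)]
print(pairs_ignoring_nan([2, nan, nan, 3]))      # [(2, 3)]
print(pairs_ignoring_nan([2, nan, nan, nan]))    # []
print(pairs_ignoring_nan([nan, nan, nan, nan]))  # []
```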
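My DataFrame has 500k to 600k rows. I tried iterating over each row with a for loop, and that takes a lot of time.
Please let me know of any efficient solution, as I am new to Pandas.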
1 Answer
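This is not exactly the logic described, because I find the description ambiguous.
Here I treat as a failure any value that is lower than any previous value:
Output: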
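Separately from the answer's logic, the consecutive-pair rule stated in the question can be sketched in a mostly vectorized way with NumPy. This is not the answerer's code; it is a sketch under stated assumptions: values are positive, a drop of exactly 30% passes, and the new column holds Python lists (empty when nothing fails) rather than the `-` placeholder shown in the question. The names `tol`, `prev_idx`, and `has_pair` are illustrative.

```python
import numpy as np
import pandas as pd

d = {'Cell': ['A', 'B', 'C', 'D', 'E'],
     'D1': [5, 2, 2, 6, 6],
     'D2': [np.nan, 5, 6, np.nan, 3],
     'D3': [7, np.nan, 5, 5, np.nan],
     'D6': [17, 3, np.nan, np.nan, 2]}
df = pd.DataFrame(d)
column_names = ['D1', 'D2', 'D3', 'D6']
tol = 0.30  # assumed tolerance: a relative drop of up to 30% still passes

arr = df[column_names].to_numpy(dtype=float)
n_rows, n_cols = arr.shape
valid = ~np.isnan(arr)

# For each cell, the column index of the latest non-NaN value at or before it (-1 if none).
pos = np.where(valid, np.arange(n_cols), -1)
last = np.maximum.accumulate(pos, axis=1)

# Shift right by one so each cell sees the latest non-NaN column strictly before it.
prev_idx = np.full_like(last, -1)
prev_idx[:, 1:] = last[:, :-1]

# A pair exists where the current cell is non-NaN and an earlier non-NaN cell exists.
has_pair = valid & (prev_idx >= 0)
prev_val = np.take_along_axis(arr, np.clip(prev_idx, 0, None), axis=1)

with np.errstate(divide='ignore', invalid='ignore'):
    drop = (prev_val - arr) / prev_val  # relative drop; NaN where the current cell is NaN
    failed = has_pair & (drop > tol)    # comparisons against NaN evaluate to False

df['is_increasing'] = ~failed.any(axis=1)

# Only the failing cells are visited in Python, which is normally a small subset.
pairs = [[] for _ in range(n_rows)]
for r, c in zip(*np.nonzero(failed)):
    pairs[r].append([column_names[prev_idx[r, c]], column_names[c]])
df['failing pairs'] = pd.Series(pairs, index=df.index)

print(df)
```

The array operations run over the whole frame at once, so the per-row Python loop the question describes is avoided. Only the failing cells are iterated to build the pair labels.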