Consider the pandas DataFrame `df1`:
```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Kevin", "Peter", "James", "Jose", "Matthew", "Pattrick", "Alexander"],
    "Number": [1, 2, 3, 4, 5, 6, 7],
    "Total": [495.2, 432.5, '-', 395.5, 485.8, 415, 418.7],
    "Average_old": [86.57, 83.97, '-', 96.59, 84.67, 83.10, 83.84],
    "Grade_old": ['A', 'A', 'A', 'A+', 'A', 'A', 'A'],
    "Total_old": [432.8, 419.8, '-', 482.9, 423.3, 415, 418.7],
})
```
I calculated `Average` and `Grade` with the following formulas:
```python
df1["Average"] = df1["Total"].apply(lambda x: x / 5 + 0.1 if x != "-" else "-")
df1["Grade"] = df1["Average"].apply(lambda x: 'A+' if x != '-' and x > 90 else 'A')
```
So `df1` becomes:
```
        Name  Number  Total  Average_old  Grade_old  Total_old  Average  Grade
0      Kevin       1  495.2        86.57          A      432.8    99.14     A+
1      Peter       2  432.5        83.97          A      419.8    86.60      A
2      James       3      -            -          A          -        -      A
3       Jose       4  395.5        96.59         A+      482.9    79.20      A
4    Matthew       5  485.8        84.67          A      423.3    97.26     A+
5   Pattrick       6  415.0        83.10          A      415.0    83.10      A
6  Alexander       7  418.7        83.84          A      418.7    83.84      A
```
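`df1` has the columns `Total`, `Total_old`, `Grade`, `Grade_old`, `Average`, and `Average_old`. I am trying to check whether any value of `Total` was modified relative to `Total_old`, any value of `Grade` relative to `Grade_old`, or any value of `Average` relative to `Average_old`. I tried to build a new DataFrame, `dfmod`, containing all the modified values of `df1`, with the following code: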
```python
dfmod = pd.DataFrame()
columns = ["Total", "Average", "Grade"]
for col in columns:
    dfmod = pd.concat([dfmod, df1[["Name", "Number", col + '_old']][df1[col].ne(df1[col + '_old'])].dropna()], sort=False)
    dfmod.rename(columns={col + '_old': col}, inplace=True)
dfmod = dfmod.groupby('Name', as_index=False, sort=False).first()
```
The resulting `dfmod` is:
```
        Name  Number  Total  Average  Grade
0      Kevin       1  432.8    86.57      A
1      Peter       2  419.8    83.97   None
2       Jose       4  482.9    96.59     A+
3    Matthew       5  423.3    84.67      A
4  Alexander       7    NaN    83.84   None
```
Here, none of Pattrick's values changed when comparing `Total` with `Total_old`, `Average` with `Average_old`, and `Grade` with `Grade_old`, so Pattrick's entry was correctly dropped.
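But look at Alexander: even though his `Total`, `Average`, and `Grade` are the same as his `Total_old`, `Average_old`, and `Grade_old` respectively, `dfmod` wrongly includes his `Average` as a modified value. This happens because floating-point arithmetic is not exact decimal arithmetic: decimal fractions such as 0.1 have no exact binary representation, so the computed `418.7/5 + 0.1` need not be bit-for-bit equal to the stored `83.84`, even though both display as 83.84. The following link explains this: https://www.geeksforgeeks.org/floating-point-error-in-python/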
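The effect is easy to reproduce without pandas. This sketch uses the textbook `0.1 + 0.2` example rather than the exact values from `df1`, and shows that exact `==` fails while a tolerance-based check (`math.isclose` / `np.isclose`) or rounding to the data's precision treats the values as equal:

```python
import math

import numpy as np

# 0.1 and 0.2 have no exact binary representation, so their sum
# is not bit-for-bit equal to the float written as 0.3.
total = 0.1 + 0.2
print(repr(total))   # 0.30000000000000004
print(total == 0.3)  # False

# Tolerance-based comparisons treat the two values as equal.
print(math.isclose(total, 0.3))  # True
print(np.isclose(total, 0.3))    # True

# Rounding to the precision the stored data uses also removes the noise.
print(round(total, 2) == 0.3)    # True
```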
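So I tried using the `np.isclose` function:

```python
import numpy as np

for col in columns:
    if col == 'Grade':  # compare strings with ==, not `is`
        # Grade holds strings, so it keeps the exact comparison
        dfmod = pd.concat([dfmod, df1[["Name", "Number", col + '_old']][df1[col].ne(df1[col + '_old'])].dropna()], sort=False)
        continue
    dfmod = pd.concat([dfmod, df1[["Name", "Number", col + '_old']][~np.isclose(df1[col], df1[col + '_old'])].dropna()], sort=False)
```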
But it raises this error:
`Exception has occurred: TypeError ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''`
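The error seems to be caused by the `"-"` strings in the data, which make these columns `object` dtype. How can I fix this? I have been stuck on this for a while and have tried every resource I could find.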
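A minimal reproduction, assuming only the `Total`/`Total_old` values from `df1`, shows why the `"-"` strings matter: they make the columns `object` dtype, which `np.isclose` cannot handle (the exact `TypeError` message differs between NumPy versions). Coercing `"-"` to `NaN` with `pd.to_numeric(..., errors="coerce")` yields `float64` columns that `np.isclose` accepts, and `equal_nan=True` treats the former `"-"` rows as unchanged. This is a sketch, not the only possible fix:

```python
import numpy as np
import pandas as pd

total = pd.Series([495.2, 432.5, "-", 395.5, 485.8, 415, 418.7])
total_old = pd.Series([432.8, 419.8, "-", 482.9, 423.3, 415, 418.7])

# Mixing floats with "-" strings forces object dtype.
print(total.dtype)  # object

try:
    np.isclose(total, total_old)
    raised = False
except TypeError as exc:  # wording differs across NumPy versions
    raised = True
    print("TypeError:", exc)

# Coerce "-" to NaN so both columns become float64.
total_num = pd.to_numeric(total, errors="coerce")
total_old_num = pd.to_numeric(total_old, errors="coerce")
print(total_num.dtype)  # float64

# equal_nan=True makes NaN vs NaN (the former "-" rows) count as "close".
changed = ~np.isclose(total_num, total_old_num, equal_nan=True)
print(changed)  # [ True  True False  True  True False False]
```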
Expected output:
```
        Name  Number  Total  Average  Grade
0      Kevin       1  432.8    86.57      A
1      Peter       2  419.8    83.97      A
3       Jose       4  482.9    96.59     A+
4    Matthew       5  423.3    84.67      A
```
It should ignore James, Pattrick, and Alexander, because none of their values changed in `Total` vs `Total_old`, `Average` vs `Average_old`, or `Grade` vs `Grade_old`.
2 answers

Answer 1:
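As far as I can tell, the `"-"` characters are unnecessary. You can replace them with `None` in the columns where they appear and then convert those columns to numeric. This makes your preprocessing cleaner and avoids the conditionals that check whether a value is `"-"`.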
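A sketch of this preprocessing step. It swaps in `pd.to_numeric(..., errors="coerce")`, which turns `"-"` into `NaN` and converts the column to `float64` in one call, instead of a separate replace followed by a numeric conversion (on some older pandas versions, `replace` with `value=None` forward-filled instead of inserting `None`). The list name `numeric_cols` is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Kevin", "Peter", "James", "Jose", "Matthew", "Pattrick", "Alexander"],
    "Number": [1, 2, 3, 4, 5, 6, 7],
    "Total": [495.2, 432.5, "-", 395.5, 485.8, 415, 418.7],
    "Average_old": [86.57, 83.97, "-", 96.59, 84.67, 83.10, 83.84],
    "Grade_old": ["A", "A", "A", "A+", "A", "A", "A"],
    "Total_old": [432.8, 419.8, "-", 482.9, 423.3, 415, 418.7],
})

# Columns that use "-" as a placeholder for a missing number.
numeric_cols = ["Total", "Average_old", "Total_old"]

# errors="coerce" turns every non-numeric entry ("-") into NaN,
# and assigning the result replaces the object column with a float64 one.
for col in numeric_cols:
    df1[col] = pd.to_numeric(df1[col], errors="coerce")

print(df1[numeric_cols].dtypes)
```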
That way, when you create the new `Average` and `Grade` columns, you don't need `.apply` or checks for whether an element is `"-"`.
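Once the columns are numeric, both new columns can be computed with plain vectorized operations. A minimal sketch, assuming the same preprocessing as above; `np.where` is one choice for the grade rule, not necessarily what the answer used:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Kevin", "Peter", "James", "Jose", "Matthew", "Pattrick", "Alexander"],
    "Number": [1, 2, 3, 4, 5, 6, 7],
    "Total": [495.2, 432.5, "-", 395.5, 485.8, 415, 418.7],
    "Average_old": [86.57, 83.97, "-", 96.59, 84.67, 83.10, 83.84],
    "Grade_old": ["A", "A", "A", "A+", "A", "A", "A"],
    "Total_old": [432.8, 419.8, "-", 482.9, 423.3, 415, 418.7],
})
for col in ["Total", "Average_old", "Total_old"]:
    df1[col] = pd.to_numeric(df1[col], errors="coerce")

# Ordinary column arithmetic; NaN (the former "-") just propagates.
df1["Average"] = df1["Total"] / 5 + 0.1

# NaN > 90 evaluates to False, so James still gets "A",
# the same result the original lambda produced for "-".
df1["Grade"] = np.where(df1["Average"] > 90, "A+", "A")

print(df1[["Name", "Average", "Grade"]])
```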
Then you can use `np.isclose` in the condition, drop any rows that contain nulls, select the columns containing `"_old"`, and rename those columns.
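Putting the described steps together gives something like the following sketch. It is a reconstruction from the description above, not the answerer's original code: rows are kept if any float pair is not close or the grades differ, James (NaN vs NaN compares as "not close") is removed by `dropna()`, and the `_old` columns are renamed:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Kevin", "Peter", "James", "Jose", "Matthew", "Pattrick", "Alexander"],
    "Number": [1, 2, 3, 4, 5, 6, 7],
    "Total": [495.2, 432.5, "-", 395.5, 485.8, 415, 418.7],
    "Average_old": [86.57, 83.97, "-", 96.59, 84.67, 83.10, 83.84],
    "Grade_old": ["A", "A", "A", "A+", "A", "A", "A"],
    "Total_old": [432.8, 419.8, "-", 482.9, 423.3, 415, 418.7],
})

# Step 1: "-" -> NaN, columns -> float64.
for col in ["Total", "Average_old", "Total_old"]:
    df1[col] = pd.to_numeric(df1[col], errors="coerce")

# Step 2: vectorized Average and Grade.
df1["Average"] = df1["Total"] / 5 + 0.1
df1["Grade"] = np.where(df1["Average"] > 90, "A+", "A")

# Step 3: a row is "modified" if any pair differs; floats use a tolerance.
changed = (
    ~np.isclose(df1["Total"], df1["Total_old"])
    | ~np.isclose(df1["Average"], df1["Average_old"])
    | df1["Grade"].ne(df1["Grade_old"]).to_numpy()
)

# Step 4: keep the old values, drop rows with nulls, strip the "_old" suffix.
old_cols = ["Total_old", "Average_old", "Grade_old"]
dfmod = (
    df1.loc[changed, ["Name", "Number"] + old_cols]
    .dropna()
    .rename(columns=lambda c: c.replace("_old", ""))
)
print(dfmod)
```

Passing `equal_nan=True` to both `np.isclose` calls would exclude James directly, making the `dropna()` step unnecessary for this data.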
Answer 2:
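See if this is what you're looking for.
The key here is rounding the computed average to 2 decimal places.
In addition, when calculating `Grade`, return `Grade_old` if a non-numeric value (such as `"-"`) is encountered.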
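A sketch of this rounding approach, keeping the `"-"` strings as in the question. The helper name `grade` and the row-level selection at the end are illustrative choices, not taken from the answer. Because `round(x, 2)` returns the float nearest to the 2-decimal value, Alexander's and Pattrick's averages compare equal to the stored `Average_old` with plain `!=`:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Kevin", "Peter", "James", "Jose", "Matthew", "Pattrick", "Alexander"],
    "Number": [1, 2, 3, 4, 5, 6, 7],
    "Total": [495.2, 432.5, "-", 395.5, 485.8, 415, 418.7],
    "Average_old": [86.57, 83.97, "-", 96.59, 84.67, 83.10, 83.84],
    "Grade_old": ["A", "A", "A", "A+", "A", "A", "A"],
    "Total_old": [432.8, 419.8, "-", 482.9, 423.3, 415, 418.7],
})

# Round to 2 decimals, the precision Average_old is stored with,
# so tiny floating-point differences vanish before the exact comparison.
df1["Average"] = df1["Total"].apply(lambda x: round(x / 5 + 0.1, 2) if x != "-" else "-")


def grade(row):
    """Hypothetical helper: fall back to Grade_old when Average is not a number."""
    avg = row["Average"]
    if not isinstance(avg, (int, float)):
        return row["Grade_old"]
    return "A+" if avg > 90 else "A"


df1["Grade"] = df1.apply(grade, axis=1)

# With rounded averages, exact != is enough: a row is modified if any pair differs.
pairs = [("Total", "Total_old"), ("Average", "Average_old"), ("Grade", "Grade_old")]
changed = pd.concat([df1[new].ne(df1[old]) for new, old in pairs], axis=1).any(axis=1)

dfmod = df1.loc[changed, ["Name", "Number", "Total_old", "Average_old", "Grade_old"]]
dfmod = dfmod.rename(columns=lambda c: c.replace("_old", ""))
print(dfmod)
```

Unlike the question's per-column loop, this selects whole rows, so `Grade` is filled from `Grade_old` for every modified row (Peter included), as in the expected output.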