scipy - How to write a while-loop function in Python for winsorizing

Posted by vmdwslir on 2024-01-09 in Python

I have the following function:

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# winsorize function
def winsor_try1(var, lower, upper):
    '''
    Outliers calculation using IQR
    '''
    var = winsorize(var, limits=[lower, upper])
    q1, q3 = np.percentile(var, [25, 75])  # q1, q3 calc
    iqr = q3 - q1  # iqr calc
    lower_bound = round(q1 - (1.5 * iqr), 3)  # lower bound
    upper_bound = round(q3 + (1.5 * iqr), 3)  # upper bound
    outliers = [x for x in var if x < lower_bound or x > upper_bound]
    print('These would be the outliers:', set(outliers), '\n',
          'Total:', len(outliers), '. Lower bound & Upper bound:', lower_bound, '&', upper_bound)

# the variable
df = pd.DataFrame({
    'age': [1,1,2,5,5,2,5,4,8,2,5,1,41,2,1,4,4,1,1,4,1,2,15,21,5,1,8,22,1,5,2,5,256,5,6,2,2,8,452]})
```

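I want to write a while-loop function that applies winsor_try1 to the variable df['age'], starting from lower = .01 and upper = .01, until len(outliers) = 0.
My reasoning: as long as len(outliers) > 0, I want to keep repeating the function until I find the limit at which the number of outliers in the age distribution drops to 0.
The desired output should look like this: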

```python
print('At limit =', i, 'there are no more outliers in the age variable.')
```


where i is the limit at which len(outliers) = 0.
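A literal while loop for the plan described above could look like the following minimal sketch. It assumes a fixed step of 0.01 (the question only fixes the starting value) and uses a hypothetical helper, count_iqr_outliers, which is winsor_try1 rewritten to return the outlier count instead of printing it. The winsorized result is converted with np.asarray so the arithmetic runs on a plain ndarray rather than a masked array.

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Same values as df['age'] in the question
age = np.array([1, 1, 2, 5, 5, 2, 5, 4, 8, 2, 5, 1, 41, 2, 1, 4, 4, 1, 1, 4,
                1, 2, 15, 21, 5, 1, 8, 22, 1, 5, 2, 5, 256, 5, 6, 2, 2, 8, 452])


def count_iqr_outliers(values, lower, upper):
    """Hypothetical helper: winsorize, then count points outside the 1.5*IQR fences."""
    w = np.asarray(winsorize(values, limits=[lower, upper]))
    q1, q3 = np.percentile(w, [25, 75])
    iqr = q3 - q1
    lower_bound = round(q1 - 1.5 * iqr, 3)
    upper_bound = round(q3 + 1.5 * iqr, 3)
    return int(np.sum((w < lower_bound) | (w > upper_bound)))


# Assumed step of 0.01; an integer counter avoids accumulating
# floating-point error from repeatedly adding 0.01.
k = 1
i = k / 100
# The i < 0.5 check is a safety net so the loop always terminates.
while count_iqr_outliers(age, i, i) > 0 and i < 0.5:
    k += 1
    i = k / 100

print('At limit =', i, 'there are no more outliers in the age variable.')
```

With a fixed step the loop can only report a multiple of 0.01; on this data it should stop at 0.16, the first step past the transition near 0.1538 that the root_scalar answer finds.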

Answer 1 (by brvekthn)

Instead of writing the while loop yourself, you can treat this as a scalar root-finding problem and use scipy.optimize.root_scalar.

```python
import numpy as np
from scipy.stats.mstats import winsorize
from scipy.optimize import root_scalar

# winsorize function
def winsor_try1(var, lower, upper):
    '''
    Compute the number of IQR outliers
    '''
    var = winsorize(var, limits=[lower, upper])
    q1, q3 = np.percentile(var, [25, 75])  # q1, q3 calc
    iqr = q3 - q1  # iqr calc
    lower_bound = round(q1 - (1.5 * iqr), 3)  # lower bound
    upper_bound = round(q3 + (1.5 * iqr), 3)  # upper bound
    outliers = [x for x in var if x < lower_bound or x > upper_bound]
    return len(outliers)

# the variable
var = np.asarray([1,1,2,5,5,2,5,4,8,2,5,1,41,2,1,4,4,1,1,4,1,2,15,21,5,1,8,22,1,5,2,5,256,5,6,2,2,8,452])

def fun(i):
    # try to find `i` at which there is half an outlier;
    # it doesn't exist, but this should get closer to the transition
    return winsor_try1(var, i, i) - 0.5

# root_scalar tries to find the argument `i` that makes `fun` return zero;
# the bracket needs a sign change: outliers remain at i = 0, none at i = 0.5
res = root_scalar(fun, bracket=(0, 0.5))
eps = 1e-6
print(winsor_try1(var, res.root + eps, res.root + eps))  # 0
print(winsor_try1(var, res.root - eps, res.root - eps))  # 6
print(res.root)  # 0.15384615384656308
```
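The reported root is very close to 6/39, and the data has n = 39 values. As far as I understand scipy's implementation, with the default inclusive=(True, True) winsorize replaces the int(limit * n) smallest and largest values, so the outlier count is a step function that can only change when the limit crosses a multiple of 1/n. Under that assumption, the transition can be located exactly by scanning the number of clipped values k instead of a continuous limit. This is a sketch; count_outliers is a hypothetical helper, not part of scipy.

```python
import numpy as np
from scipy.stats.mstats import winsorize

var = np.asarray([1, 1, 2, 5, 5, 2, 5, 4, 8, 2, 5, 1, 41, 2, 1, 4, 4, 1, 1, 4,
                  1, 2, 15, 21, 5, 1, 8, 22, 1, 5, 2, 5, 256, 5, 6, 2, 2, 8, 452])


def count_outliers(values, limit):
    """Winsorize both tails by `limit`, then count points outside the 1.5*IQR fences."""
    w = np.asarray(winsorize(values, limits=[limit, limit]))
    q1, q3 = np.percentile(w, [25, 75])
    iqr = q3 - q1
    lower_bound = round(q1 - 1.5 * iqr, 3)
    upper_bound = round(q3 + 1.5 * iqr, 3)
    return int(np.sum((w < lower_bound) | (w > upper_bound)))


n = len(var)
k = 0
# Assumption: winsorize clips int(limit * n) values per tail, so only k = 0..n//2 matter.
while k <= n // 2:
    # (k + 0.5) / n lies strictly between k/n and (k+1)/n, so int(limit * n) == k
    # without relying on floating-point rounding at the exact boundary.
    if count_outliers(var, (k + 0.5) / n) == 0:
        break
    k += 1

print(f'clipping {k} of {n} values per tail removes all outliers (limit ~ {k}/{n} = {k / n:.6f})')
```

Evaluating at the midpoint (k + 0.5) / n rather than at k / n is a deliberate choice: it keeps each probe safely inside one step of the count function.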

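There may be better ways to solve this problem, but I tried to answer it in a way that resembles writing a while loop. If you want to know how such a while loop works internally, there are plenty of references on the bisection method and other scalar root-finding algorithms.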
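The bisection method mentioned above boils down to a short while loop. Below is a minimal sketch, assuming (as the bracket passed to root_scalar does) that a limit of 0 leaves outliers and a limit of 0.5 leaves none; count_outliers and bisect_limit are hypothetical names. Because it only tests whether the count is zero, it needs no 0.5 offset.

```python
import numpy as np
from scipy.stats.mstats import winsorize

var = np.asarray([1, 1, 2, 5, 5, 2, 5, 4, 8, 2, 5, 1, 41, 2, 1, 4, 4, 1, 1, 4,
                  1, 2, 15, 21, 5, 1, 8, 22, 1, 5, 2, 5, 256, 5, 6, 2, 2, 8, 452])


def count_outliers(values, limit):
    """Winsorize both tails by `limit`, then count points outside the 1.5*IQR fences."""
    w = np.asarray(winsorize(values, limits=[limit, limit]))
    q1, q3 = np.percentile(w, [25, 75])
    iqr = q3 - q1
    lower_bound = round(q1 - 1.5 * iqr, 3)
    upper_bound = round(q3 + 1.5 * iqr, 3)
    return int(np.sum((w < lower_bound) | (w > upper_bound)))


def bisect_limit(values, lo=0.0, hi=0.5, tol=1e-6):
    """Bisection: keep count > 0 at `lo` and count == 0 at `hi`, halving the gap each pass."""
    if count_outliers(values, lo) == 0:
        return lo  # already outlier-free at the lower end of the bracket
    if count_outliers(values, hi) > 0:
        raise ValueError('no outlier-free limit in the bracket')
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if count_outliers(values, mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi


limit = bisect_limit(var)
print(limit, count_outliers(var, limit))
```

It returns the smallest probed limit with no outliers, within tol of the transition; for this data that should agree with the 0.1538 root reported above.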

