scipy: Problem with different column lengths in a Pandas DataFrame

Posted by 35g0bw71 on 2022-11-10 in Other


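I know the solution is probably obvious, but I have run out of ideas...
I import a .csv file into a DataFrame with Pandas. The data has 3 columns, each with a single header: column 1 has 45 rows, column 2 has 40 rows, and column 3 has 21 rows. The shape is (45, 3). The "missing" rows are filled with NaN, and this is where my problem starts.
I want to evaluate some statistics with various scipy functions, such as the Anderson-Darling test, like this: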

for i in columns:
    print([i])
    a = stats.anderson(df[i], dist='norm')
    print(a)
    if a[0] > a[1][2]:
        print('The null hypothesis can be rejected at', a[2][2], '% significance level')
    else:
        print('The null hypothesis cannot be rejected')

The first column is evaluated fine:

['Z79V0001']
AndersonResult(statistic=0.41768739435435975, critical_values=array([0.535, 0.609, 0.731, 0.853, 1.014]), significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))
The null hypothesis cannot be rejected

But for the other columns I get:

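['Z79V0003_1']
AndersonResult(statistic=nan, critical_values=array([0.535, 0.609, 0.731, 0.853, 1.014]), significance_level=array([15. , 10. ,  5. ,  2.5,  1. ]))
The null hypothesis cannot be rejected

Filling the NaN values with zeros does not help, because the statistics would then be computed incorrectly. I cannot work out how to handle the differing column lengths so that the function only operates on the rows that contain numbers and moves on to the next column once it reaches the NaN values. Any help is much appreciated.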

scipy

Source: https://stackoverflow.com/questions/73251478/problem-with-different-column-lengths-in-pandas-dataframe

1 Answer

ef1yzkbh1#

The simplest approach is to drop the NaN values from each column and pass the resulting NumPy array to the stats function.

for col in df.columns:
    a = stats.anderson(df[col].dropna().values, dist = 'norm')
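
As a fuller illustration of the same idea, here is a minimal sketch that builds a DataFrame with unequal column lengths and runs the test per column after dropping the NaN padding. The helper name anderson_by_column, the column names A, B, C, and the random data are assumptions made for the example; they are not from the original question. Index 2 is used for the 5% level, matching the a[1][2] comparison in the question.

```python
import numpy as np
import pandas as pd
from scipy import stats


def anderson_by_column(df, dist="norm", level_index=2):
    """Run the Anderson-Darling test on each column, ignoring NaN padding.

    Hypothetical helper: level_index=2 picks the 5% level for dist='norm'.
    """
    results = {}
    for col in df.columns:
        # Keep only the rows that actually hold numbers for this column.
        values = df[col].dropna().to_numpy()
        res = stats.anderson(values, dist=dist)
        critical = res.critical_values[level_index]
        results[col] = {
            "n": len(values),
            "statistic": res.statistic,
            "critical_value": critical,
            "significance_level": res.significance_level[level_index],
            "reject": bool(res.statistic > critical),
        }
    return results


# Made-up data with the column lengths from the question (45, 40, 21).
# Building the frame from Series of unequal length pads with NaN,
# similar to what read_csv produces for ragged columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": pd.Series(rng.normal(size=45)),
    "B": pd.Series(rng.normal(size=40)),
    "C": pd.Series(rng.normal(size=21)),
})

results = anderson_by_column(df)
for col, r in results.items():
    verdict = "can be rejected" if r["reject"] else "cannot be rejected"
    print(col, "n =", r["n"], "statistic =", r["statistic"])
    print("The null hypothesis", verdict, "at the",
          r["significance_level"], "% significance level")
```

Because dropna() is applied to each column separately, every column keeps all of its own valid rows; calling df.dropna() on the whole frame would instead truncate all columns to the 21 rows shared by the shortest one.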

Answered 2022-11-10
