pandas: for loop to drop columns and impute values above and below 20% [closed]

Posted by 8iwquhpp on 2022-12-21 in Other

Closed. This question needs details or clarity. It is not currently accepting answers.
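How can I write a for loop that drops the columns containing more than 20% null values and imputes the remaining columns with their mean in a pandas DataFrame? All features hold floats and NaN (true missing) values.
I have not been able to write a for loop that does all of this.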

mspsb9vt (answer #1)

You can use:

# identify the columns with > 20% NaN
m = df.isna().mean().gt(0.2)

# in those columns, replace the NaN by the column mean
df.loc[:, m] = df.loc[:, m].fillna(df.loc[:, m].mean())
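
To see why the mask works: `df.isna()` yields booleans, so its column-wise mean is the fraction of missing values in each column. Here is a minimal sketch on a made-up toy frame (the names `toy`, `a`, `b`, `c` are hypothetical and not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 5 rows, so each NaN accounts for 20% of a column
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],        # no NaN
    "b": [1.0, np.nan, 3.0, 4.0, 5.0],     # 1 of 5 missing
    "c": [np.nan, np.nan, 3.0, 4.0, 5.0],  # 2 of 5 missing
})

nan_fraction = toy.isna().mean()  # per-column share of missing values
mask = nan_fraction.gt(0.2)       # True only where the share is strictly above 20%

print(nan_fraction.to_dict())
print(mask.to_dict())
```

Because `gt` is a strict comparison, a column with exactly 20% NaN (like `b` here) is not selected.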

Example input:

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.choice([0, 1, 2, 3, np.nan], size=(10, 10)))

     0    1    2    3    4    5    6    7    8    9
0  NaN  0.0  3.0  3.0  3.0  1.0  3.0  2.0  NaN  0.0
1  0.0  NaN  2.0  1.0  0.0  1.0  1.0  0.0  1.0  NaN
2  3.0  0.0  3.0  0.0  2.0  3.0  0.0  1.0  3.0  3.0
3  3.0  0.0  1.0  1.0  1.0  0.0  2.0  NaN  3.0  3.0
4  2.0  NaN  2.0  0.0  0.0  NaN  0.0  NaN  1.0  NaN
5  1.0  2.0  2.0  0.0  1.0  1.0  1.0  1.0  3.0  3.0
6  2.0  3.0  0.0  3.0  NaN  1.0  2.0  NaN  3.0  NaN
7  NaN  NaN  3.0  NaN  NaN  NaN  0.0  NaN  3.0  2.0
8  0.0  1.0  1.0  3.0  0.0  0.0  1.0  2.0  NaN  2.0
9  0.0  3.0  2.0  2.0  0.0  1.0  0.0  2.0  2.0  3.0

Output:

     0         1    2    3    4    5    6         7    8         9
0  NaN  0.000000  3.0  3.0  3.0  1.0  3.0  2.000000  NaN  0.000000
1  0.0  1.285714  2.0  1.0  0.0  1.0  1.0  0.000000  1.0  2.285714
2  3.0  0.000000  3.0  0.0  2.0  3.0  0.0  1.000000  3.0  3.000000
3  3.0  0.000000  1.0  1.0  1.0  0.0  2.0  1.333333  3.0  3.000000
4  2.0  1.285714  2.0  0.0  0.0  NaN  0.0  1.333333  1.0  2.285714
5  1.0  2.000000  2.0  0.0  1.0  1.0  1.0  1.000000  3.0  3.000000
6  2.0  3.000000  0.0  3.0  NaN  1.0  2.0  1.333333  3.0  2.285714
7  NaN  1.285714  3.0  NaN  NaN  NaN  0.0  1.333333  3.0  2.000000
8  0.0  1.000000  1.0  3.0  0.0  0.0  1.0  2.000000  NaN  2.000000
9  0.0  3.000000  2.0  2.0  0.0  1.0  0.0  2.000000  2.0  3.000000
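
Note that the snippet above fills the columns that exceed the 20% threshold; it does not drop anything. If the intent is the literal reading of the question (drop the columns with more than 20% NaN, then mean-impute whatever is left), a minimal sketch could look like the following. It reuses the same example input; the names `keep`, `kept`, `result` and `looped` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same example input as in the answer
np.random.seed(0)
df = pd.DataFrame(np.random.choice([0, 1, 2, 3, np.nan], size=(10, 10)))

# Vectorized version: keep only the columns whose NaN share is at most 20%
keep = df.isna().mean().le(0.2)
kept = df.loc[:, keep]

# Fill the remaining NaN with each kept column's own mean
result = kept.fillna(kept.mean())

# Equivalent explicit for loop, as asked in the question
looped = df.copy()
for col in df.columns:
    if df[col].isna().mean() > 0.2:
        # too many missing values: drop the column
        looped = looped.drop(columns=col)
    else:
        # otherwise impute with the column mean
        looped[col] = looped[col].fillna(looped[col].mean())

print(result)
```

For the example input shown above, columns 1, 7 and 9 are the ones above the threshold, so they would be the ones removed. The vectorized form is usually preferable; the loop is included only because the question asked for one.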
