How to use Pandas to replace null values in a column with the mean of that column's non-null values

jfewjypa  posted on 2022-11-20 in Other

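The dataset I am using has the columns country, coal_prod_change_pct, gas_prod_change_pct and year. Both coal_prod_change_pct and gas_prod_change_pct contain null values, and I want to replace those nulls with the mean of the non-null values in the same column. The DataFrame looks like this: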

sample_df.loc[490:500, ['country', 'coal_prod_change_pct', 'year', 'gas_prod_change_pct']]

                  country  coal_prod_change_pct  year  gas_prod_change_pct
490               Ukraine              2.737000  2018             1.463000
491               Ukraine             -2.299000  2019            -0.481000
492               Ukraine             -4.111211  2020             1.197368
493  United Arab Emirates                   NaN  2001             2.553000
494  United Arab Emirates                   NaN  2002            10.239000
495  United Arab Emirates                   NaN  2003             3.227000
496  United Arab Emirates                   NaN  2004             3.349000
497  United Arab Emirates                   NaN  2005             3.240000
498  United Arab Emirates                   NaN  2006             2.092000
499  United Arab Emirates                   NaN  2007             3.074000
500  United Arab Emirates                   NaN  2008            -0.099000

country_grp = sample_df.groupby('country')

country_grp['coal_prod_change_pct'].fillna(country_grp['coal_prod_change_pct'].mean())

country_grp['coal_prod_change_pct'].apply(lambda x: x.fillna(x.mean()))

However, with the second approach, apply has no inplace=True option, so the result is not written back to the DataFrame.

ej83mcc0  1#

We usually use transform for this:

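# Each country's mean of its non-null values, aligned to sample_df's row index
filler = country_grp['coal_prod_change_pct'].transform('mean')
# Assign back: fillna(..., inplace=True) on a selected column may not update sample_df in recent pandas
sample_df['coal_prod_change_pct'] = sample_df['coal_prod_change_pct'].fillna(filler)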
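
The transform approach can be applied to both columns in one step. One caveat worth checking: every United Arab Emirates row shown in the question has NaN for coal_prod_change_pct, and a country with no non-null values has a group mean of NaN, so its rows stay unfilled. The sketch below uses a small made-up DataFrame in place of sample_df, and the fallback to the overall column mean is an assumption about what is wanted, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sample_df; the values are made up for illustration.
sample_df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "C", "C"],
    "coal_prod_change_pct": [1.0, np.nan, 3.0, np.nan, np.nan, 10.0, np.nan],
    "gas_prod_change_pct": [2.0, 4.0, np.nan, 1.0, np.nan, np.nan, np.nan],
    "year": [2018, 2019, 2020, 2018, 2019, 2018, 2019],
})

cols = ["coal_prod_change_pct", "gas_prod_change_pct"]

# Per-country mean of each column, broadcast back to the original row index.
group_means = sample_df.groupby("country")[cols].transform("mean")

# Fill from the country mean first. A country with no non-null values has
# NaN as its mean, so fall back (an assumption) to the overall column mean.
filled = sample_df[cols].fillna(group_means).fillna(sample_df[cols].mean())
sample_df[cols] = filled

print(sample_df)
```

If falling back to the overall mean is not appropriate for the data, dropping the second fillna leaves those rows as NaN so they can be handled separately.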

pzfprimi  2#

You can use the fillna() method to replace the null values in a column with the mean of that column's non-null values.
For example:

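import pandas as pd

# Column B contains one missing value so that the fill has a visible effect
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [1, 2, None, 4, 5]})
# Replace nulls in B with the mean of its non-null values (3.0 here)
df['B'] = df['B'].fillna(df['B'].mean())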
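
The same whole-column fill can cover several numeric columns at once by passing a Series of column means to DataFrame.fillna. This is a minimal sketch with made-up data. Note that it uses the overall column mean rather than a per-country mean, so it suits the question only if grouping by country is not required.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "label" is a non-numeric column that should be left alone.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 4.0, 8.0],
    "label": ["x", "y", "z"],
})

# mean() skips NaN by default; numeric_only=True ignores the text column.
col_means = df.mean(numeric_only=True)

# A Series passed to fillna is matched against column names, so each
# numeric column is filled with its own mean.
df = df.fillna(col_means)

print(df)
```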
