Pandas: backward fill by group

new9mtju posted on 2023-06-04 in Other

I have a problem with the DataFrame built by the code below:

import numpy as np
import pandas as pd
from numpy import nan
tostk = np.asarray([['A', nan, 6.0, nan, nan],
       ['A', 3.0, nan, nan, nan],
       ['A', nan, nan, 9.0, nan],
       ['A', nan, 5.0, nan, nan],
       ['A', nan, nan, nan, 7.0],
       ['B', nan, 8.0, nan, 7.0],
       ['B', nan, nan, 6.0, nan],
       ['B', 6.0, nan, nan, 8.0],
       ['B', 5.0, nan, nan, 6.0],
       ['B', nan, nan, 4.0, nan]])
pd.DataFrame(tostk)

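Within each category (A and B), I need to replace the NaN values with the next available value in that category. So I tried bfill, but the problem with bfill is that it fills NaN values in category A with values that belong to category B.
Expected result: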

res = np.asarray([['A', 3.0, 6.0, 9.0, 7.0],
           ['A', 3.0, 5.0, 9.0, 7.0],
           ['A', nan, 5.0, 9.0, 7.0],
           ['A', nan, 5.0, nan, 7.0],
           ['A', nan, nan, nan, 7.0],
           ['B', 6.0, 8.0, 6.0, 7.0],
           ['B', 6.0, nan, 6.0, 8.0],
           ['B', 6.0, nan, 4.0, 8.0],
           ['B', 5.0, nan, 4.0, 6.0],
           ['B', nan, nan, 4.0, nan]])
pd.DataFrame(res)

Any ideas are welcome.

lb3vh1jj

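I found that the linked duplicate works, with two wrinkles:
1. The last four columns are object dtype, and pandas does not seem to detect the missing values as NA in that state (np.asarray coerces the mixed rows to strings, so each nan is stored as the string 'nan'). I had to convert them to float.
2. The linked solution drops the group labels, which is clearly not what you want.
I used the same setup code as you: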

import numpy as np
import pandas as pd
from numpy import nan
tostk = np.asarray([['A', nan, 6.0, nan, nan],
       ['A', 3.0, nan, nan, nan],
       ['A', nan, nan, 9.0, nan],
       ['A', nan, 5.0, nan, nan],
       ['A', nan, nan, nan, 7.0],
       ['B', nan, 8.0, nan, 7.0],
       ['B', nan, nan, 6.0, nan],
       ['B', 6.0, nan, nan, 8.0],
       ['B', 5.0, nan, nan, 6.0],
       ['B', nan, nan, 4.0, nan]])
df = pd.DataFrame(tostk)

Then convert to float:

# Assign whole columns so their dtype actually becomes float64
df[[1, 2, 3, 4]] = df[[1, 2, 3, 4]].astype(float)
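To see why the conversion is needed, here is a small sketch on a two-row array. The names `arr`, `df_demo`, `before` and `after` are made up for illustration and are not part of the original code. It assumes only that NumPy turns a mixed str/float list into a string array, which is what happens with the setup above.

```python
import numpy as np
import pandas as pd

# Mixed str/float rows: NumPy coerces every element to a string.
arr = np.asarray([['A', np.nan, 6.0],
                  ['B', 5.0, np.nan]])
df_demo = pd.DataFrame(arr)

print(arr.dtype.kind)          # 'U', i.e. a unicode string array
print(df_demo[1].tolist())     # ['nan', '5.0'], plain strings

# The string 'nan' is not a missing value as far as pandas is concerned.
before = df_demo[1].isna().tolist()

# Whole-column assignment replaces the columns, so they become float64
# and 'nan' is parsed into a real NaN.
df_demo[[1, 2]] = df_demo[[1, 2]].astype(float)
after = df_demo[1].isna().tolist()

print(before, after)
```

Building the DataFrame straight from a list of lists, without going through np.asarray, would keep the floats as floats and make the conversion unnecessary.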

Then do the backfill:

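# group_keys=False keeps the original flat index; bfill() replaces the deprecated fillna(method='bfill')
print(df.groupby(0, group_keys=False).apply(lambda x: x.bfill()))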

Output:

   0    1    2    3    4
0  A  3.0  6.0  9.0  7.0
1  A  3.0  5.0  9.0  7.0
2  A  NaN  5.0  9.0  7.0
3  A  NaN  5.0  NaN  7.0
4  A  NaN  NaN  NaN  7.0
5  B  6.0  8.0  6.0  7.0
6  B  6.0  NaN  6.0  8.0
7  B  6.0  NaN  4.0  8.0
8  B  5.0  NaN  4.0  6.0
9  B  NaN  NaN  4.0  NaN
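As a possible alternative, here is a sketch that avoids apply altogether: select the value columns on the grouped object, call bfill() on them, and assign the result back, which leaves the group label column untouched. The names `rows`, `value_cols` and `filled` are made up for illustration. Building the frame from a plain list of lists is a change from the original setup, chosen so the value columns are float from the start.

```python
import numpy as np
import pandas as pd
from numpy import nan

rows = [['A', nan, 6.0, nan, nan],
        ['A', 3.0, nan, nan, nan],
        ['A', nan, nan, 9.0, nan],
        ['A', nan, 5.0, nan, nan],
        ['A', nan, nan, nan, 7.0],
        ['B', nan, 8.0, nan, 7.0],
        ['B', nan, nan, 6.0, nan],
        ['B', 6.0, nan, nan, 8.0],
        ['B', 5.0, nan, nan, 6.0],
        ['B', nan, nan, 4.0, nan]]

# A list of lists (no np.asarray) keeps columns 1-4 as float64.
df = pd.DataFrame(rows)

value_cols = [1, 2, 3, 4]
filled = df.copy()

# Backfill only inside each group of column 0; the result keeps the
# original row index, so it can be assigned straight back.
filled[value_cols] = df.groupby(0)[value_cols].bfill()

print(filled)
```

Because the backfill never crosses a group boundary, the trailing NaN values in group A stay NaN instead of picking up values from group B.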
