pandas: conditionally fill only unique values from another DataFrame

Posted by kx1ctssn on 2023-01-24 in Other

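How can I fill the 0 values in df1 with unique values from another DataFrame (df2)? The expected output must not contain duplicates in df1.
Any reference links are welcome. Thanks for your help.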

import pandas as pd

data1 = {'test': ['b', 0, 'e', 0, 0, 'f']}
df1 = pd.DataFrame(data=data1)

data2 = {'test': ['a', 'b', 'c', 'd', 'e', 'f']}
df2 = pd.DataFrame(data=data2)

DF1

test
0   b  
1   0
2   e  
3   0  
4   0  
5   f

DF2

test
0   a
1   b
2   c
3   d
4   e
5   f

Expected output:

test
0   b  -- df1
1   a  -- fill with a from df2
2   e  -- df1
3   c  -- fill with c from df2
4   d  -- fill with d from df2
5   f  -- df1
brgchamk  #1

Assuming df2 contains enough unique values to fill the 0s in df1, extract those values and assign them using boolean indexing:

# which rows are 0?
m = df1['test'].eq(0)

# extract values of df2 that are not in df1
vals = df2.loc[~df2['test'].isin(df1['test']), 'test'].tolist()
# ['a', 'c', 'd']

# assign the values in the limit of the needed number
df1.loc[m, 'test'] = vals[:m.sum()]

print(df1)

Output:

test
0    b
1    a
2    e
3    c
4    d
5    f
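
For reuse, the same idea can be wrapped in a small function that checks the "enough unique values" assumption before assigning. This is only a sketch: the name `fill_zeros_unique` and its `col` parameter are made up for illustration, and it adds a `drop_duplicates()` step (not in the snippet above) so that repeated values in df2 cannot produce duplicates in the result.

```python
import pandas as pd


def fill_zeros_unique(target, source, col='test'):
    """Return a copy of `target` with the 0s in `col` replaced by values
    from `source[col]` that do not already occur in `target[col]`.

    Hypothetical helper, not part of pandas.
    """
    out = target.copy()

    # which rows hold 0?
    zero_mask = out[col].eq(0)
    needed = int(zero_mask.sum())

    # distinct values of source that are not yet present in target
    candidates = (
        source.loc[~source[col].isin(out[col]), col]
        .drop_duplicates()
        .tolist()
    )

    # make the assumption explicit instead of failing inside .loc
    if len(candidates) < needed:
        raise ValueError(
            f"need {needed} replacement values, found only {len(candidates)}"
        )

    out.loc[zero_mask, col] = candidates[:needed]
    return out


df1 = pd.DataFrame({'test': ['b', 0, 'e', 0, 0, 'f']})
df2 = pd.DataFrame({'test': ['a', 'b', 'c', 'd', 'e', 'f']})

result = fill_zeros_unique(df1, df2)
print(result['test'].tolist())
# ['b', 'a', 'e', 'c', 'd', 'f']
```

Working on a copy leaves df1 untouched, which makes it easier to compare before and after.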

If df2 does not always contain enough values and you want to fill as many of the leading 0s as possible:

m = df1['test'].eq(0)

vals = df2.loc[~df2['test'].isin(df1['test']), 'test'].unique()
# ['a', 'c', 'd']

m2 = m.cumsum().le(len(vals))

df1.loc[m&m2, 'test'] = vals[:m.sum()]

print(df1)
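
To see what the second mask does, here is a small run with a df2 that is deliberately too short. The data is invented for the illustration: df2 contributes only two new values ('a' and 'c') for three 0s, so the last 0 should stay in place.

```python
import pandas as pd

df1 = pd.DataFrame({'test': ['b', 0, 'e', 0, 0, 'f']})
# assumed input: only 'a' and 'c' are not already in df1
df2 = pd.DataFrame({'test': ['a', 'b', 'c', 'e', 'f']})

# rows of df1 that hold 0
m = df1['test'].eq(0)

# distinct values of df2 that are not in df1
vals = df2.loc[~df2['test'].isin(df1['test']), 'test'].unique()

# m.cumsum() numbers the 0s in order (1, 2, 3, ...);
# keep only those whose number does not exceed the available values
m2 = m.cumsum().le(len(vals))

df1.loc[m & m2, 'test'] = vals[:m.sum()]

print(df1['test'].tolist())
# ['b', 'a', 'e', 'c', 0, 'f']
```

The slice `vals[:m.sum()]` covers the opposite case, where df2 offers more new values than there are 0s to fill.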
lf5gs5x2  #2

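This solution assumes:

  1. The number of 0s in df1 equals the number of unique values in df2 that are not already in df1.
  2. There is a single column, such as 'test', to operate on.

# get the unique values in df1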
uni = df1['test'].unique()

# get the unique values in df2 which are not in df1
unique_df2 = df2[~df2['test'].isin(uni)]

# get the index of all the '0' in df1 in a list
index_df1_0 = df1.index[df1['test'] == 0].tolist()

# replace each 0 in df1 with a unique value from df2 (relies on assumption 1)
# index_df1_0 holds index labels, so assign with .loc rather than .iloc
for val_ in range(len(index_df1_0)):
    df1.loc[index_df1_0[val_], 'test'] = unique_df2['test'].iloc[val_]

print(df1)
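
A variation on the loop, in case df1 does not have the default 0..n-1 index: pairing the index labels with the replacement values through `zip` and assigning by label with `.loc`. This is a sketch only. The letter index below is made up to show that positions and labels can differ, and `zip` simply stops at the shorter sequence, so a shortfall in df2 leaves the remaining 0s in place instead of raising an error.

```python
import pandas as pd

# assumed input: same values as in the question, but with a non-default index
df1 = pd.DataFrame({'test': ['b', 0, 'e', 0, 0, 'f']},
                   index=list('uvwxyz'))
df2 = pd.DataFrame({'test': ['a', 'b', 'c', 'd', 'e', 'f']})

# distinct values of df2 that do not already appear in df1
candidates = df2.loc[~df2['test'].isin(df1['test']), 'test'].unique()

# index labels (not positions) of the rows in df1 that hold 0
zero_labels = df1.index[df1['test'] == 0]

# pair each label with one replacement value and assign by label
for label, value in zip(zero_labels, candidates):
    df1.loc[label, 'test'] = value

print(df1['test'].tolist())
# ['b', 'a', 'e', 'c', 'd', 'f']
```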
