pandas: How do I randomly sample a value within a given segment?

iszxjhcz  posted on 2023-02-11 in Other

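I want to create a new column, "sample_group_B", that holds, for each group A row, a purchase price drawn at random from the group B rows in the same segment. How can I do this in pandas?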

segment | purchase price | group
High    | 100            | A
High    | 105            | A
High    | 103            | B
High    | 104            | B
Low     | 10             | A
Low     | 9              | B
Low     | 50             | B
Low     | 55             | B

The new column should contain a random sample of the group B purchase prices from the corresponding segment, for example:

segment | purchase price | group | sample_group_B
High    | 100            | A     | sample a value from (103 or 104)
High    | 105            | A     | sample a value from (103 or 104)
Low     | 10             | A     | sample a value from (9 or 50 or 55)

I tried np.random(), but it returned a bunch of NaNs.

bpsygsoo #1

Annotated code:
from random import choice

# Split the rows into groups A and B; copy A so that
# adding a column does not write to a view of df
A = df.query("group == 'A'").copy()
B = df.query("group == 'B'")

# Build a mapping from each segment to the list
# of all group B purchase prices in that segment
d = B.groupby('segment')['purchase price'].agg(list)

# For each row in A, pick one B price at random from the same segment
A['sample_B'] = A['segment'].map(lambda s: choice(d[s]))
Result (the sampled values vary from run to run):
  segment  purchase price group  sample_B
0    High             100     A       103
1    High             105     A       104
4     Low              10     A         9
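
The same mapping idea can be made repeatable and tolerant of segments that have no group B rows. The following is a minimal sketch, not part of the original answer: the seed value 42, the helper name `sample_b`, and the choice to return NaN for a segment without B rows are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "segment": ["High", "High", "High", "High", "Low", "Low", "Low", "Low"],
    "purchase price": [100, 105, 103, 104, 10, 9, 50, 55],
    "group": ["A", "A", "B", "B", "A", "B", "B", "B"],
})

# Seeded generator so repeated runs give the same draws (seed is arbitrary)
rng = np.random.default_rng(42)

A = df[df["group"] == "A"].copy()
B = df[df["group"] == "B"]

# Plain dict: segment -> list of group B purchase prices
b_prices = B.groupby("segment")["purchase price"].agg(list).to_dict()

def sample_b(segment):
    """Hypothetical helper: one random B price, or NaN if B lacks the segment."""
    prices = b_prices.get(segment)
    return rng.choice(prices) if prices else np.nan

A["sample_B"] = A["segment"].map(sample_b)
print(A)
```

Using `dict.get` avoids the `KeyError` that a direct lookup would raise when a segment appears in A but not in B.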
p3rjfoxz #2

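1. Split the data into two DataFrames (group A and group B)
2. Join A to B on segment
3. Sample one row within each group

Code: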

import pandas as pd

# Prepare the sample data
d = [
    ["High", 100, "A"],
    ["High", 105, "A"],
    ["High", 103, "B"],
    ["High", 104, "B"],
    ["Low", 10, "A"],
    ["Low", 9, "B"],
    ["Low", 50, "B"],
    ["Low", 55, "B"],
]
df = pd.DataFrame(d, columns=['segment', 'price', 'group'])

# Split into two parts
a = df.query("group == 'A'")
b = df.query("group == 'B'")

# Join a to b on segment: every A row is paired with each B row of its segment
ab = a.join(b.set_index('segment'), on='segment', lsuffix='_a', rsuffix='_b')

# Sample one pairing per (segment, price_a) group.
# Note: A rows that share both segment and price land in the same group.
ab.groupby(['segment', 'price_a']).sample(n=1)

Result (the sampled rows vary from run to run):

  segment  price_a group_a  price_b group_b
0    High      100       A      104       B
1    High      105       A      103       B
4     Low       10       A        9       B
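
If two group A rows can share the same segment and price, grouping by the original row label instead keeps one sampled row per A row. This is a hedged sketch of that variant, not the answer's own code: the modified sample data, the use of `merge` in place of `join`, and `random_state=0` are assumptions for illustration.

```python
import pandas as pd

# Variant of the sample data in which two A rows are identical (High, 100)
df = pd.DataFrame(
    [
        ["High", 100, "A"],
        ["High", 100, "A"],
        ["High", 103, "B"],
        ["High", 104, "B"],
        ["Low", 10, "A"],
        ["Low", 9, "B"],
    ],
    columns=["segment", "price", "group"],
)

a = df.query("group == 'A'")
b = df.query("group == 'B'")

# reset_index() stores A's original row labels in a column named "index"
pairs = a.reset_index().merge(b, on="segment", suffixes=("_a", "_b"))

# One random B pairing per original A row; random_state makes the draw repeatable
sampled = pairs.groupby("index").sample(n=1, random_state=0).set_index("index")
print(sampled)
```

Because `merge` defaults to an inner join, A rows whose segment has no B rows are dropped here; passing `how="left"` would keep them with missing B values.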
