Given a 2D NumPy array representing a 2D distribution, how to sample data from that distribution using NumPy or SciPy functions?

Posted by unftdfkk on 2022-11-10 in Other


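Given a 2D numpy array dist with shape (200, 200), where each entry dist[x1, x2] is the joint probability of the pair (x1, x2) for all x1, x2 ∈ {0, 1, ..., 199}, how can I sample bivariate data x = (x1, x2) from this probability distribution with the help of the NumPy or SciPy API?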

scipy

Source: https://stackoverflow.com/questions/56017163/given-a-2d-numpy-array-representing-a-2d-distribution-how-to-sample-data-from-t

4 Answers


sqxo8psd1#

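This solution works for probability distributions with any number of dimensions, assuming they are valid probability distributions (the entries must sum to 1, and so on). It flattens the distribution, samples an index from the flattened array, and then converts that flat index back into an index that matches the original array's shape.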


import numpy as np

# `array` is the N-D probability array (its entries must sum to 1)

# Create a flat copy of the array
flat = array.flatten()

# Then, sample an index from the 1D array with the
# probability distribution from the original array
sample_index = np.random.choice(a=flat.size, p=flat)

# Take this index and adjust it so it matches the original array
adjusted_index = np.unravel_index(sample_index, array.shape)
print(adjusted_index)

To get multiple samples, add a size keyword argument to the np.random.choice call and transform adjusted_index before printing it:

adjusted_index = np.array(zip(*adjusted_index))

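This is necessary because, with a size argument, np.random.choice returns an array of flat indices, and np.unravel_index then returns one array of indices per coordinate dimension; the zip turns these into a list of coordinate tuples. This is also *much* more efficient than simply repeating the first snippet once per sample.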
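The flatten, sample, unravel steps described above can be collected into one function. What follows is only a minimal sketch under a few assumptions that are not part of the original answer: the helper name sample_indices is hypothetical, it uses the newer np.random.default_rng generator instead of np.random.choice, it renormalizes the input defensively, and it stacks the coordinate arrays with np.stack rather than zip.

```python
import numpy as np

def sample_indices(dist, n, rng=None):
    """Draw n index tuples from an N-D array of probabilities (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    dist = np.asarray(dist, dtype=float)
    flat = dist.ravel() / dist.sum()                   # assumption: renormalize defensively
    flat_idx = rng.choice(flat.size, size=n, p=flat)   # n flat indices, drawn with replacement
    coords = np.unravel_index(flat_idx, dist.shape)    # tuple of arrays, one per axis
    return np.stack(coords, axis=-1)                   # array of shape (n, dist.ndim)

# Toy distribution with only two cells of nonzero probability
dist = np.zeros((3, 4))
dist[1, 2] = 0.25
dist[2, 0] = 0.75
samples = sample_indices(dist, n=1000, rng=np.random.default_rng(0))
print(samples[:5])
```

Each row of samples is one (x1, x2) pair, so for the (200, 200) array from the question the result would have shape (n, 2).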


ruoxqz4g2#

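Here is one way to do it, though I'm sure there is a more elegant solution using scipy. numpy.random doesn't handle 2D pmfs, so you have to do some reshaping gymnastics to make this work.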

import numpy as np

# construct a toy joint pmf
dist = np.random.random(size=(200, 200))  # here's your joint pmf
dist /= dist.sum()  # it has to be normalized

# generate the set of all x,y pairs represented by the pmf
# (because of .T, each pair is stored as (column index, row index) of dist)
pairs = np.indices(dimensions=(200, 200)).T  # here are all of the x,y pairs

# make n random selections from the flattened pmf without replacement
# whether you want replacement depends on your application
n = 50
inds = np.random.choice(np.arange(200**2), p=dist.reshape(-1), size=n, replace=False)

# inds is the set of n randomly chosen indices into the flattened dist array...
# therefore the random x,y selections
# come from selecting the associated elements
# from the flattened pairs array
selections = pairs.reshape(-1, 2)[inds]
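Whether the sampled pairs really follow dist can be checked empirically by counting how often each cell is drawn. The sketch below is an illustration rather than part of the answer, and it makes several assumptions: a small 4×6 toy pmf, a fixed seed, np.random.default_rng, and sampling with replacement (with replace=False each cell can be drawn at most once, so the draws are not independent samples from the pmf).

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy joint pmf on a small non-square grid (assumed shape, for illustration)
dist = rng.random((4, 6))
dist /= dist.sum()

# Draw many flat indices with replacement, then map them back to (row, col)
n = 200_000
inds = rng.choice(dist.size, size=n, p=dist.ravel())
rows, cols = np.unravel_index(inds, dist.shape)

# Count how often each cell was drawn; np.add.at accumulates repeated indices
counts = np.zeros_like(dist)
np.add.at(counts, (rows, cols), 1)
empirical = counts / n

max_abs_error = np.abs(empirical - dist).max()
print(max_abs_error)
```

If the index bookkeeping is right, the empirical frequencies should approach dist as n grows; a transposed or otherwise mismatched mapping would show up as a large error on a non-square grid like this one.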


v64noz0r3#

I can't comment, but here is a small improvement on answer 2 above:

pairs=np.indices(dimensions=(200,200)).T
selections = pairs.reshape(-1,2)[inds]

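can be replaced with the following when the pairs array is not needed for anything else (m is the number of columns of dist, 200 here):

np.array([inds//m, inds%m]).T

The "pairs" matrix is then no longer needed. Note that this returns each pair as (row index, column index) of dist, whereas pairs.reshape(-1,2)[inds] returns them in (column, row) order.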
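The integer-division and modulo arithmetic is the same row-major mapping that np.unravel_index performs for a 2D shape. A small sketch to illustrate this, assuming a hypothetical non-square shape of (3, 5) and enumerating every flat index instead of sampling:

```python
import numpy as np

shape = (3, 5)          # assumed toy shape: 3 rows, 5 columns
m = shape[1]            # number of columns
inds = np.arange(shape[0] * shape[1])   # every flat index, 0..14

# Row index is the quotient, column index is the remainder
by_arithmetic = np.array([inds // m, inds % m]).T

# The same mapping via np.unravel_index (row-major order by default)
by_unravel = np.stack(np.unravel_index(inds, shape), axis=-1)

same = np.array_equal(by_arithmetic, by_unravel)
print(same, by_arithmetic[7])
```

np.unravel_index has the advantage of generalizing to arrays with more than two dimensions, where the arithmetic version would need one division per axis.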


irlmq6kh4#

I can't comment either, but @applemonkey496's suggestion for getting multiple samples doesn't work as written.
Instead of

adjusted_index = np.array(zip(*adjusted_index))

adjusted_index should be converted to a Python list before being passed to np.array (np.array does not unpack a zip object), for example:

adjusted_index = np.array(list(zip(*adjusted_index)))
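The difference is easy to see on a tiny example. The sketch below assumes Python 3 and a toy (3, 4) shape with hand-picked flat indices; np.stack and np.column_stack are shown as possible alternatives that skip the intermediate Python list, which is a suggestion beyond what the answers above propose.

```python
import numpy as np

flat_idx = np.array([0, 5, 11])                 # hand-picked flat indices
coords = np.unravel_index(flat_idx, (3, 4))     # tuple of arrays, one per axis

# In Python 3, zip returns an iterator, which np.array wraps as a single object
wrapped = np.array(zip(*coords))                # 0-d object array, not coordinates

# Materializing the zip first gives the intended (n, 2) coordinate array
via_zip = np.array(list(zip(*coords)))

# Possible alternatives that stay within NumPy
via_stack = np.stack(coords, axis=-1)
via_column_stack = np.column_stack(coords)

print(wrapped.shape, via_zip.tolist())
```

All three working variants give the same (n, 2) integer array; the stacking versions avoid building a list of tuples, which may matter for large sample counts.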

