I want to implement TensorFlow's or PyTorch's scatter and gather operations in NumPy.
torch.scatter
torch.gather
wn9m85ua1#
Two built-in NumPy functions do what you are asking for:
np.take_along_axis implements torch.gather
np.put_along_axis implements torch.scatter
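As a minimal sketch of how these two functions line up with the PyTorch semantics (the arrays `a`, `index`, `src` and `out` are made-up illustration values, not from the original answer):

```python
import numpy as np

a = np.array([[10, 11, 12],
              [20, 21, 22]])
index = np.array([[2, 0],
                  [1, 0]])

# Gather along axis 1: gathered[i, j] = a[i, index[i, j]]
gathered = np.take_along_axis(a, index, axis=1)

# Scatter along axis 1: out[i, index[i, j]] = src[i, j]
# put_along_axis writes into `out` in place and returns None.
src = np.array([[1, 2],
                [3, 4]])
out = np.zeros_like(a)
np.put_along_axis(out, index, src, axis=1)

print(gathered)
print(out)
```

The index rows here contain no repeated positions; with repeats, which write survives is not something this sketch relies on.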
dnph8jn42#
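The scatter method turned out to be far more work than I expected. I did not find any ready-made function for it in NumPy. I am sharing it here for the benefit of anyone who may need to implement it with NumPy. (P.S. self is the destination, or output, of the method.)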
import numpy as np

def scatter_numpy(self, dim, index, src):
    """
    Writes all values from the Tensor src into self at the indices specified in the index Tensor.

    :param dim: The axis along which to index
    :param index: The indices of elements to scatter
    :param src: The source element(s) to scatter
    :return: self
    """
    # Accept any integer dtype, not only the platform default integer
    if not np.issubdtype(index.dtype, np.integer):
        raise TypeError("The values of index must be integers")
    if self.ndim != index.ndim:
        raise ValueError("Index should have the same number of dimensions as output")
    if dim >= self.ndim or dim < -self.ndim:
        raise IndexError("dim is out of range")
    if dim < 0:
        # Not sure why scatter should accept dim < 0, but that is the behavior in PyTorch's scatter
        dim = self.ndim + dim
    idx_xsection_shape = index.shape[:dim] + index.shape[dim + 1:]
    self_xsection_shape = self.shape[:dim] + self.shape[dim + 1:]
    if idx_xsection_shape != self_xsection_shape:
        raise ValueError("Except for dimension " + str(dim) +
                         ", all dimensions of index and output should be the same size")
    if (index >= self.shape[dim]).any() or (index < 0).any():
        raise IndexError("The values of index must be between 0 and (self.shape[dim] - 1)")

    def make_slice(arr, dim, i):
        slc = [slice(None)] * arr.ndim
        slc[dim] = i
        # Multi-axis indexing needs a tuple; a list is treated as an index array
        return tuple(slc)

    # We use the index and dim parameters to create idx.
    # idx is in a form that can be used as a NumPy advanced index for scattering src into self.
    idx = [[*np.indices(idx_xsection_shape).reshape(index.ndim - 1, -1),
            index[make_slice(index, dim, i)].reshape(1, -1)[0]] for i in range(index.shape[dim])]
    idx = list(np.concatenate(idx, axis=1))
    idx.insert(dim, idx.pop())

    if not np.isscalar(src):
        if index.shape[dim] > src.shape[dim]:
            raise IndexError("Dimension " + str(dim) + " of index cannot be bigger than that of src")
        src_xsection_shape = src.shape[:dim] + src.shape[dim + 1:]
        if idx_xsection_shape != src_xsection_shape:
            raise ValueError("Except for dimension " +
                             str(dim) + ", all dimensions of index and src should be the same size")
        # src_idx is a NumPy advanced index for indexing of elements in the src
        src_idx = list(idx)
        src_idx.pop(dim)
        src_idx.insert(dim, np.repeat(np.arange(index.shape[dim]), np.prod(idx_xsection_shape)))
        self[tuple(idx)] = src[tuple(src_idx)]
    else:
        self[tuple(idx)] = src
    return self
There may be a simpler solution for gather, but this is what I settled on (here self is the ndarray that the values are gathered from):
import numpy as np

def gather_numpy(self, dim, index):
    """
    Gathers values along an axis specified by dim.
    For a 3-D tensor the output is specified by:
        out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
        out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
        out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2

    :param dim: The axis along which to index
    :param index: A tensor of indices of elements to gather
    :return: tensor of gathered values
    """
    idx_xsection_shape = index.shape[:dim] + index.shape[dim + 1:]
    self_xsection_shape = self.shape[:dim] + self.shape[dim + 1:]
    if idx_xsection_shape != self_xsection_shape:
        raise ValueError("Except for dimension " + str(dim) +
                         ", all dimensions of index and self should be the same size")
    # Accept any integer dtype, not only the platform default integer
    if not np.issubdtype(index.dtype, np.integer):
        raise TypeError("The values of index must be integers")
    data_swaped = np.swapaxes(self, 0, dim)
    index_swaped = np.swapaxes(index, 0, dim)
    # np.choose treats the first axis of data_swaped as the list of choices.
    # It only accepts a limited number of choices (roughly 32 in NumPy 1.x),
    # so this fails when self.shape[dim] is large.
    gathered = np.choose(index_swaped, data_swaped)
    return np.swapaxes(gathered, 0, dim)
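The per-element formula in the docstring can be checked directly. The sketch below is an illustration only: `gather_reference` is a hypothetical loop-based helper that follows the formula literally, and it is compared against np.take_along_axis on random data.

```python
import numpy as np

def gather_reference(inp, dim, index):
    """Loop-based reference (hypothetical helper): applies the formula element by element."""
    out = np.empty(index.shape, dtype=inp.dtype)
    for pos in np.ndindex(*index.shape):
        src_pos = list(pos)
        src_pos[dim] = index[pos]      # replace only the coordinate along dim
        out[pos] = inp[tuple(src_pos)]
    return out

rng = np.random.default_rng(0)
inp = rng.integers(0, 100, size=(3, 4, 5))
index = rng.integers(0, 4, size=(3, 2, 5))   # values address axis 1 of inp

expected = gather_reference(inp, 1, index)
fast = np.take_along_axis(inp, index, axis=1)
print(np.array_equal(expected, fast))
```

The loop is slow but easy to audit, which makes it a convenient oracle when testing any vectorised gather.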
xesrikrc3#
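The scatter_nd operation can be implemented using the .at method of NumPy's ufuncs. According to TF's scatter_nd documentation, calling tf.scatter_nd(indices, values, shape) is identical to calling tensor_scatter_add(tf.zeros(shape, values.dtype), indices, values). Hence, you can reproduce tf.scatter_nd using np.add.at applied to an np.zeros array; see the MVCE below: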
import tensorflow as tf
tf.enable_eager_execution()  # Remove this line if working in TF2
import numpy as np

def scatter_nd_numpy(indices, updates, shape):
    target = np.zeros(shape, dtype=updates.dtype)
    # One row per update, then one index array per axis of target
    indices = tuple(indices.reshape(-1, indices.shape[-1]).T)
    updates = updates.ravel()
    # Unbuffered add: repeated indices accumulate
    np.add.at(target, indices, updates)
    return target

indices = np.array([[[0, 0], [0, 1]], [[1, 0], [1, 1]]])
updates = np.array([[1, 2], [3, 4]])
shape = (2, 3)
scattered_tf = tf.scatter_nd(indices, updates, shape).numpy()
scattered_np = scatter_nd_numpy(indices, updates, shape)
assert np.allclose(scattered_tf, scattered_np)
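Note: as @denis pointed out, the above solution differs when some indices are repeated. This can be solved by using a counter and taking only the last one of each repeated index.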
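One possible reading of "take only the last one of each repeated index" is sketched below. It is an assumption about what the note intends, not the answer author's code: `scatter_nd_last_wins` is a hypothetical name, and it uses np.unique on linearised indices instead of a literal counter.

```python
import numpy as np

def scatter_nd_last_wins(indices, updates, shape):
    """Hypothetical variant: for repeated indices, only the last update is kept."""
    target = np.zeros(shape, dtype=updates.dtype)
    flat_idx = indices.reshape(-1, indices.shape[-1])
    flat_upd = updates.ravel()
    # Collapse each N-d index into a single linear offset into target.
    linear = np.ravel_multi_index(tuple(flat_idx.T), shape)
    # np.unique reports the first occurrence, so search the reversed array
    # and map the positions back to find the last occurrence of each offset.
    _, first_in_reversed = np.unique(linear[::-1], return_index=True)
    keep = len(linear) - 1 - first_in_reversed
    target.flat[linear[keep]] = flat_upd[keep]
    return target

indices = np.array([[0, 0], [1, 1], [0, 0]])   # (0, 0) appears twice
updates = np.array([5, 6, 7])
result = scatter_nd_last_wins(indices, updates, (2, 3))
print(result)
```

Like the function above, this sketch assumes scalar updates, that is, indices.shape[-1] equals len(shape).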
gopyfrb34#
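For scattering, rather than using slice assignment as suggested by @DomJack, it is often better to use np.add.at, because unlike slice assignment it has well-defined behaviour in the presence of duplicate indices.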
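A small sketch of the difference, using made-up values (`idx`, `vals`, `buffered` and `accumulated` are illustrative names):

```python
import numpy as np

idx = np.array([0, 0, 2])          # index 0 is repeated
vals = np.array([1, 10, 100])

# Indexed in-place add is buffered: position 0 is written once,
# so the two contributions are not summed.
buffered = np.zeros(4, dtype=int)
buffered[idx] += vals

# np.add.at is unbuffered: every occurrence of an index is accumulated.
accumulated = np.zeros(4, dtype=int)
np.add.at(accumulated, idx, vals)

print(buffered)
print(accumulated)
```

Only the np.add.at result is guaranteed to contain the sum 1 + 10 at position 0.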
hivapdat5#
If ref and indices are NumPy arrays:
Scatter update:
ref[indices] = updates # tf.scatter_update(ref, indices, updates)
ref[:, indices] = updates # tf.scatter_update(ref, indices, updates, axis=1)
ref[..., indices, :] = updates # tf.scatter_update(ref, indices, updates, axis=-2)
ref[..., indices] = updates # tf.scatter_update(ref, indices, updates, axis=-1)
Gather:
ref[indices] # tf.gather(ref, indices)
ref[:, indices] # tf.gather(ref, indices, axis=1)
ref[..., indices, :] # tf.gather(ref, indices, axis=-2)
ref[..., indices] # tf.gather(ref, indices, axis=-1)
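To make the indexing patterns above concrete, here is a short sketch on a small made-up array (`ref`, `indices`, `rows`, `cols` and `updated` are illustrative names and values):

```python
import numpy as np

ref = np.arange(12).reshape(3, 4)
indices = np.array([2, 0])

rows = ref[indices]        # gather whole rows 2 and 0
cols = ref[:, indices]     # gather columns 2 and 0 from every row

updated = ref.copy()
updated[:, indices] = -1   # scatter update: overwrite columns 2 and 0

print(rows)
print(cols)
print(updated)
```

Note that the same `indices` are applied to every slice here, which is the tf.gather style; per-element indices in the torch.gather style need a different tool.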
See the NumPy docs on indexing for more.
6pp0gazn6#
I did it like this.
import numpy as np

def gather(a, dim, index):
    # Along dim use index; along every other axis i use arange(a.shape[i]),
    # reshaped so that it broadcasts along axis i only.
    expanded_index = [index if dim == i else
                      np.arange(a.shape[i]).reshape([-1 if i == j else 1 for j in range(a.ndim)])
                      for i in range(a.ndim)]
    # Advanced indexing across several axes needs a tuple, not a list
    return a[tuple(expanded_index)]

def scatter(a, dim, index, b):  # modifies a in place
    expanded_index = [index if dim == i else
                      np.arange(a.shape[i]).reshape([-1 if i == j else 1 for j in range(a.ndim)])
                      for i in range(a.ndim)]
    a[tuple(expanded_index)] = b
xn1cxnb47#
For the gather operation: np.take()
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.take.html
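A brief sketch of what np.take does along an axis (the array `a` and the index list are made-up values):

```python
import numpy as np

a = np.array([[10, 11, 12],
              [20, 21, 22]])

# np.take applies the same indices to every row; equivalent to a[:, [2, 0]]
picked = np.take(a, [2, 0], axis=1)
print(picked)
```

This matches gathering whole slices along an axis. When each output element needs its own index, as in torch.gather, np.take_along_axis is the closer fit.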
lnxxn5zx8#
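If you just want the same functionality without implementing it from scratch, numpy.insert() is a close enough contender for PyTorch's scatter_(dim, index, src) operation, but it only handles one dimension.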