scipy: zero out several columns of a csr_matrix

Posted by 70gysomp on 2022-11-10 in Other


Suppose I have a sparse matrix:

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
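To make the three CSR arrays concrete, here is a short sketch that decodes each row's stored (column, value) pairs. The helper `row_entries` is hypothetical and exists only for illustration. It shows that the entries of any single column are scattered across every row's slice of `indices`, which is why zeroing a column means touching `data` wherever `indices` matches that column, rather than editing one contiguous block.

```python
import numpy as np
from scipy.sparse import csr_matrix


def row_entries(indptr, indices, data):
    """Hypothetical helper: return each row's stored (column, value) pairs."""
    rows = []
    for i in range(len(indptr) - 1):
        # Row i's entries live in the slice indptr[i]:indptr[i + 1]
        # of both `indices` (column numbers) and `data` (values).
        start, stop = indptr[i], indptr[i + 1]
        rows.append(list(zip(indices[start:stop].tolist(),
                             data[start:stop].tolist())))
    return rows


indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

for i, entries in enumerate(row_entries(indptr, indices, data)):
    print(i, entries)
# 0 [(0, 1), (2, 2)]
# 1 [(2, 3)]
# 2 [(0, 4), (1, 5), (2, 6)]

# Cross-check one entry against the matrix scipy builds from the same arrays.
M = csr_matrix((data, indices, indptr), shape=(3, 3))
print(M[2, 1])  # 5
```

Column 0, for example, appears in row 0 and row 2, at positions 0 and 3 of `indices`.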

I want to zero out columns 0 and 2. This is the result I want:

array([[0, 0, 0],
       [0, 0, 0],
       [0, 5, 0]])

Here is my attempt:

sp_mat = csr_matrix((data, indices, indptr), shape=(3, 3))
zero_cols = np.array([0, 2])
sp_mat[:, zero_cols] = 0

However, I get a warning:

SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.

Because my sp_mat is large, converting it to a lil_matrix is very slow.
What is an efficient way to do this?

scipy

Source: https://stackoverflow.com/questions/54413874/zero-several-columns-in-csr-matrix

2 answers


dkqlctbz1#

In [87]: import numpy as np
    ...: from scipy import sparse
    ...: indptr = np.array([0, 2, 3, 6])
    ...: indices = np.array([0, 2, 2, 0, 1, 2])
    ...: data = np.array([1, 2, 3, 4, 5, 6])
    ...: M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
In [88]: M
Out[88]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>

Look at the result of the csr assignment:

In [89]: M[:, [0, 2]] = 0
/usr/local/lib/python3.6/dist-packages/scipy/sparse/compressed.py:746: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
In [90]: M
Out[90]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 7 stored elements in Compressed Sparse Row format>
In [91]: M.data
Out[91]: array([0, 0, 0, 0, 0, 5, 0])
In [92]: M.indices
Out[92]: array([0, 2, 0, 2, 0, 1, 2], dtype=int32)

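Not only does it give the warning, it actually increases the number of stored ("sparse") entries, even though most of them now have the value 0. They are removed only when we clean up: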

In [93]: M.eliminate_zeros()
In [94]: M
Out[94]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>

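In index assignment, csr does not distinguish between setting 0 and setting any other value; it treats all values the same.
I should note that the efficiency warning is mainly there to discourage users from doing this repeatedly (as in a loop).
For index assignment, lil is more efficient (or at least it doesn't warn about efficiency), but converting to and from that format is time-consuming.
Another option is to find and set the new 0s directly, followed by eliminate_zeros().
Yet another is to use matrix multiplication; I think a sparse diagonal matrix with 0s in the right columns would do it.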

In [103]: M
Out[103]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>
In [104]: D = sparse.diags([0,1,0], dtype=M.dtype)
In [105]: D
Out[105]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 3 stored elements (1 diagonals) in DIAgonal format>
In [106]: D.toarray()
Out[106]: 
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])
In [107]: M1 = M @ D
In [108]: M1
Out[108]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>
In [110]: M1.toarray()
Out[110]: 
array([[0, 0, 0],
       [0, 0, 0],
       [0, 5, 0]], dtype=int64)

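If you do the multiplication in place on the matrix (M[:, cols] *= 0), you don't get the efficiency warning. It only changes the values of the existing nonzero entries, so it doesn't change the sparsity structure of the matrix (at least not until you eliminate zeros):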

In [111]: M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))
In [112]: M[:,[0,2]] *= 0
In [113]: M
Out[113]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>
In [114]: M.eliminate_zeros()
In [115]: M
Out[115]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>
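The "find and set the new 0s directly, followed by eliminate_zeros()" option from this answer can be sketched as follows. The function name `zero_columns_inplace` is hypothetical. It uses `np.isin` on `M.indices` to locate the stored entries that fall in the target columns, overwrites their values in `M.data`, and then drops the explicit zeros. Since only existing values change before `eliminate_zeros()` runs, the sparsity structure is never modified by indexing, so no SparseEfficiencyWarning is expected. This assumes CSR format (on a CSC matrix `M.indices` holds row numbers, so the same code would zero rows instead) and it mutates `M` in place.

```python
import numpy as np
from scipy import sparse


def zero_columns_inplace(M, cols):
    """Hypothetical helper: zero the given columns of a CSR matrix in place.

    Only the values of already-stored entries are changed; eliminate_zeros()
    then removes the resulting explicit zeros from the structure.
    """
    # True wherever a stored entry lies in one of the target columns.
    mask = np.isin(M.indices, cols)
    M.data[mask] = 0
    M.eliminate_zeros()
    return M


indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
M = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

zero_columns_inplace(M, [0, 2])
print(M.toarray())
# [[0 0 0]
#  [0 0 0]
#  [0 5 0]]
print(M.nnz)  # 1
```

The cost is one pass over `M.indices` and `M.data`, with no format conversion.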

Answered 2022-11-10

6ioyuze22#

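Matrix multiplication is the way to go.
For my large CSR matrix (2M × 2M), assigning directly with sp_mat[:, zero_cols] = 0 caused an out-of-memory error. Assuming the columns to be zeroed are marked True in a boolean array zero_mask, multiplying by a diagonal matrix does the job efficiently (within 3 seconds).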

import scipy.sparse as sp

# zero_mask: boolean array of length sp_mat.shape[1], True for columns to clear
sp_mat = sp_mat @ sp.diags((~zero_mask).astype(int))

Here, (~zero_mask).astype(int) gives a 1-D array of 0s and 1s that specifies which columns should be kept (1) and which should be zeroed (0).

Answered 2022-11-10
