NumPy mean gives slightly different results depending on row order

Posted by qjp7pelc on 2023-04-21 in Other

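In a test case we use np.testing.assert_allclose to check whether two data sources agree with each other on their mean. However, although the data are identical and only the row order differs, the computed means are slightly different. Here is a minimal working example: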

import numpy as np

x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float32,
)
y = np.array(
    [[0.5224021, 0.8526993], [0.70609194, 0.7081201], [0.6045113, 0.7965965], [0.5053657, 0.86290526]],
    dtype=np.float32,
)
print("X mean", x.mean(0))
print("Y mean", y.mean(0))
z = x[[0, 3, 1, 2]]
print("Z", z)
print("Z mean", z.mean(0))

np.testing.assert_allclose(z.mean(0), y.mean(0))
np.testing.assert_allclose(x.mean(0), y.mean(0))

With Python 3.10.6 and NumPy 1.24.2, this gives the following output:

X mean [0.58459276 0.8050803 ]
Y mean [0.5845928 0.8050803]
Z [[0.5224021  0.8526993 ]
 [0.70609194 0.7081201 ]
 [0.6045113  0.7965965 ]
 [0.5053657  0.86290526]]
Z mean [0.5845928 0.8050803]
Traceback (most recent call last):
  File "/home/nuric/semafind-db/scribble.py", line 19, in <module>
    np.testing.assert_allclose(x.mean(0), y.mean(0))
  File "/home/nuric/semafind-db/.venv/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1592, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/nuric/semafind-db/.venv/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 1 / 2 (50%)
Max absolute difference: 5.9604645e-08
Max relative difference: 1.0195925e-07
 x: array([0.584593, 0.80508 ], dtype=float32)
 y: array([0.584593, 0.80508 ], dtype=float32)

One workaround is to loosen the tolerance of the assertion, but does anyone know why this happens?

dfddblmv #1

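You should use np.float64 for higher precision. np.float32 has a 24-bit significand, which is roughly 7 significant decimal digits, so the default rtol=1e-07 of assert_allclose sits right at the limit of what it can resolve. In my experience, after a few arithmetic operations np.float32 is only safe to rely on for about 3 decimal places. This code works: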

import numpy as np

x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float64,
)
y = np.array(
    [[0.5224021, 0.8526993], [0.70609194, 0.7081201], [0.6045113, 0.7965965], [0.5053657, 0.86290526]],
    dtype=np.float64,
)
print("X mean", x.mean(0))
print("Y mean", y.mean(0))
z = x[[0, 3, 1, 2]]
print("Z", z)
print("Z mean", z.mean(0))

np.testing.assert_allclose(z.mean(0), y.mean(0))
np.testing.assert_allclose(x.mean(0), y.mean(0))
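
If converting the arrays themselves to np.float64 is not practical, a related option (not part of the original answer) is to keep the float32 data and ask mean to accumulate in float64 through its dtype argument. The sketch below reuses the data from the question; the names x_mean and y_mean are only illustrative.

```python
import numpy as np

x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float32,
)
y = x[[0, 3, 1, 2]]  # same rows as y in the question, in y's order

# The stored data stays float32; only the accumulation and the result are float64.
x_mean = x.mean(0, dtype=np.float64)
y_mean = y.mean(0, dtype=np.float64)

# Passes with the default rtol=1e-07 for this data.
np.testing.assert_allclose(x_mean, y_mean)
```

For four float32 values of this size the float64 partial sums have enough spare bits that the row order should no longer matter, but for long arrays a float64 accumulator only shrinks the rounding error rather than removing it.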

Another thing you can do is increase the tolerance:

import numpy as np

x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float32,
)
y = np.array(
    [[0.5224021, 0.8526993], [0.70609194, 0.7081201], [0.6045113, 0.7965965], [0.5053657, 0.86290526]],
    dtype=np.float32,
)
print("X mean", x.mean(0))
print("Y mean", y.mean(0))
z = x[[0, 3, 1, 2]]
print("Z", z)
print("Z mean", z.mean(0))

np.testing.assert_allclose(z.mean(0), y.mean(0), rtol=1e-6)
np.testing.assert_allclose(x.mean(0), y.mean(0), rtol=1e-6)
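
Rather than hard-coding 1e-6, the tolerance can be tied to the precision of the dtype. The sketch below derives it from np.finfo(np.float32).eps; scaling by the number of summed rows is a rule-of-thumb assumption made here, not something stated in the original answer.

```python
import numpy as np

x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float32,
)
y = x[[0, 3, 1, 2]]  # same rows as y in the question, in y's order

# Machine epsilon of float32: the gap between 1.0 and the next float32 (2**-23).
eps = float(np.finfo(np.float32).eps)

# Assumption: allow roughly one eps of relative error per summed row.
rtol = len(x) * eps

np.testing.assert_allclose(x.mean(0), y.mean(0), rtol=rtol)
```

Here rtol comes out a little under 5e-07, which is looser than the default 1e-07 but still tighter than 1e-6.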

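Finally, the mismatch arises because the rows are summed in a different order: x is accumulated in one order, while y and z share another. Floating-point addition is not associative, and every partial sum is rounded to np.float32, so the totals can differ in the last bit. You can see this by printing more decimal places: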

import numpy as np

np.set_printoptions(formatter={'float': lambda x: "{0:0.10f}".format(x)})

x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float32,
)
y = np.array(
    [[0.5224021, 0.8526993], [0.70609194, 0.7081201], [0.6045113, 0.7965965], [0.5053657, 0.86290526]],
    dtype=np.float32,
)
print("X mean", x.mean(0))
print("Y mean", y.mean(0))
z = x[[0, 3, 1, 2]]
print("Z", z)
print("Z mean", z.mean(0))

np.testing.assert_allclose(z.mean(0), y.mean(0), rtol=1e-6)
np.testing.assert_allclose(x.mean(0), y.mean(0), rtol=1e-6)

It will print:

X mean [0.5845927596 0.8050802946]
Y mean [0.5845928192 0.8050802946]
Z [[0.5224021077 0.8526992798]
 [0.7060919404 0.7081201077]
 [0.6045113206 0.7965965271]
 [0.5053657293 0.8629052639]]
Z mean [0.5845928192 0.8050802946]
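
To make the rounding visible, the first column can be summed by hand, rounding to float32 after every addition. This is a minimal sketch that assumes NumPy adds the four rows one after another when reducing along axis 0; sequential_sum_f32 is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

def sequential_sum_f32(values):
    """Add values left to right, rounding to float32 after every addition."""
    total = np.float32(0.0)
    for v in values:
        total = np.float32(total + np.float32(v))
    return total

# First column of the question's data, in the row order of x and of y.
col_x = [0.5224021, 0.6045113, 0.5053657, 0.70609194]
col_y = [0.5224021, 0.70609194, 0.6045113, 0.5053657]

# Dividing by 4 is exact in binary floating point, so any difference comes from the sums.
mean_x = float(sequential_sum_f32(col_x) / np.float32(4))
mean_y = float(sequential_sum_f32(col_y) / np.float32(4))

print(f"{mean_x:.10f}")
print(f"{mean_y:.10f}")
print(mean_y - mean_x)  # one float32 step for values in [0.5, 1)
```

The two results differ by 2**-24, about 5.96e-08, which is the "Max absolute difference" reported in the question's traceback.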
