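In a test case, we use np.testing.assert_allclose to check whether two data sources agree on their mean. However, even though both contain the same data in a different row order, the computed means differ slightly. Here is a minimal working example: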
import numpy as np
x = np.array(
[[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
dtype=np.float32,
)
y = np.array(
[[0.5224021, 0.8526993], [0.70609194, 0.7081201], [0.6045113, 0.7965965], [0.5053657, 0.86290526]],
dtype=np.float32,
)
print("X mean", x.mean(0))
print("Y mean", y.mean(0))
z = x[[0, 3, 1, 2]]
print("Z", z)
print("Z mean", z.mean(0))
np.testing.assert_allclose(z.mean(0), y.mean(0))
np.testing.assert_allclose(x.mean(0), y.mean(0))
With Python 3.10.6 and NumPy 1.24.2, this gives the following output:
X mean [0.58459276 0.8050803 ]
Y mean [0.5845928 0.8050803]
Z [[0.5224021 0.8526993 ]
[0.70609194 0.7081201 ]
[0.6045113 0.7965965 ]
[0.5053657 0.86290526]]
Z mean [0.5845928 0.8050803]
Traceback (most recent call last):
File "/home/nuric/semafind-db/scribble.py", line 19, in <module>
np.testing.assert_allclose(x.mean(0), y.mean(0))
File "/home/nuric/semafind-db/.venv/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1592, in assert_allclose
assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/nuric/semafind-db/.venv/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Mismatched elements: 1 / 2 (50%)
Max absolute difference: 5.9604645e-08
Max relative difference: 1.0195925e-07
x: array([0.584593, 0.80508 ], dtype=float32)
y: array([0.584593, 0.80508 ], dtype=float32)
One workaround is to loosen the tolerance of the assertion, but does anyone have an idea why this happens?
1 Answer
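You should use np.float64 to get higher precision. np.float32 carries only about 7 significant decimal digits, and in my experience it is reliable to roughly 3 decimal places once rounding errors accumulate. Computing the means in np.float64 makes the assertion pass with the default tolerance. Another thing you can do is increase the tolerance of the assertion.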
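Both suggestions can be sketched as follows, reusing the data from the question. This is only an illustration: y is rebuilt here by reordering x, and rtol=1e-6 is an assumed value picked to sit a little above float32 machine epsilon (about 1.19e-07), not a value taken from the original answer.

```python
import numpy as np

x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float32,
)
# Same rows as y in the question, in the same order (rows 0, 3, 1, 2 of x).
y = x[[0, 3, 1, 2]]

# Option 1: accumulate the mean in float64 while keeping the inputs as float32.
mean_x64 = x.mean(0, dtype=np.float64)
mean_y64 = y.mean(0, dtype=np.float64)
np.testing.assert_allclose(mean_x64, mean_y64)  # default rtol=1e-07

# Option 2: stay in float32 but loosen the tolerance (assumed value).
np.testing.assert_allclose(x.mean(0), y.mean(0), rtol=1e-6)
```

Passing dtype=np.float64 to mean avoids converting the stored arrays, which may matter if the data has to stay in float32 for memory reasons.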
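Finally, the error occurs because the sum for x is accumulated in a different order than the sums for y and z. Floating-point addition is not associative: each intermediate sum is rounded to np.float32, so a different order can produce a slightly different result. Here the difference is 5.9604645e-08, which is 2^-24, a single rounding step at this magnitude. You can see this by printing the means with more decimal places.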
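The original code for printing the extra digits is not preserved in this copy, so the following is a rough reconstruction rather than the answerer's snippet. The helper show is a hypothetical name, and the 1e8 example is an added illustration of order-dependent rounding in float32.

```python
import numpy as np

x = np.array(
    [[0.5224021, 0.8526993], [0.6045113, 0.7965965], [0.5053657, 0.86290526], [0.70609194, 0.7081201]],
    dtype=np.float32,
)
y = x[[0, 3, 1, 2]]  # same rows as y in the question


def show(label, values):
    # Hypothetical helper: print each float32 value with 12 decimal places.
    print(label, [f"{float(v):.12f}" for v in values])


show("X mean", x.mean(0))
show("Y mean", y.mean(0))

# Size of the mismatch relative to float32 machine epsilon (2**-23).
eps = np.finfo(np.float32).eps
diff = np.abs(x.mean(0).astype(np.float64) - y.mean(0).astype(np.float64))
print("difference / eps:", diff / eps)

# Order matters in float32: near 1e8 the spacing between floats is 8,
# so adding 1 to 1e8 is lost to rounding.
big, one = np.float32(1e8), np.float32(1.0)
left = (big + one) - big   # 0.0
right = (big - big) + one  # 1.0
print(left, right)
```

The exact digits printed for the means may depend on the NumPy version and platform, so none are listed here.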