TensorFlow: What is tf.bfloat16, the "truncated 16-bit floating point"?

Posted by igetnqfo on 2022-11-16 in Other


What is the difference between tf.float16 and tf.bfloat16 as listed at https://www.tensorflow.org/versions/r0.12/api_docs/python/framework/tensor_types ?
Also, what do they mean by "quantized integer"?


Source: https://stackoverflow.com/questions/44873802/what-is-tf-bfloat16-truncated-16-bit-floating-point

2 Answers


8cdiaqws1#

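bfloat16 is a format used by TensorFlow that differs from IEEE's own float16, hence the new name.
Basically, a bfloat16 is a float32 truncated to its upper 16 bits. It therefore keeps the sign bit and the same 8 exponent bits, but has only 7 bits for the mantissa. This makes conversion to and from float32 very easy. And because bfloat16 covers essentially the same range of values as float32, it minimizes the risk of getting NaNs or exploding/vanishing gradients when switching from float32.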
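The truncation described here can be sketched in plain Python with the standard `struct` module. This is only an illustration of the bit-level idea, not TensorFlow's actual conversion code: the helper names `float32_to_bfloat16_bits` and `bfloat16_bits_to_float32` are made up for this example, and a real implementation may round instead of simply dropping the low bits.

```python
import math
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Return the upper 16 bits of x's float32 encoding (plain truncation, no rounding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16

def bfloat16_bits_to_float32(bits16: int) -> float:
    """Widen a 16-bit pattern back to float32 by zero-filling the low 16 bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value

# Precision drops: only 7 mantissa bits survive the truncation.
pi_bits = float32_to_bfloat16_bits(3.14159265)
pi_back = bfloat16_bits_to_float32(pi_bits)

# Range is preserved: a value near the top of the float32 range stays finite.
big_back = bfloat16_bits_to_float32(float32_to_bfloat16_bits(1e38))

# IEEE float16, with its 5 exponent bits, cannot represent the same value.
try:
    struct.pack("<e", 1e38)
    float16_overflows = False
except OverflowError:
    float16_overflows = True

print(hex(pi_bits), pi_back, math.isfinite(big_back), float16_overflows)
```

The round trip loses precision (pi comes back as 3.140625) but not magnitude, which is the trade-off the answer describes.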
From the sources:

// Compact 16-bit encoding of floating point numbers. This representation uses
// 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa.  It
// is assumed that floats are in IEEE 754 format so the representation is just
// bits 16-31 of a single precision float.
//
// NOTE: The IEEE floating point standard defines a float16 format that
// is different than this format (it has fewer bits of exponent and more
// bits of mantissa).  We don't use that format here because conversion
// to/from 32-bit floats is more complex for that format, and the
// conversion for this format is very simple.
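To make the layout in that comment concrete, here is a small sketch that splits a 16-bit pattern into sign, exponent, and mantissa fields for both formats. The function names `split_bfloat16` and `split_float16` are hypothetical helpers written for this illustration. The bfloat16 field widths (1/8/7) come from the comment above, and the float16 widths (1/5/10) are those of IEEE 754 half precision.

```python
def split_bfloat16(bits16: int) -> tuple:
    """Split a bfloat16 bit pattern into (sign, exponent, mantissa): 1 + 8 + 7 bits."""
    sign = (bits16 >> 15) & 0x1
    exponent = (bits16 >> 7) & 0xFF
    mantissa = bits16 & 0x7F
    return sign, exponent, mantissa

def split_float16(bits16: int) -> tuple:
    """Split an IEEE float16 bit pattern into (sign, exponent, mantissa): 1 + 5 + 10 bits."""
    sign = (bits16 >> 15) & 0x1
    exponent = (bits16 >> 10) & 0x1F
    mantissa = bits16 & 0x3FF
    return sign, exponent, mantissa

# 1.0 is encoded as 0x3F80 in bfloat16 (upper half of float32's 0x3F800000)
# and as 0x3C00 in IEEE float16.
one_bf16 = split_bfloat16(0x3F80)
one_f16 = split_float16(0x3C00)

print(one_bf16, one_f16)
```

The exponent bias differs as well (127 for bfloat16, as in float32, versus 15 for float16), which is part of why converting float16 to and from float32 takes more work than a simple shift.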