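Please make sure that this is a bug. As per our GitHub Policy,
we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template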

System information

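Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Debian 10
Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 2.5.0 and 2.7.0.dev20210725
Python version: 3.9.5
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version:
GPU model and memory: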

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:

TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"
v1.12.1-60944-g65153d2677e 2.7.0-dev20210725

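Describe the current behavior

When a tf.data.Dataset is cached to a file, a warning is emitted even though the dataset has been fully read.

Describe the expected behavior

No warning should be emitted when the dataset has been fully read.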

Contributing

Do you want to contribute a PR? (yes/no): no
Briefly describe your candidate solution (if contributing):

Standalone code to reproduce the issue

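Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook.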

import os
import tensorflow as tf

os.system('rm -f /tmp/foo*')

def gen():
    for i in range(5):
        yield i

dataset = tf.data.Dataset.from_generator(gen, output_signature=tf.TensorSpec(shape=(), dtype=tf.int32))
dataset = dataset.cache('/tmp/foo')
# dataset = dataset.take(3).cache('/tmp/foo').repeat(2)

for x in dataset:
    print(x)

Using take(k).cache().repeat(), as the warning suggests (see the commented-out line above), does not help either.

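Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
Output of the above script: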

tf.Tensor(0, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(4, shape=(), dtype=int32)
2021-07-26 13:35:10.940312: W tensorflow/core/kernels/data/cache_dataset_ops.cc:233] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.

Running this script under strace shows that a shard file is deleted before its existence is checked:

[pid   910] stat("/tmp/foo_0.index", {st_mode=S_IFREG|0644, st_size=201, ...}) = 0
[pid   910] openat(AT_FDCWD, "/tmp/foo_0.index", O_RDONLY) = 4
[pid   910] pread64(4, "y\10\206\1\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 48, 153) = 48
[pid   910] pread64(4, "\0\1\2!\0t\0\0\0\0\1\0\0\0\0t\214P\20", 19, 134) = 19
[pid   910] pread64(4, "\0\0\6\10\1\32\2\10\1\0\t\v      0_0\10\3\22\0(\0045\246{\21:"..., 121, 0) = 121
[pid   910] close(4)                    = 0
[pid   910] rename("/tmp/foo_0.data-00000-of-00001", "/tmp/foo.data-00000-of-00001") = 0
[pid   910] openat(AT_FDCWD, "/tmp/foo.index", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
[pid   910] fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid   910] write(4, "\0\0\6\10\1\32\2\10\1\0\t\v      0_0\10\3\22\0(\0045\246{\21:"..., 201) = 201
[pid   910] close(4)                    = 0
[pid   910] unlink("/tmp/foo_0.index")  = 0
[pid   910] unlink("/tmp/foo_0.lockfile") = 0
[pid   910] access("/tmp/foo.index", F_OK) = 0
[pid   910] access("/tmp/foo_0.index", F_OK) = -1 ENOENT (No such file or directory)
[pid   910] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0
[pid   910] write(2, "2021-07-26 13:17:56.497443: W te"..., 4242021-07-26 13:17:56.497443: W tensorflow/core/kernels/data/cache_dataset_ops.cc:233] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
) = 424
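
In the trace, /tmp/foo_0.index is unlinked and the final /tmp/foo.index is in place before the warning is written. One way to confirm which cache files remain once the loop has finished is to list everything under the cache prefix. This is a minimal sketch using only the standard library; the helper name `list_cache_files` is hypothetical and not part of TensorFlow.

```python
import glob
import os


def list_cache_files(prefix):
    """Return sorted (basename, size_in_bytes) pairs for files whose path starts with `prefix`."""
    return sorted(
        (os.path.basename(path), os.path.getsize(path))
        # glob.escape keeps special characters in the prefix from acting as wildcards.
        for path in glob.glob(glob.escape(prefix) + "*")
        if os.path.isfile(path)
    )


if __name__ == "__main__":
    # '/tmp/foo' is the cache prefix used in the script above.
    for name, size in list_cache_files("/tmp/foo"):
        print(name, size)
```

If the finalized cache is present, the listing should include the foo.index and foo.data-00000-of-00001 files seen in the rename and openat calls above.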

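@tilakrayal, could you elaborate on how this helps? I don't understand. Setting CUDA_LAUNCH_BLOCKING to 1 has no effect on the result in this case. On the one hand, that SO answer talks about the RAM cache (but the warning comes from the FS cache). On the other hand, it says there are no warnings in Google Colab. But they are still there - https://stackoverflow.com/a/53353891
So the snippet needs to be modified: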

!pip install wurlitzer

import os
import tensorflow as tf
from wurlitzer import pipes

os.system('rm -f /tmp/foo*')

def gen():
    for i in range(5):
        yield i

dataset = tf.data.Dataset.from_generator(gen, output_signature=tf.TensorSpec(shape=(), dtype=tf.int32))
dataset = dataset.cache('/tmp/foo')
# dataset = dataset.take(3).cache('/tmp/foo').repeat(2)

with pipes() as (out, err):
    for x in dataset:
        print(x)

print(out.read())
print(err.read())
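
Outside a notebook, another way to capture the C++-level warning is to run the reproduction in a separate Python process and inspect its stderr. The following is a sketch under assumptions: `run_and_capture_stderr` and `has_cache_warning` are hypothetical helpers, and the marker string is copied from the warning text in the logs above.

```python
import subprocess
import sys

# Marker copied from the warning text quoted in the logs above.
WARNING_MARKER = "did not fully read the dataset being cached"


def run_and_capture_stderr(source):
    """Run `source` in a fresh Python process and return its stderr as text."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        check=False,  # return stderr even if the child process exits non-zero
    )
    return result.stderr


def has_cache_warning(stderr_text):
    """Return True if the cache warning appears in the captured stderr."""
    return WARNING_MARKER in stderr_text
```

Passing the source of the reproduction script to `run_and_capture_stderr` and the result to `has_cache_warning` gives a yes/no answer that does not depend on how the notebook front end displays stderr.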