While reading the pre-training code, I noticed the comments say that drop_remainder should be True during training and False during evaluation, but the code itself confuses me. I am not sure whether this is a bug or whether I am missing something.
The code in question, from run_pretraining.py:
    # For training, we want a lot of parallel reading and shuffling.
    # For eval, we want no shuffling and parallel reading doesn't matter.
    if is_training:
      d = tf.data.Dataset.from_tensor_slices(tf.constant(input_files))
      d = d.repeat()
      d = d.shuffle(buffer_size=len(input_files))

      # `cycle_length` is the number of parallel files that get read.
      cycle_length = min(num_cpu_threads, len(input_files))

      # `sloppy` mode means that the interleaving is not exact. This adds
      # even more randomness to the training pipeline.
      d = d.apply(
          tf.contrib.data.parallel_interleave(
              tf.data.TFRecordDataset,
              sloppy=is_training,
              cycle_length=cycle_length))
      d = d.shuffle(buffer_size=100)
    else:
      d = tf.data.TFRecordDataset(input_files)
      # Since we evaluate for a fixed number of steps we don't want to encounter
      # out-of-range exceptions.
      d = d.repeat()

    # We must `drop_remainder` on training because the TPU requires fixed
    # size dimensions. For eval, we assume we are evaluating on the CPU or GPU
    # and we *don't* want to drop the remainder, otherwise we wont cover
    # every sample.
    d = d.apply(
        tf.contrib.data.map_and_batch(
            lambda record: _decode_record(record, name_to_features),
            batch_size=batch_size,
            num_parallel_batches=num_cpu_threads,
            drop_remainder=True))
    return d
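For context, here is a minimal standalone sketch (my own, not taken from run_pretraining.py; it assumes a recent TF version where Dataset.batch accepts drop_remainder and eager execution is enabled) of what drop_remainder does to the final partial batch:

import tensorflow as tf

# 10 examples with batch_size=4: the last batch would hold only 2 examples.
ds = tf.data.Dataset.range(10)

# drop_remainder=True silently discards the final partial batch,
# so only 8 of the 10 examples are ever produced.
print([b.numpy().tolist() for b in ds.batch(4, drop_remainder=True)])
# -> [[0, 1, 2, 3], [4, 5, 6, 7]]

# drop_remainder=False keeps the partial batch, so all 10 examples appear.
print([b.numpy().tolist() for b in ds.batch(4, drop_remainder=False)])
# -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]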
Clearly, the d = d.apply(tf.contrib.data.map_and_batch(...)) call is executed for both training and evaluation, with drop_remainder hard-coded to True. But that could cause data to be dropped during evaluation, which would be a problem. Is that correct? (A sketch of what I expected is below.) Thanks in advance @jacobdevlin-google
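For reference, the behavior I expected from the comment would be something like the following. This is only my guess at a possible change, not a confirmed fix from the authors:

d = d.apply(
    tf.contrib.data.map_and_batch(
        lambda record: _decode_record(record, name_to_features),
        batch_size=batch_size,
        num_parallel_batches=num_cpu_threads,
        # Hypothetical: True for TPU training (fixed shapes required),
        # False for CPU/GPU eval so that no example is dropped.
        drop_remainder=is_training))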
1 Answer

vecaoik11#
I just ran into the same problem. Did you ever find a way to resolve it?