PyTorch SimCLR/ResNet18: last partial-batch mechanism not working? (incompatible tensor shapes)

k4ymrczo · posted 11 months ago in Other

I am implementing the SimCLR/ResNet18 architecture on a custom dataset.
I know that
iterations per epoch = total training dataset size / batch size
and that if the result is fractional, the last batch shrinks to fit the leftovers (a "fractional batch"). In my case, however, this last mechanism does not seem to work. My dataset has 7000 samples. If I use a batch size of 100, I get 7000 / 100 = 70 iterations, there is no fractional batch, and training runs fine. However, if I use a batch size of 32, for example, I get the following error (full stack trace):

/home/wlutz/PycharmProjects/hiv-image-analysis/venv/bin/python /home/wlutz/PycharmProjects/hiv-image-analysis/main.py 
2023-10-20 11:12:22.106008: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-20 11:12:22.107921: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-10-20 11:12:22.133919: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-20 11:12:22.133941: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-20 11:12:22.133955: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-20 11:12:22.138715: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-20 11:12:22.737271: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/__init__.py:11: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar.
  if not hasattr(numpy, tp_name):
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/__init__.py:11: FutureWarning: In the future `np.bool` will be defined as the corresponding NumPy scalar.
  if not hasattr(numpy, tp_name):
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/models/self_supervised/amdim/amdim_module.py:34: UnderReviewWarning: The feature generate_power_seq is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
  "lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/models/self_supervised/amdim/amdim_module.py:92: UnderReviewWarning: The feature FeatureMapContrastiveTask is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
  contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/losses/self_supervised_learning.py:228: UnderReviewWarning: The feature AmdimNCELoss is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
  self.nce_loss = AmdimNCELoss(tclip)
available_gpus: 0
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
  warnings.warn(msg)
Dim MLP input: 512
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:478: LightningDeprecationWarning: Setting `Trainer(gpus=0)` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=0)` instead.
  rank_zero_deprecation(
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:613: UserWarning: Checkpoint directory /home/wlutz/PycharmProjects/hiv-image-analysis/saved_models exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
/home/wlutz/PycharmProjects/hiv-image-analysis/main.py:330: UnderReviewWarning: The feature LinearWarmupCosineAnnealingLR is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
  scheduler_warmup = LinearWarmupCosineAnnealingLR(optimizer, warmup_epochs=10, max_epochs=max_epochs,

  | Name  | Type            | Params
------------------------------------------
0 | model | AddProjection   | 11.5 M
1 | loss  | ContrastiveLoss | 0     
------------------------------------------
11.5 M    Trainable params
0         Non-trainable params
11.5 M    Total params
46.024    Total estimated model params size (MB)
Optimizer Adam, Learning Rate 0.0003, Effective batch size 160
Epoch 0: 100%|█████████▉| 218/219 [04:03<00:01,  1.12s/it, loss=3.74, v_num=58, Contrastive loss_step=3.650]Traceback (most recent call last):
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/main.py", line 388, in <module>
    trainer.fit(model, data_loader)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
    self._run_train()
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 213, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 202, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 249, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 370, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1356, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/core/module.py", line 1754, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 169, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 234, in optimizer_step
    return self.precision_plugin.optimizer_step(
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 119, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/optim/optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/optim/adam.py", line 143, in step
    loss = closure()
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 105, in _wrap_closure
    closure_result = closure()
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 149, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 135, in closure
    step_output = self._step_fn()
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 419, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1494, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 378, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/main.py", line 316, in training_step
    loss = self.loss(z1, z2)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wlutz/PycharmProjects/hiv-image-analysis/main.py", line 243, in forward
    denominator = device_as(self.mask, similarity_matrix) * torch.exp(similarity_matrix / self.temperature)
RuntimeError: The size of tensor a (64) must match the size of tensor b (48) at non-singleton dimension 1

Process finished with exit code 1

Here is some of the code (the error occurs on the last line):

train_config = Hparams()

reproducibility(train_config)

model = SimCLR_pl(train_config, model=resnet18(pretrained=False), feat_dim=512)

transform = Augment(train_config.img_size)
data_loader = get_stl_dataloader(train_config.batch_size, transform)

accumulator = GradientAccumulationScheduler(scheduling={0: train_config.gradient_accumulation_steps})
checkpoint_callback = ModelCheckpoint(filename=filename, dirpath=save_model_path, every_n_epochs=2,
                                      save_last=True, save_top_k=2, monitor='Contrastive loss_epoch', mode='min')

trainer = Trainer(callbacks=[accumulator, checkpoint_callback],
                  gpus=available_gpus,
                  max_epochs=train_config.epochs)

trainer.fit(model, data_loader)


And here are my classes:

class Hparams:
    def __init__(self):
        self.epochs = 10  # number of training epochs
        self.seed = 33333  # randomness seed
        self.cuda = True  # use nvidia gpu
        self.img_size = 224  # image shape
        self.save = "./saved_models/"  # save checkpoint
        self.load = False  # load pretrained checkpoint
        self.gradient_accumulation_steps = 5  # gradient accumulation steps
        self.batch_size = 70
        self.lr = 3e-4  # for ADAm only
        self.weight_decay = 1e-6
        self.embedding_size = 128  # papers value is 128
        self.temperature = 0.5  # 0.1 or 0.5
        self.checkpoint_path = '/media/wlutz/TOSHIBA EXT/Image Analysis/VIH PROJECT/models'  # replace checkpoint path here

class SimCLR_pl(pl.LightningModule):
    def __init__(self, config, model=None, feat_dim=512):
        super().__init__()
        self.config = config

        self.model = AddProjection(config, model=model, mlp_dim=feat_dim)

        self.loss = ContrastiveLoss(config.batch_size, temperature=self.config.temperature)

    def forward(self, X):
        return self.model(X)

    def training_step(self, batch, batch_idx):
        (x1, x2) = batch
        z1 = self.model(x1)
        z2 = self.model(x2)
        loss = self.loss(z1, z2)
        self.log('Contrastive loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss

    def configure_optimizers(self):
        max_epochs = int(self.config.epochs)
        param_groups = define_param_groups(self.model, self.config.weight_decay, 'adam')
        lr = self.config.lr
        optimizer = Adam(param_groups, lr=lr, weight_decay=self.config.weight_decay)

        print(f'Optimizer Adam, '
              f'Learning Rate {lr}, '
              f'Effective batch size {self.config.batch_size * self.config.gradient_accumulation_steps}')

        scheduler_warmup = LinearWarmupCosineAnnealingLR(optimizer, warmup_epochs=10, max_epochs=max_epochs,
                                                         warmup_start_lr=0.0)

        return [optimizer], [scheduler_warmup]

class AddProjection(nn.Module):
    def __init__(self, config, model=None, mlp_dim=512):
        super(AddProjection, self).__init__()
        embedding_size = config.embedding_size
        self.backbone = default(model, models.resnet18(pretrained=False, num_classes=config.embedding_size))
        mlp_dim = default(mlp_dim, self.backbone.fc.in_features)
        print('Dim MLP input:', mlp_dim)
        self.backbone.fc = nn.Identity()

        # add mlp projection head
        self.projection = nn.Sequential(
            nn.Linear(in_features=mlp_dim, out_features=mlp_dim),
            nn.BatchNorm1d(mlp_dim),
            nn.ReLU(),
            nn.Linear(in_features=mlp_dim, out_features=embedding_size),
            nn.BatchNorm1d(embedding_size),
        )

    def forward(self, x, return_embedding=False):
        embedding = self.backbone(x)
        if return_embedding:
            return embedding
        return self.projection(embedding)

class ContrastiveLoss(nn.Module):
    """
    Vanilla Contrastive loss, also called InfoNceLoss as in SimCLR paper
    """

    def __init__(self, batch_size, temperature=0.5):
        super().__init__()
        self.batch_size = batch_size
        self.temperature = temperature
        self.mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=bool)).float()

    def calc_similarity_batch(self, a, b):
        representations = torch.cat([a, b], dim=0)
        similarity_matrix = F.cosine_similarity(representations.unsqueeze(1), representations.unsqueeze(0), dim=2)
        return similarity_matrix

    def forward(self, proj_1, proj_2):
        """
        proj_1 and proj_2 are batched embeddings [batch, embedding_dim]
        where corresponding indices are pairs
        z_i, z_j in the SimCLR paper
        """
        batch_size = proj_1.shape[0]
        z_i = F.normalize(proj_1, p=2, dim=1)
        z_j = F.normalize(proj_2, p=2, dim=1)
        similarity_matrix = self.calc_similarity_batch(z_i, z_j)

        sim_ij = torch.diag(similarity_matrix, batch_size)
        sim_ji = torch.diag(similarity_matrix, -batch_size)

        positives = torch.cat([sim_ij, sim_ji], dim=0)

        nominator = torch.exp(positives / self.temperature)
        # print(" sim matrix ", similarity_matrix.shape)
        # print(" device ", device_as(self.mask, similarity_matrix).shape, " torch exp ", torch.exp(similarity_matrix / self.temperature).shape)
        denominator = device_as(self.mask, similarity_matrix) * torch.exp(similarity_matrix / self.temperature)

        all_losses = -torch.log(nominator / torch.sum(denominator, dim=1))
        loss = torch.sum(all_losses) / (2 * self.batch_size)
        return loss

class ImageDataResourceDataset(VisionDataset):
    train_list = ['train_X_v1.bin', ]
    test_list = ['test_X_v1.bin', ]

    def __init__(self, root: str, transform: Optional[Callable] = None, ):
        super().__init__(root=root, transform=transform)
        self.data = self.__loadfile(self.train_list[0])

    def __len__(self) -> int:
        return self.data.shape[0]

    def __getitem__(self, idx):
        img = self.data[idx]
        img = np.transpose(img, (1, 2, 0))
        img = Image.fromarray(img)
        img = self.transform(img)
        return img

    def __loadfile(self, data_file: str) -> np.ndarray:
        path_to_data = os.path.join(os.getcwd(), 'datasets', data_file)
        everything = np.fromfile(path_to_data, dtype=np.uint8)
        images = np.reshape(everything, (-1, 3, 224, 224))
        images = np.transpose(images, (0, 1, 3, 2))
        return images


For the record, my dataset consists of 7000 RGB images of size 224 x 224.
Why isn't my last, "fractional" batch handled? Thanks a lot for your help.

xkrw2x1b1#

I found the original source code in this GitHub repository: https://github.com/The-AI-Summer/simclr/blob/main/AI_Summer_SimCLR_Resnet18_STL10.ipynb
Based on the error message you provided:

File "/home/wlutz/PycharmProjects/hiv-image-analysis/main.py", line 243, in forward
    denominator = device_as(self.mask, similarity_matrix) * torch.exp(similarity_matrix / self.temperature)

the problem appears to stem from this line:

denominator = device_as(self.mask, similarity_matrix) * torch.exp(similarity_matrix / self.temperature)


which lives in class ContrastiveLoss(nn.Module).
To investigate the problem in this class, let's first examine its initialization variables:

def __init__(self, batch_size, temperature=0.5):
    super().__init__()
    self.batch_size = batch_size
    self.temperature = temperature
    self.mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=bool)).float()


The error appears to be related to batch_size, so we need to find where the code instantiates ContrastiveLoss().
It is used in class SimCLR_pl(pl.LightningModule) here:

self.loss = ContrastiveLoss(config.batch_size, temperature=self.config.temperature)


The problem lies in config.batch_size, because ContrastiveLoss() builds its mask once from config.batch_size.
You are using 7000 data points with a batch size of 32, so the final iteration receives only 7000 % 32 = 24 samples.
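
A quick sketch of that arithmetic (plain Python, using only the numbers from the question):

import math

dataset_size, batch_size = 7000, 32

batches_per_epoch = math.ceil(dataset_size / batch_size)  # 219, matching "218/219" in your log
last_batch_size = dataset_size % batch_size               # 24 leftover samples
print(batches_per_epoch, last_batch_size)                 # 219 24
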
Since the __init__ function builds the mask as:

self.mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=bool)).float()


the loss expects tensors of size config.batch_size * 2 = 64, whereas the last batch produces 24 * 2 = 48. This discrepancy is exactly what the error message reports:

RuntimeError: The size of tensor a (64) must match the size of tensor b (48) at non-singleton dimension 1
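
In isolation the mismatch is easy to reproduce (a minimal sketch with random stand-in tensors rather than the real embeddings):

import torch

full_batch, last_batch = 32, 24

# The mask is built once in __init__, sized for a full batch: 64 x 64
mask = (~torch.eye(full_batch * 2, full_batch * 2, dtype=bool)).float()

# The similarity matrix for the final batch is only 48 x 48
similarity_matrix = torch.randn(last_batch * 2, last_batch * 2)

mask * similarity_matrix  # RuntimeError: tensor a (64) vs tensor b (48)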


To fix this, make sure the mask's size always matches the batch that actually reaches ContrastiveLoss(): either drop the incomplete final batch in the DataLoader, or derive the mask from the runtime batch size instead of config.batch_size.
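
Both fixes are sketched below, with the assumptions flagged: drop_last is a standard torch.utils.data.DataLoader argument, but whether your get_stl_dataloader helper forwards it is something to verify in your own code; the second option is the forward from your question, rewritten so the mask is derived from the runtime batch size:

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

# Option 1: discard the incomplete final batch when building the loader,
# so every batch that reaches the loss has exactly batch_size samples.
train_loader = DataLoader(dataset, batch_size=train_config.batch_size,
                          shuffle=True, drop_last=True)

# Option 2: inside ContrastiveLoss, size the mask from the batch that
# actually arrives instead of the configured batch size.
def forward(self, proj_1, proj_2):
    batch_size = proj_1.shape[0]  # 24 on the last batch, 32 otherwise
    mask = (~torch.eye(batch_size * 2, batch_size * 2,
                       dtype=bool, device=proj_1.device)).float()
    z_i = F.normalize(proj_1, p=2, dim=1)
    z_j = F.normalize(proj_2, p=2, dim=1)
    similarity_matrix = self.calc_similarity_batch(z_i, z_j)
    sim_ij = torch.diag(similarity_matrix, batch_size)
    sim_ji = torch.diag(similarity_matrix, -batch_size)
    positives = torch.cat([sim_ij, sim_ji], dim=0)
    nominator = torch.exp(positives / self.temperature)
    # The mask is created on the right device, so device_as is no longer needed
    denominator = mask * torch.exp(similarity_matrix / self.temperature)
    all_losses = -torch.log(nominator / torch.sum(denominator, dim=1))
    return torch.sum(all_losses) / (2 * batch_size)

With Option 2, the self.mask buffer and the batch_size argument of __init__ become unnecessary.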
