I am implementing a SimCLR/ResNet-18 architecture on a custom dataset.
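I know that
number of iterations in an epoch = total training dataset size / batch size
If the result is not an integer, the last batch is sized to hold the leftover samples (a "fractional batch"). In my case, however, this last mechanism does not seem to work. My dataset has 7000 samples. If I use a batch size of 70, I get 7000/70 = 100 iterations, there is no fractional batch, and training runs. If I use, for example, a batch size of 32, I get the following error (full stack trace):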
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/bin/python /home/wlutz/PycharmProjects/hiv-image-analysis/main.py
2023-10-20 11:12:22.106008: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-20 11:12:22.107921: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-10-20 11:12:22.133919: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-20 11:12:22.133941: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-20 11:12:22.133955: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-20 11:12:22.138715: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-20 11:12:22.737271: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/__init__.py:11: FutureWarning: In the future `np.object` will be defined as the corresponding NumPy scalar.
if not hasattr(numpy, tp_name):
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/__init__.py:11: FutureWarning: In the future `np.bool` will be defined as the corresponding NumPy scalar.
if not hasattr(numpy, tp_name):
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/models/self_supervised/amdim/amdim_module.py:34: UnderReviewWarning: The feature generate_power_seq is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
"lr_options": generate_power_seq(LEARNING_RATE_CIFAR, 11),
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/models/self_supervised/amdim/amdim_module.py:92: UnderReviewWarning: The feature FeatureMapContrastiveTask is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
contrastive_task: Union[FeatureMapContrastiveTask] = FeatureMapContrastiveTask("01, 02, 11"),
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pl_bolts/losses/self_supervised_learning.py:228: UnderReviewWarning: The feature AmdimNCELoss is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
self.nce_loss = AmdimNCELoss(tclip)
available_gpus: 0
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
warnings.warn(msg)
Dim MLP input: 512
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:478: LightningDeprecationWarning: Setting `Trainer(gpus=0)` is deprecated in v1.7 and will be removed in v2.0. Please use `Trainer(accelerator='gpu', devices=0)` instead.
rank_zero_deprecation(
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:613: UserWarning: Checkpoint directory /home/wlutz/PycharmProjects/hiv-image-analysis/saved_models exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
/home/wlutz/PycharmProjects/hiv-image-analysis/main.py:330: UnderReviewWarning: The feature LinearWarmupCosineAnnealingLR is currently marked under review. The compatibility with other Lightning projects is not guaranteed and API may change at any time. The API and functionality may change without warning in future releases. More details: https://lightning-bolts.readthedocs.io/en/latest/stability.html
scheduler_warmup = LinearWarmupCosineAnnealingLR(optimizer, warmup_epochs=10, max_epochs=max_epochs,
| Name | Type | Params
------------------------------------------
0 | model | AddProjection | 11.5 M
1 | loss | ContrastiveLoss | 0
------------------------------------------
11.5 M Trainable params
0 Non-trainable params
11.5 M Total params
46.024 Total estimated model params size (MB)
Optimizer Adam, Learning Rate 0.0003, Effective batch size 160
Epoch 0: 100%|█████████▉| 218/219 [04:03<00:01, 1.12s/it, loss=3.74, v_num=58, Contrastive loss_step=3.650]Traceback (most recent call last):
File "/home/wlutz/PycharmProjects/hiv-image-analysis/main.py", line 388, in <module>
trainer.fit(model, data_loader)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
call._call_and_handle_interrupt(
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
results = self._run_stage()
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
self._run_train()
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_train
self.fit_loop.run()
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 213, in advance
batch_output = self.batch_loop.run(kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(optimizers, kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 202, in advance
result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 249, in _run_optimization
self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 370, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1356, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/core/module.py", line 1754, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 169, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 234, in optimizer_step
return self.precision_plugin.optimizer_step(
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 119, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
ret = func(self, *args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/optim/adam.py", line 143, in step
loss = closure()
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 105, in _wrap_closure
closure_result = closure()
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 149, in __call__
self._result = self.closure(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 135, in closure
step_output = self._step_fn()
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 419, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1494, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 378, in training_step
return self.model.training_step(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/main.py", line 316, in training_step
loss = self.loss(z1, z2)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/wlutz/PycharmProjects/hiv-image-analysis/main.py", line 243, in forward
denominator = device_as(self.mask, similarity_matrix) * torch.exp(similarity_matrix / self.temperature)
RuntimeError: The size of tensor a (64) must match the size of tensor b (48) at non-singleton dimension 1
Process finished with exit code 1
Here is some of the code (the error occurs on the last line):
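```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import GradientAccumulationScheduler, ModelCheckpoint
from torchvision.models import resnet18

# Hparams and SimCLR_pl are shown below; reproducibility, Augment, get_stl_dataloader,
# filename, save_model_path and available_gpus are defined or imported elsewhere in main.py.
train_config = Hparams()
reproducibility(train_config)

model = SimCLR_pl(train_config, model=resnet18(pretrained=False), feat_dim=512)

transform = Augment(train_config.img_size)
data_loader = get_stl_dataloader(train_config.batch_size, transform)

accumulator = GradientAccumulationScheduler(scheduling={0: train_config.gradient_accumulation_steps})
checkpoint_callback = ModelCheckpoint(filename=filename, dirpath=save_model_path, every_n_epochs=2,
                                      save_last=True, save_top_k=2, monitor='Contrastive loss_epoch', mode='min')

trainer = Trainer(callbacks=[accumulator, checkpoint_callback],
                  gpus=available_gpus,
                  max_epochs=train_config.epochs)

trainer.fit(model, data_loader)
```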
Here are my classes:
```python
import os
from typing import Callable, Optional

import numpy as np
import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F
from PIL import Image
from pl_bolts.optimizers.lr_scheduler import LinearWarmupCosineAnnealingLR
from torch.optim import Adam
from torchvision import models
from torchvision.datasets import VisionDataset

# default, device_as and define_param_groups are helper functions defined elsewhere in main.py (not shown).


class Hparams:
    def __init__(self):
        self.epochs = 10  # number of training epochs
        self.seed = 33333  # randomness seed
        self.cuda = True  # use nvidia gpu
        self.img_size = 224  # image shape
        self.save = "./saved_models/"  # save checkpoint
        self.load = False  # load pretrained checkpoint
        self.gradient_accumulation_steps = 5  # gradient accumulation steps
        self.batch_size = 70
        self.lr = 3e-4  # for Adam only
        self.weight_decay = 1e-6
        self.embedding_size = 128  # papers value is 128
        self.temperature = 0.5  # 0.1 or 0.5
        self.checkpoint_path = '/media/wlutz/TOSHIBA EXT/Image Analysis/VIH PROJECT/models'  # replace checkpoint path here


class SimCLR_pl(pl.LightningModule):
    def __init__(self, config, model=None, feat_dim=512):
        super().__init__()
        self.config = config
        self.model = AddProjection(config, model=model, mlp_dim=feat_dim)
        self.loss = ContrastiveLoss(config.batch_size, temperature=self.config.temperature)

    def forward(self, X):
        return self.model(X)

    def training_step(self, batch, batch_idx):
        (x1, x2) = batch
        z1 = self.model(x1)
        z2 = self.model(x2)
        loss = self.loss(z1, z2)
        self.log('Contrastive loss', loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss

    def configure_optimizers(self):
        max_epochs = int(self.config.epochs)
        param_groups = define_param_groups(self.model, self.config.weight_decay, 'adam')
        lr = self.config.lr
        optimizer = Adam(param_groups, lr=lr, weight_decay=self.config.weight_decay)

        print(f'Optimizer Adam, '
              f'Learning Rate {lr}, '
              f'Effective batch size {self.config.batch_size * self.config.gradient_accumulation_steps}')

        scheduler_warmup = LinearWarmupCosineAnnealingLR(optimizer, warmup_epochs=10, max_epochs=max_epochs,
                                                         warmup_start_lr=0.0)

        return [optimizer], [scheduler_warmup]


class AddProjection(nn.Module):
    def __init__(self, config, model=None, mlp_dim=512):
        super(AddProjection, self).__init__()
        embedding_size = config.embedding_size
        self.backbone = default(model, models.resnet18(pretrained=False, num_classes=config.embedding_size))
        mlp_dim = default(mlp_dim, self.backbone.fc.in_features)
        print('Dim MLP input:', mlp_dim)
        self.backbone.fc = nn.Identity()

        # add mlp projection head
        self.projection = nn.Sequential(
            nn.Linear(in_features=mlp_dim, out_features=mlp_dim),
            nn.BatchNorm1d(mlp_dim),
            nn.ReLU(),
            nn.Linear(in_features=mlp_dim, out_features=embedding_size),
            nn.BatchNorm1d(embedding_size),
        )

    def forward(self, x, return_embedding=False):
        embedding = self.backbone(x)
        if return_embedding:
            return embedding
        return self.projection(embedding)


class ContrastiveLoss(nn.Module):
    """
    Vanilla Contrastive loss, also called InfoNceLoss as in SimCLR paper
    """

    def __init__(self, batch_size, temperature=0.5):
        super().__init__()
        self.batch_size = batch_size
        self.temperature = temperature
        self.mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=bool)).float()

    def calc_similarity_batch(self, a, b):
        representations = torch.cat([a, b], dim=0)
        similarity_matrix = F.cosine_similarity(representations.unsqueeze(1), representations.unsqueeze(0), dim=2)
        return similarity_matrix

    def forward(self, proj_1, proj_2):
        """
        proj_1 and proj_2 are batched embeddings [batch, embedding_dim]
        where corresponding indices are pairs
        z_i, z_j in the SimCLR paper
        """
        batch_size = proj_1.shape[0]
        z_i = F.normalize(proj_1, p=2, dim=1)
        z_j = F.normalize(proj_2, p=2, dim=1)

        similarity_matrix = self.calc_similarity_batch(z_i, z_j)

        sim_ij = torch.diag(similarity_matrix, batch_size)
        sim_ji = torch.diag(similarity_matrix, -batch_size)

        positives = torch.cat([sim_ij, sim_ji], dim=0)

        nominator = torch.exp(positives / self.temperature)
        # print(" sim matrix ", similarity_matrix.shape)
        # print(" device ", device_as(self.mask, similarity_matrix).shape, " torch exp ", torch.exp(similarity_matrix / self.temperature).shape)
        denominator = device_as(self.mask, similarity_matrix) * torch.exp(similarity_matrix / self.temperature)

        all_losses = -torch.log(nominator / torch.sum(denominator, dim=1))
        loss = torch.sum(all_losses) / (2 * self.batch_size)
        return loss


class ImageDataResourceDataset(VisionDataset):
    train_list = ['train_X_v1.bin', ]
    test_list = ['test_X_v1.bin', ]

    def __init__(self, root: str, transform: Optional[Callable] = None, ):
        super().__init__(root=root, transform=transform)
        self.data = self.__loadfile(self.train_list[0])

    def __len__(self) -> int:
        return self.data.shape[0]

    def __getitem__(self, idx):
        img = self.data[idx]
        img = np.transpose(img, (1, 2, 0))
        img = Image.fromarray(img)
        img = self.transform(img)
        return img

    def __loadfile(self, data_file: str) -> np.ndarray:
        path_to_data = os.path.join(os.getcwd(), 'datasets', data_file)
        everything = np.fromfile(path_to_data, dtype=np.uint8)
        images = np.reshape(everything, (-1, 3, 224, 224))
        images = np.transpose(images, (0, 1, 3, 2))
        return images
```
For the record, my dataset has 7000 RGB images of size 224 × 224.
Why is my last "fractional" batch not supported? Thanks a lot for your help.
Answer
I found the original source code in a GitHub repository: https://github.com/The-AI-Summer/simclr/blob/main/AI_Summer_SimCLR_Resnet18_STL10.ipynb
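Based on the error message you provided,

`RuntimeError: The size of tensor a (64) must match the size of tensor b (48) at non-singleton dimension 1`

the problem seems to stem from this line of code in `class ContrastiveLoss(nn.Module)`:

`denominator = device_as(self.mask, similarity_matrix) * torch.exp(similarity_matrix / self.temperature)`

To investigate the problem with this class, let's first check its initialization variables, `batch_size` and `temperature`, which `__init__` stores as attributes.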
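The error seems to be related to `batch_size`, so we need to determine which part of the code calls `ContrastiveLoss()`. It is instantiated in `class SimCLR_pl(pl.LightningModule)`:

`self.loss = ContrastiveLoss(config.batch_size, temperature=self.config.temperature)`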
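The problem is `config.batch_size`: the `ContrastiveLoss()` instance is built from this fixed value. You are using 7000 data points with a batch size of 32, so the final iteration has a batch size of 7000 % 32 = 24.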
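The batch count can be checked with a plain PyTorch `DataLoader`. This is a minimal sketch that uses `range(7000)` as a hypothetical stand-in for the 7000-image dataset (a `range` supports `len()` and indexing, which is all a map-style `DataLoader` needs). It assumes that `get_stl_dataloader` ultimately wraps a standard `DataLoader`, which the question does not show, and it also illustrates how `drop_last=True` discards the incomplete final batch.

```python
import math

from torch.utils.data import DataLoader

num_samples, batch_size = 7000, 32

# range(7000) is a stand-in for the real dataset: only len() and indexing matter here.
loader = DataLoader(range(num_samples), batch_size=batch_size)
batch_sizes = [len(batch) for batch in loader]
print(len(loader), batch_sizes[-1])                                   # 219 24
print(math.ceil(num_samples / batch_size), num_samples % batch_size)  # 219 24

# With drop_last=True the incomplete final batch is skipped entirely.
loader_drop = DataLoader(range(num_samples), batch_size=batch_size, drop_last=True)
print(len(loader_drop), {len(batch) for batch in loader_drop})        # 218 {32}
```

Dropping the last batch means 24 of the 7000 samples are not seen in a given epoch; with shuffling enabled, a different subset is skipped each epoch.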
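Because `__init__` builds the mask only once, with

`self.mask = (~torch.eye(batch_size * 2, batch_size * 2, dtype=bool)).float()`

the loss always expects a `similarity_matrix` of size `config.batch_size * 2` = 64 on each side. In the final batch, however, the size is 24 * 2 = 48, and a 64 × 64 tensor cannot be multiplied element-wise with a 48 × 48 one. This mismatch is exactly what the error message reports (tensor a (64) vs tensor b (48)).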
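The shape mismatch can be reproduced in isolation. This sketch builds the mask the same way `ContrastiveLoss.__init__` does for a configured batch size of 32, then multiplies it with a random 48 × 48 tensor standing in for the similarity matrix of the final 24-sample batch (the random values are placeholders; only the shapes matter).

```python
import torch

configured_batch_size = 32  # value baked into ContrastiveLoss.__init__
last_batch_size = 24        # 7000 % 32
temperature = 0.5

# Built once in __init__ from the configured batch size: shape (64, 64).
mask = (~torch.eye(configured_batch_size * 2, configured_batch_size * 2, dtype=bool)).float()

# Stand-in for the final batch's similarity matrix: shape (48, 48).
similarity_matrix = torch.rand(last_batch_size * 2, last_batch_size * 2)

message = None
try:
    denominator = mask * torch.exp(similarity_matrix / temperature)
except RuntimeError as err:
    # Broadcasting fails because neither trailing dimension is 1 and 64 != 48.
    message = str(err)

print(mask.shape, similarity_matrix.shape)
print(message)
```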
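To fix this, the mask must match the actual size of each batch rather than the configured `config.batch_size`. The same applies to the `2 * self.batch_size` normalization at the end of `forward`, which would otherwise divide the last batch's loss by 64 instead of 48. You can either build the mask inside `forward` from the incoming batch size (`proj_1.shape[0]`), or drop the incomplete final batch, for example with `drop_last=True` when constructing a PyTorch `DataLoader`.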
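A minimal sketch of the first option, assuming you are free to modify `ContrastiveLoss`: it keeps the same InfoNCE computation as the class in the question, but derives the mask and the normalization from the batch it actually receives. The `batch_size` constructor argument is kept only so that the existing call `ContrastiveLoss(config.batch_size, temperature=...)` keeps working; it is ignored. Creating the mask with `device=` also makes the `device_as` helper unnecessary in this version.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveLoss(nn.Module):
    """InfoNCE (NT-Xent) loss whose mask adapts to the size of each incoming batch."""

    def __init__(self, batch_size=None, temperature=0.5):
        super().__init__()
        # batch_size is accepted for call compatibility but no longer used.
        self.temperature = temperature

    def forward(self, proj_1, proj_2):
        # Actual batch size of this call (e.g. 32 normally, 24 for the last batch).
        batch_size = proj_1.shape[0]
        z_i = F.normalize(proj_1, p=2, dim=1)
        z_j = F.normalize(proj_2, p=2, dim=1)

        representations = torch.cat([z_i, z_j], dim=0)
        similarity_matrix = F.cosine_similarity(
            representations.unsqueeze(1), representations.unsqueeze(0), dim=2
        )

        # Positive pairs sit on the +batch_size and -batch_size off-diagonals.
        sim_ij = torch.diag(similarity_matrix, batch_size)
        sim_ji = torch.diag(similarity_matrix, -batch_size)
        positives = torch.cat([sim_ij, sim_ji], dim=0)
        nominator = torch.exp(positives / self.temperature)

        # Mask out self-similarities; built per call, on the same device as the data.
        mask = (~torch.eye(2 * batch_size, dtype=torch.bool,
                           device=similarity_matrix.device)).float()
        denominator = mask * torch.exp(similarity_matrix / self.temperature)

        all_losses = -torch.log(nominator / torch.sum(denominator, dim=1))
        # Normalize by the real number of samples in this batch.
        return torch.sum(all_losses) / (2 * batch_size)


if __name__ == "__main__":
    loss_fn = ContrastiveLoss(32, temperature=0.5)
    for n in (32, 24):  # a full batch and the smaller final batch
        z1, z2 = torch.randn(n, 128), torch.randn(n, 128)
        print(n, loss_fn(z1, z2).item())
```

With this version the loss accepts any batch size, so no `DataLoader` change is required; the trade-off is that the mask is rebuilt on every call, which is cheap compared with the ResNet forward pass.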