I am following this tutorial.
However, I decided to change the linear layers so that they also turn their output into a 1 * 19 * 19 image. When I do this, I just get a bunch of pixels in random places.
Below is my modified code. To describe what I did: I basically slice the labels out of outputs 0-10, and the picture out of outputs from 10 to the length of the array. That way I keep the labels and the scrambled picture separate.
import torch
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
batch_size = 4
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 8 * 7 * 7)
        self.fc2 = nn.Linear(8 * 7 * 7, 6 * 8 * 8)
        self.fc3 = nn.Linear(6 * 8 * 8, 19 * 19 + 10)  # 10 class scores + a 19 * 19 "image"

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        # IMPORTANT: this is where I save the feature map of the second conv layer,
        # to compare against the 19 * 19 image that the linear layers produce.
        self.picx = self.pool(F.relu(self.conv2(x)))
        x2 = torch.flatten(self.picx, 1)  # flatten all dimensions except batch
        x2 = F.relu(self.fc1(x2))
        x2 = F.relu(self.fc2(x2))
        x2 = self.fc3(x2)
        return x2
net = Net().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
for epoch in range(5):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)  # note: this treats all 371 outputs as class scores
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f'epoch {epoch + 1}: loss {running_loss / len(trainloader):.3f}')
with torch.no_grad():
    dataiter = iter(testloader)
    images, labels = next(dataiter)  # dataiter.next() no longer exists in recent PyTorch
    outputs = net(images.to(device))  # compare the saved conv feature map against the 19 * 19 "image"
    _, predicted = torch.max(outputs[..., 0:10], 1)  # first 10 outputs are the class scores
    print(predicted, labels)
    preds = torch.reshape(outputs[..., 10:], (-1, 19, 19))  # remaining 19 * 19 = 361 outputs
    plt.imshow(preds[0].cpu().numpy())
    plt.show()
correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs[..., 0:10], 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')
What does this image mean? Is there some way to get a mask of which pixels the label was detected at? In other words, I would like an image that draws pixels wherever the network is seeing a subject such as a dog or a cat.
1 Answer
I don't mean to bring you down, but that image doesn't mean anything. When you flatten the output of the Conv2d layers and pass it through two Linear layers, you lose any spatial meaning attached to the neurons. A "linear" (or "dense") layer connects every node of the previous layer to every node of the next one, which effectively discards any relationship between a neuron/node and a location in the original input image. If you want to see which parts of the image your network pays attention to when making its decision, you want to look inside the convolutional layers. This is a well-studied problem with many valid approaches; one popular method is Grad-CAM. If you want something simpler, you can try plotting each channel of one of the convolutional layers' outputs separately, but even that is hard to interpret.
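As a concrete illustration of the Grad-CAM idea, here is a minimal, hand-rolled sketch for the Net above. The grad_cam helper, its hook bookkeeping, and the [..., :10] slice are my own assumptions layered on your output layout (first 10 outputs = class scores); they are not part of the tutorial or of any Grad-CAM library. It hooks conv2, weights each activation channel by the mean gradient of the chosen class score, applies a ReLU, and upsamples the result to the 32 * 32 input size:

import torch
import torch.nn.functional as F

def grad_cam(net, image, class_idx=None):
    """Rough heatmap of where `net` looks when scoring class `class_idx` (hypothetical helper)."""
    store = {}

    def fwd_hook(module, inp, out):
        store['acts'] = out              # conv2 activations, shape (1, 16, 10, 10)

    def bwd_hook(module, grad_in, grad_out):
        store['grads'] = grad_out[0]     # gradient of the class score w.r.t. those activations

    h1 = net.conv2.register_forward_hook(fwd_hook)
    h2 = net.conv2.register_full_backward_hook(bwd_hook)
    try:
        net.zero_grad()
        image = image.to(next(net.parameters()).device)
        logits = net(image.unsqueeze(0))[..., :10]   # first 10 outputs are the class scores
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        logits[0, class_idx].backward()
        weights = store['grads'].mean(dim=(2, 3), keepdim=True)   # per-channel importance
        cam = F.relu((weights * store['acts']).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=(32, 32), mode='bilinear', align_corners=False)
        return (cam / (cam.max() + 1e-8))[0, 0].detach()
    finally:
        h1.remove()
        h2.remove()

Overlaying the heatmap on the un-normalized input then gives a rough mask of where the network is looking (run this outside any torch.no_grad() block, since it needs a backward pass):

cam = grad_cam(net, images[0])
plt.imshow((images[0].cpu() * 0.5 + 0.5).permute(1, 2, 0).numpy())
plt.imshow(cam.cpu().numpy(), alpha=0.5)
plt.show()

Keep in mind this only highlights regions that raise the chosen class score, and with such a small network trained for 5 epochs the map will be coarse: conv2 is only 10 * 10 before upsampling.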