Why doesn't my football prediction NN in PyTorch work?

Posted by bzzcjhmw on 2023-10-20 in Other


I am trying to build a model that predicts football scores.
I used this dataset: https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017
Then I gave each team name a unique integer.
Here is the code for the NN:

import csv
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd

data = []
with open('/Volumes/Drive 2/Football nn/output.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)

# team names
home_team = [str(row[1]) for row in data]
away_team = [str(row[2]) for row in data]
#team scores
old_home_score = [str(row[3]) for row in data]
old_away_score = [str(row[4]) for row in data]

home_score = []
away_score = []
print(home_team[44762])
print(away_team[44762])
home_team.pop(0)
away_team.pop(0)

old_home_score.pop(0)
old_away_score.pop(0)

for item in (old_home_score):
    iteam = int(item)
    iteam /=10
    home_score.append(iteam)

for item in (old_away_score):
    iteam = int(item)
    iteam /=10
    away_score.append(iteam)

print(away_score[44761])
print(home_score[44761])
home_team = [eval(i) for i in home_team]
away_team = [eval(i) for i in away_team]
class SoccerNN(nn.Module):
    def __init__(self):
        super(SoccerNN, self).__init__()
        self.fc1 = nn.Linear(2, 15)  
        self.fc2 = nn.Linear(15, 20)
        self.fc3 = nn.Linear(20, 10)
        self.fc4 = nn.Linear(10, 2) 

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = torch.sigmoid(x)
        return x

# Convert the columns to PyTorch tensors
input_data = torch.tensor(np.column_stack((home_team, away_team)), dtype=torch.int16)
output_data = torch.tensor(np.column_stack((home_score, away_score)), dtype=torch.int16)

model = SoccerNN()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
epochs = 50  # Adjust the number of epochs as needed
batch_size = 10  # Adjust the batch size as needed
for epoch in range(epochs):
    for batch_start in range(0, len(input_data), batch_size):
        batch_end = batch_start + batch_size
        batch_input = input_data[batch_start:batch_end]
        batch_output = output_data[batch_start:batch_end]
        # Convert batch_input and batch_output to the same dtype as the model's parameters
        batch_input = batch_input.to(model.fc1.weight.dtype)
        batch_output = batch_output.to(model.fc2.weight.dtype)

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass
        predictions = model(batch_input)

        # Calculate loss
        loss = criterion(predictions, batch_output)

        # Backpropagation
        loss.backward()

        # Update weights
        optimizer.step()

    
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

# After training, you can use the model for predictions
home_team = [16]
away_team = [37]
new_data = torch.tensor(np.column_stack((home_team, away_team)), dtype=torch.int16)
new_data = torch.tensor(new_data, dtype=torch.float32)  # Use the appropriate data type

# Put the model in evaluation mode
model.eval()

# Make predictions
with torch.no_grad():
    predictions = model(new_data)

# Convert predictions to numpy array
predictions_np = predictions.numpy()

# Print or use the predictions as needed
print(predictions_np)

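I give the network two ints as input (each int represents a team), and the network should output the predicted score of the game.
When I train the model, it reports a loss of 0, but the predictions are completely wrong.
Why isn't it working (my guess is that it gets stuck in a local minimum that it can't escape, but I may be completely wrong, as I am very new to all of this), and how should I change the model to make it work?
I have tried changing the batch size, the number of layers, and the number of epochs.
It still doesn't work...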

pytorch

Source: https://stackoverflow.com/questions/77002369/why-doesnt-my-football-prediction-nn-in-pytorch-work

1 answer


tpgth1q71#

First, when I tried to run your code, I got an error on the line home_team = [eval(i) for i in home_team], where I suspect you wanted to map the teams to integers:

NameError: name 'Scotland' is not defined

I will first explain how to do this mapping correctly, even though it is not the right way to feed classes (teams) into an NN, as I explain later.

Mapping teams to integers

You can use the following code:

string_to_int = {}
next_int = 0  # Initialize the integer counter
# Iterate through the list of strings
for string in home_team + away_team:
    # Check if the string is already in the dictionary
    if string not in string_to_int:
        string_to_int[string] = next_int
        next_int += 1  # Increment the integer counter

home_team = [string_to_int[s] for s in home_team]
away_team = [string_to_int[s] for s in away_team]

With this, I was able to run your code and got a loss of about 0.014.
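
One thing that may also contribute to the near-zero loss, and which the answer does not discuss, is that the question builds output_data with dtype=torch.int16 after dividing the scores by 10. The sketch below uses made-up scores to show how that cast truncates the scaled targets, so that almost every target becomes 0. Whether this is the main cause in the original run is an assumption.

```python
import numpy as np
import torch

# Made-up scores, already divided by 10 as in the question (3 goals -> 0.3).
home_score = [0.0, 0.3, 0.1, 1.2]
away_score = [0.1, 0.2, 0.0, 0.4]
stacked = np.column_stack((home_score, away_score))

# Same construction as the question: the int16 dtype drops the fractional part.
as_int = torch.tensor(stacked, dtype=torch.int16)

# Keeping the targets as floats preserves the scaled scores.
as_float = torch.tensor(stacked, dtype=torch.float32)

print(as_int)
print(as_float)
```

If the targets are nearly all zeros, a network that always outputs values close to 0 reaches a very small MSE without learning anything about the teams.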

How to improve model performance

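Mapping classes to integers only makes sense when there is some relationship between the classes, for example an ordering. Here there is none, but by feeding the teams to the NN as ints you explicitly tell it that, for example, Uruguay (5) is close to Austria (6) and far from Surrey (314). The NN then first has to learn that this relationship is meaningless, which makes its job harder.
A better option is to use One Hot Encoding or a Learnt Encoding (see tutorial).
You will then have a wider network, because the input will have as many dimensions as there are unique classes (here ~300).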
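As a minimal sketch of the two encodings, assuming about 300 teams and integer ids like those produced by string_to_int above (the specific ids and the embedding size of 8 are illustrative choices, not values from the answer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_teams = 300  # assumed; in practice len(string_to_int)
home_ids = torch.tensor([5, 6, 299])  # hypothetical team ids (int64 by default)

# One-hot encoding: each team becomes a 300-dimensional vector with a single 1,
# so no team is "closer" to another just because of its id.
home_one_hot = F.one_hot(home_ids, num_classes=num_teams).float()
print(home_one_hot.shape)

# Learned encoding: a trainable lookup table that maps each id to a dense vector.
embedding = nn.Embedding(num_embeddings=num_teams, embedding_dim=8)
home_vec = embedding(home_ids)
print(home_vec.shape)
```

With one-hot inputs for both teams, the first linear layer would take 2 * num_teams features; with embeddings it would take 2 * embedding_dim.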
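Add non-linear functions (e.g. nn.ReLU() or nn.Sigmoid()) between the layers. At the moment, the whole network is equivalent to a single linear fc layer followed by a sigmoid.
This is because a composition of linear functions is still linear (see e.g. this math se question).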
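A small check of that claim, as a sketch with arbitrary layer sizes: two stacked nn.Linear layers with no activation in between give the same output as one layer whose weight is W2 @ W1 and whose bias is W2 @ b1 + b2.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(2, 15)
fc2 = nn.Linear(15, 2)

# Fold both layers into a single linear layer.
merged = nn.Linear(2, 2)
with torch.no_grad():
    merged.weight.copy_(fc2.weight @ fc1.weight)
    merged.bias.copy_(fc2.weight @ fc1.bias + fc2.bias)

x = torch.randn(4, 2)
with torch.no_grad():
    stacked_out = fc2(fc1(x))  # two layers, no activation in between
    merged_out = merged(x)     # one equivalent layer

# Equal up to floating-point rounding.
print(torch.allclose(stacked_out, merged_out, atol=1e-5))
```

Inserting torch.relu between fc1 and fc2 breaks this equivalence, which is what lets the extra layers add capacity.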
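You divide the scores by 10 to make sure they fit the range of the network output, which is (0, 1) for a sigmoid. It would be more sensible to change the final non-linearity (currently a sigmoid) to one whose range matches the range of the target data, i.e. [0, infinity), for example the ReLU function.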
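Putting these suggestions together, one possible shape for the model is sketched below. It is not the answerer's code: the shared embedding for home and away teams, the layer sizes, the team ids, and the scores are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SoccerNN(nn.Module):
    def __init__(self, num_teams, embedding_dim=8):
        super().__init__()
        # Learned encoding shared by home and away teams (an assumption).
        self.embedding = nn.Embedding(num_teams, embedding_dim)
        self.fc1 = nn.Linear(2 * embedding_dim, 20)
        self.fc2 = nn.Linear(20, 2)

    def forward(self, home_ids, away_ids):
        x = torch.cat([self.embedding(home_ids), self.embedding(away_ids)], dim=1)
        x = torch.relu(self.fc1(x))  # non-linearity between the layers
        # Final ReLU keeps predictions in [0, inf), the range of goal counts.
        return torch.relu(self.fc2(x))

torch.manual_seed(0)
num_teams = 300  # assumed; in practice len(string_to_int)
model = SoccerNN(num_teams)

# Hypothetical mini-batch: team ids and unscaled float scores.
home_ids = torch.tensor([16, 5, 6])
away_ids = torch.tensor([37, 6, 5])
targets = torch.tensor([[2.0, 1.0], [0.0, 0.0], [3.0, 1.0]])

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One training step.
optimizer.zero_grad()
loss = criterion(model(home_ids, away_ids), targets)
loss.backward()
optimizer.step()

model.eval()
with torch.no_grad():
    predictions = model(home_ids, away_ids)
print(predictions.shape)
```

Because the output layer is no longer bounded by a sigmoid, the scores do not need to be divided by 10. A final ReLU can leave an output stuck at zero with no gradient; nn.Softplus is a smooth alternative with a positive range if that happens.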

Posted 2023-10-20
