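I am trying to build a model that predicts football scores.
I used this dataset: https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017
Then I assigned each team name a unique integer.
Here is the code for the NN: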
```python
import csv
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd

data = []
with open('/Volumes/Drive 2/Football nn/output.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)

# team names
home_team = [str(row[1]) for row in data]
away_team = [str(row[2]) for row in data]
# team scores
old_home_score = [str(row[3]) for row in data]
old_away_score = [str(row[4]) for row in data]
home_score = []
away_score = []
print(home_team[44762])
print(away_team[44762])
home_team.pop(0)
away_team.pop(0)
old_home_score.pop(0)
old_away_score.pop(0)
for item in (old_home_score):
    iteam = int(item)
    iteam /= 10
    home_score.append(iteam)
for item in (old_away_score):
    iteam = int(item)
    iteam /= 10
    away_score.append(iteam)
print(away_score[44761])
print(home_score[44761])
home_team = [eval(i) for i in home_team]
away_team = [eval(i) for i in away_team]

class SoccerNN(nn.Module):
    def __init__(self):
        super(SoccerNN, self).__init__()
        self.fc1 = nn.Linear(2, 15)
        self.fc2 = nn.Linear(15, 20)
        self.fc3 = nn.Linear(20, 10)
        self.fc4 = nn.Linear(10, 2)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        x = torch.sigmoid(x)
        return x

# Convert the columns to PyTorch tensors
input_data = torch.tensor(np.column_stack((home_team, away_team)), dtype=torch.int16)
output_data = torch.tensor(np.column_stack((home_score, away_score)), dtype=torch.int16)
model = SoccerNN()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
epochs = 50  # Adjust the number of epochs as needed
batch_size = 10  # Adjust the batch size as needed
for epoch in range(epochs):
    for batch_start in range(0, len(input_data), batch_size):
        batch_end = batch_start + batch_size
        batch_input = input_data[batch_start:batch_end]
        batch_output = output_data[batch_start:batch_end]
        # Convert batch_input and batch_output to the same dtype as the model's parameters
        batch_input = batch_input.to(model.fc1.weight.dtype)
        batch_output = batch_output.to(model.fc2.weight.dtype)
        # Zero the gradients
        optimizer.zero_grad()
        # Forward pass
        predictions = model(batch_input)
        # Calculate loss
        loss = criterion(predictions, batch_output)
        # Backpropagation
        loss.backward()
        # Update weights
        optimizer.step()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

# After training, you can use the model for predictions
home_team = [16]
away_team = [37]
new_data = torch.tensor(np.column_stack((home_team, away_team)), dtype=torch.int16)
new_data = torch.tensor(new_data, dtype=torch.float32)  # Use the appropriate data type
# Put the model in evaluation mode
model.eval()
# Make predictions
with torch.no_grad():
    predictions = model(new_data)
# Convert predictions to numpy array
predictions_np = predictions.numpy()
# Print or use the predictions as needed
print(predictions_np)
```
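I give the network two ints as input (each int represents a team), and the network should output the predicted score of the game.
When I train the model, it reports a loss of 0, but the predictions are completely wrong.
Why doesn't it work? My guess is that it gets stuck in a local minimum it cannot escape, but I may be completely wrong, since I am very new to all of this. How should I change the model to make it work?
I have already tried changing the batch size, the number of layers, and the number of epochs.
It still doesn't work.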
1 Answer
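First of all, when I tried to run your code, I got an error on the line
`home_team = [eval(i) for i in home_team]`
I suspect you wanted to map the teams to integers. I will first explain how to do this mapping correctly, even though it is not the right way to feed classes (teams) into a NN, as I explain later.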
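If the team columns still contain names rather than numbers, that line fails for a specific reason. `eval` parses each string as a Python expression, so a name like `England` is looked up as an undefined variable. The small sketch below illustrates this with made-up strings:

```python
# eval() parses a string as a Python expression.
print(eval("16"))  # 16: a numeric string evaluates to an int

# A team name is parsed as a variable name that does not exist.
raised = False
try:
    eval("England")
except NameError:
    raised = True
print(raised)  # True

# For strings that really are numbers, int() is the explicit, safer conversion.
ids = [int(s) for s in ["16", "37"]]
print(ids)  # [16, 37]
```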
Mapping teams to integers
This can be done with the following code:
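A minimal sketch of such a mapping uses a plain dict. It makes a few assumptions. The helper `build_team_index` is a hypothetical name. The rows below are made up. The CSV is assumed to have a header row, with the home and away team names in `row[1]` and `row[2]`, as the question's code indexes them.

```python
import csv
import io


def build_team_index(pairs):
    """Map every distinct team name to an integer, in order of first appearance."""
    team_to_idx = {}
    for home, away in pairs:
        for team in (home, away):
            if team not in team_to_idx:
                team_to_idx[team] = len(team_to_idx)
    return team_to_idx


# Made-up rows; in practice open the Kaggle CSV instead of this string.
sample_csv = """date,home_team,away_team,home_score,away_score
2000-01-01,Uruguay,Austria,2,1
2000-01-02,Austria,Brazil,0,0
2000-01-03,Brazil,Uruguay,3,2
"""

reader = csv.reader(io.StringIO(sample_csv))
next(reader)  # skip the header row instead of popping index 0 later
pairs = [(row[1], row[2]) for row in reader]

team_to_idx = build_team_index(pairs)
home_team = [team_to_idx[home] for home, _ in pairs]
away_team = [team_to_idx[away] for _, away in pairs]
print(team_to_idx)  # {'Uruguay': 0, 'Austria': 1, 'Brazil': 2}
print(home_team, away_team)  # [0, 1, 2] [1, 2, 0]
```

Keeping `team_to_idx` around also lets you look up a team's index at prediction time, e.g. `team_to_idx["Uruguay"]`, instead of hard-coding numbers like 16 and 37.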
With this, I was able to run your code and got a loss of about 0.014.
How to improve model performance
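By feeding each team to the NN as an `int`, you are explicitly telling it that, for example, Uruguay (5) is close to Austria (6) and far away from Surrey (314). The NN then first has to learn that this relationship is meaningless, which makes its job harder.

There are no non-linear activation functions (such as `nn.ReLU()` or `nn.Sigmoid()`) between your layers. At the moment, the whole network is therefore equivalent to a single linear `fc` layer followed by a `sigmoid` layer.

You divide the scores by `10` so that they fit into the network's output range of (0, 1). It would be more sensible to change the final non-linearity (currently `sigmoid`) to one whose range matches the range of the target data, i.e. [0, infinity), for example the ReLU function.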
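The sketch below puts these three suggestions together. It is not a tuned model.

It first checks numerically that two `nn.Linear` layers with nothing in between collapse into a single affine map. It then defines a revised `SoccerNN` that:
- one-hot encodes the teams (one common choice; `nn.Embedding` is another),
- puts `nn.ReLU()` between the layers and on the output,
- trains on raw goal counts.

The layer sizes, learning rate, epoch count and toy data are assumptions for illustration. The targets are kept as float32, because casting `score / 10` to `torch.int16`, as the question's code does, truncates every score below 10 goals to 0.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# 1) Without activations, stacked Linear layers collapse into one Linear layer:
#    fc2(fc1(x)) = x @ (W2 @ W1).T + (W2 @ b1 + b2)
fc1, fc2 = nn.Linear(2, 15), nn.Linear(15, 20)
with torch.no_grad():
    x = torch.randn(4, 2)
    stacked = fc2(fc1(x))
    W = fc2.weight @ fc1.weight
    b = fc2.weight @ fc1.bias + fc2.bias
    collapsed = torch.allclose(stacked, x @ W.T + b, atol=1e-6)
print(collapsed)  # True


class SoccerNN(nn.Module):
    """Sketch: one-hot team inputs, ReLU between layers, ReLU on the output."""

    def __init__(self, num_teams, hidden=32):
        super().__init__()
        self.num_teams = num_teams
        self.net = nn.Sequential(
            nn.Linear(2 * num_teams, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
            nn.ReLU(),  # predicted goals are >= 0, matching the [0, inf) targets
        )

    def forward(self, home_ids, away_ids):
        # One-hot vectors: every pair of distinct teams is equally far apart.
        home = F.one_hot(home_ids, self.num_teams)
        away = F.one_hot(away_ids, self.num_teams)
        return self.net(torch.cat([home, away], dim=1).float())


# Made-up toy data: team indices and raw goal counts kept as float32
# (no division by 10, and no int16 cast that would truncate them).
num_teams = 3
home_ids = torch.tensor([0, 1, 0, 2])
away_ids = torch.tensor([1, 0, 2, 1])
scores = torch.tensor([[2.0, 1.0], [0.0, 3.0], [4.0, 0.0], [1.0, 1.0]])

model = SoccerNN(num_teams)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(home_ids, away_ids), scores)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    pred = model(torch.tensor([0]), torch.tensor([1]))  # home team 0 vs away team 1
print(pred.shape)  # torch.Size([1, 2])
```

With real data, `num_teams` would be the number of entries in the team-to-integer mapping. If some output units get stuck at exactly 0 with the final ReLU, `nn.Softplus()` is one smooth alternative whose range is also (0, infinity).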