How to use machine learning in Python to predict a binary outcome from a Pandas DataFrame

Posted by iklwldmw on 2023-02-07 in Python

Given the following code:

import nfl_data_py as nfl
pbp = nfl.import_pbp_data([2022], downcast=True, cache=False, alt_path=None)

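It returns a DataFrame of every play that occurred in the 2022 NFL season. The columns I want to train on are score_differential, yardline_100, ydstogo, down, and half_seconds_remaining, in order to predict play_type (run or pass).
For example: I give it a score differential of -4, the 25-yard line, 4th down, 16 yards to go, and 300 seconds remaining in the half, and it returns whatever it learned from the DataFrame, probably pass.
How would I go about doing this? Should I use a scikit-learn decision tree?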

Answer #1 (z9smfwbn)

Here you go:

import nfl_data_py as nfl
import pandas as pd
# split the data into training and test sets
from sklearn.model_selection import train_test_split
# encode the play_type column as integers
from sklearn.preprocessing import LabelEncoder
# the classifier
from sklearn.ensemble import RandomForestClassifier
# evaluation helpers
from sklearn.metrics import classification_report, confusion_matrix
# plotting
import seaborn as sns
import matplotlib.pyplot as plt

pbp = nfl.import_pbp_data([2022], downcast=True, cache=False, alt_path=None)
df = pd.DataFrame(pbp)
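# there are other features you could use, but these are the ones you asked for
df = df[['score_differential', 'yardline_100', 'ydstogo', 'down', 'half_seconds_remaining', 'play_type']]
# drop rows with missing values in any of the selected columns
df = df.dropna()
# drop the rows whose play_type is 'None' or 'no_play'
# note: play types other than run and pass (for example punts and field goals) remain;
# to predict only run vs. pass, keep just those rows with df['play_type'].isin(['run', 'pass'])
df = df[~df['play_type'].isin(['None', 'no_play'])]
# reset the index
df = df.reset_index(drop=True)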
# encode the play_type column as integer class labels
le = LabelEncoder()
df['play_type_encode'] = le.fit_transform(df['play_type'])
# train/test split: the features are every column except the two target columns
X = df.drop(['play_type', 'play_type_encode'], axis=1)
y = df['play_type_encode']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# instantiate the model
rfc = RandomForestClassifier(n_estimators=100)
# fit the model on the training set
rfc.fit(X_train, y_train)
# predict on the held-out test set
rfc_pred = rfc.predict(X_test)
# evaluate the model (the classes in the report are the encoded integers; see le.classes_)
print(classification_report(y_test, rfc_pred))
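# plot the confusion matrix (fmt='d' prints the counts as plain integers)
plt.figure(figsize=(10,6))
sns.heatmap(confusion_matrix(y_test, rfc_pred), annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('True')
# display the figure when running as a script
plt.show()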
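
The code above stops at evaluation, so here is a rough sketch of the last step from the question: passing one new situation (score differential -4, 25-yard line, 4th down, 16 yards to go, 300 seconds left in the half) to a trained model and decoding the result back into a play type. To stay runnable without downloading anything, it trains on a small synthetic DataFrame. The `toy` data, its labelling rule, and the names `FEATURES` and `new_play` are invented for illustration and are not part of the original answer; with the real data you would reuse the fitted `rfc` and `le` instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Feature columns, in the same order as in the training DataFrame.
FEATURES = ['score_differential', 'yardline_100', 'ydstogo', 'down',
            'half_seconds_remaining']

# ASSUMPTION: synthetic stand-in for the cleaned play-by-play data.
# These are random numbers, not real NFL plays.
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({
    'score_differential': rng.integers(-21, 22, n),
    'yardline_100': rng.integers(1, 100, n),
    'ydstogo': rng.integers(1, 21, n),
    'down': rng.integers(1, 5, n),
    'half_seconds_remaining': rng.integers(0, 1801, n),
})
# ASSUMPTION: invented labelling rule, only so the toy model has a pattern to learn.
toy['play_type'] = np.where(toy['ydstogo'] > 7, 'pass', 'run')

# Encode the labels and fit the classifier, as in the answer.
le = LabelEncoder()
y = le.fit_transform(toy['play_type'])
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(toy[FEATURES], y)

# The situation from the question, as a one-row DataFrame with matching columns.
new_play = pd.DataFrame([{
    'score_differential': -4,
    'yardline_100': 25,
    'ydstogo': 16,
    'down': 4,
    'half_seconds_remaining': 300,
}])[FEATURES]

# predict() returns the encoded class; inverse_transform() maps it back to the name.
pred_label = le.inverse_transform(rfc.predict(new_play))[0]
# predict_proba() columns follow the encoded class order, which matches le.classes_.
proba = dict(zip(le.classes_, rfc.predict_proba(new_play)[0]))

print(pred_label)
print(proba)
```

Building the new row as a DataFrame with the same column names and order as the training data avoids scikit-learn's feature-name warning and makes it harder to mix up the inputs.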
