I've run into a problem: Google Colab keeps running out of memory. I'm using the free version, and I'm not sure whether it simply can't handle the workload or whether my code is badly optimized. Since I'm new to this field, I suspect my code is quite slow and poorly optimized. I'd appreciate some help, as I'm still learning.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from xgboost import XGBRegressor
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv('path/beforeNeural.csv')
df.shape
df.head()
df.isnull().sum()
encoder = LabelEncoder()
df['Property Type'] = encoder.fit_transform(df['Property Type'])
df['Old/New'] = encoder.fit_transform(df['Old/New'])
df['Record Status - monthly file only'] = encoder.fit_transform(df['Record Status - monthly file only'])
df['PPDCategory Type'] = encoder.fit_transform(df['PPDCategory Type'])
df['County'] = encoder.fit_transform(df['County'])
df['District'] = encoder.fit_transform(df['District'])
df['Town/City'] = encoder.fit_transform(df['Town/City'])
df['Duration'] = encoder.fit_transform(df['Duration'])
df['Transaction unique identifier'] = encoder.fit_transform(df['Transaction unique identifier'])
df['Date of Transfer'] = encoder.fit_transform(df['Date of Transfer'])
X = df.drop(columns='Price', axis=1)
Y = df['Price']
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
df.shape
boostenc = XGBRegressor()
boostenc.fit(X_train, Y_train)
1 Answer

lxkprmvk1:
I'll give it a try; here is one possible way to optimize your code.

Code:

Note that I removed the unused imports, as well as the df.head() and similar calls in the middle of the code; used like that mid-script, they do nothing and print nothing.

Code explanation:
1. Instead of LabelEncoder, I used OneHotEncoder to one-hot encode all the categorical features. This creates a new binary column for each unique value in a categorical feature. In general, when doing machine learning, one-hot encoding is usually a better way to handle categorical features than simply assigning integer values with LabelEncoder.
2. I extracted the names of all the categorical columns into a list, so they are easier to modify when needed.
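The answer's code block itself did not survive here, but its description is concrete enough to sketch. Below is a minimal sketch of the preprocessing it proposes, assuming a few of the column names from the question and a small stand-in DataFrame in place of beforeNeural.csv:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Stand-in for pd.read_csv('path/beforeNeural.csv'); only a few of the
# question's columns are mocked up here for illustration.
df = pd.DataFrame({
    'Property Type': ['D', 'S', 'T', 'D'],
    'Old/New': ['Y', 'N', 'N', 'Y'],
    'Duration': ['F', 'L', 'F', 'F'],
    'Price': [250000, 180000, 320000, 410000],
})

# Point 2: all categorical column names in one list, easy to edit later.
categorical_columns = ['Property Type', 'Old/New', 'Duration']

X = df.drop(columns='Price')
Y = df['Price']

# Point 1: one-hot encode the categorical columns instead of LabelEncoder;
# any other columns are passed through unchanged.
preprocess = ColumnTransformer(
    [('onehot', OneHotEncoder(handle_unknown='ignore'), categorical_columns)],
    remainder='passthrough',
)
X_encoded = preprocess.fit_transform(X)

# One binary column per unique value: 3 + 2 + 2 = 7 columns here.
print(X_encoded.shape)
```

The encoded matrix can then be split with train_test_split and fed to XGBRegressor exactly as in the question. One caveat relevant to the memory problem: one-hot encoding a column whose values are unique per row, such as Transaction unique identifier, creates one column per row and can itself exhaust RAM, so columns like that are usually dropped rather than encoded.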