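When I convert data from a pandas DataFrame to sklearn in order to make predictions, the string columns become a problem. So I used LabelEncoder, but it seems to restrict me to using the encoded data instead of the original string data.
In sklearn's predict method, I want to predict on this input: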
learn_to_machine=dtc.fit(X,Y)
test=[
[128, 6 ,50, 'mobile_phone', 'Samsung', 6000],
[512, 8, 65, 'mobile_phone', 'Huawei',5000]
]
answer=learn_to_machine.predict(test)
print(answer[0])
print(answer[1])
# 11399000
# 15304000
instead of this:
learn_to_machine=dtc.fit(X,Y)
test=[
[128, 6 ,50, 0, 2, 6000],
[512, 8, 65, 0, 3,5000]
]
answer=learn_to_machine.predict(test)
print(answer[0])
print(answer[1])
# 11399000
# 15304000
In case it helps, here is all of my code:
import sqlalchemy
import pandas as pd
read_engine=sqlalchemy.create_engine('mysql+mysqlconnector://root:@localhost/six')
conn = read_engine.connect()
df_new=pd.read_sql_table('mobile1' ,con= conn )
df_new['price']=df_new['price'].astype(int)
df_new['ram']=df_new['ram'].astype(int)
df_new['battery']=df_new['battery'].astype(int)
df_new['size']=df_new['size'].astype(float)
df_new['camera']=df_new['camera'].mask(df_new['camera'] == '')
df_new['camera']=df_new['camera'].mask(df_new['camera'] == ' ')
df_new['camera']=df_new['camera'].mask(df_new['camera'] == ' ')
df_new['camera']=df_new['camera'].fillna(0)
df_new['camera']=df_new['camera'].astype(float)
X=df_new[['ram','size','camera','product','Brand','battery']].values
Y=df_new[['price']].values
from sklearn import preprocessing
product_enc=preprocessing.LabelEncoder()
product_enc.fit([char for char in X[:,4]])
X[:,4]=product_enc.transform(X[:,4])
product_enc.fit([ char for char in X[:,3]])
X[:,3]=product_enc.transform(X[:,3])
from sklearn import tree
dtc=tree.DecisionTreeClassifier()
learn_to_machine=dtc.fit(X,Y)
# When I run it with this input, it works
test=[
[128, 6 ,50, 0, 2, 6000],
[512, 8, 65, 0, 3,5000]
]
answer=learn_to_machine.predict(test)
print(answer[0])
print(answer[1])
# 11399000
# 15304000
But when I try to run this:
test=[
[128, 6 ,50, 'mobile_phone', 'Samsung', 6000],
[512, 8, 65, 'mobile_phone', 'Huawei',5000]
]
this error is raised: ValueError: could not convert string to float: 'mobile_phone'
1 Answer
hmtdttj4 #1
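First, you should probably give your two LabelEncoders two different names. In your code the single product_enc is fitted on the Brand column and then refitted on the product column, so the Brand mapping is lost after the second fit.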
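A minimal sketch of that renaming, assuming a small hand-made DataFrame in place of the `mobile1` table (the sample rows and the names `brand_enc`, `product_code`, and `brand_code` are illustrative, not from the question):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows standing in for the 'mobile1' table.
df = pd.DataFrame({
    "product": ["mobile_phone", "tablet", "mobile_phone", "mobile_phone"],
    "Brand": ["Samsung", "Apple", "Huawei", "Apple"],
})

# One encoder per column, so fitting the second does not overwrite the first.
product_enc = LabelEncoder().fit(df["product"])
brand_enc = LabelEncoder().fit(df["Brand"])

df["product_code"] = product_enc.transform(df["product"])
df["brand_code"] = brand_enc.transform(df["Brand"])

# Each encoder keeps its own string-to-integer mapping in classes_.
print(list(product_enc.classes_))
print(list(brand_enc.classes_))
```

Because each encoder remembers its own `classes_`, both mappings stay available after training.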
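Then you can convert new raw data automatically by passing its string columns through the fitted encoders before calling predict.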
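One way this could look end to end is sketched below. It assumes a tiny in-memory training set instead of the MySQL table; the training rows, the `encode` helper, and the `brand_enc` name are made up for illustration, and `random_state=0` is only there to keep the run repeatable.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training rows standing in for the 'mobile1' table.
train = pd.DataFrame({
    "ram": [128, 512, 64, 256],
    "size": [6.0, 8.0, 5.5, 6.5],
    "camera": [50.0, 65.0, 12.0, 48.0],
    "product": ["mobile_phone"] * 4,
    "Brand": ["Samsung", "Huawei", "Samsung", "Huawei"],
    "battery": [6000, 5000, 3000, 4500],
    "price": [11399000, 15304000, 4000000, 9000000],
})
features = ["ram", "size", "camera", "product", "Brand", "battery"]

# Separate encoders, fitted once on the training data.
product_enc = LabelEncoder().fit(train["product"])
brand_enc = LabelEncoder().fit(train["Brand"])


def encode(frame):
    """Return a copy with the string columns replaced by their integer codes."""
    out = frame[features].copy()
    out["product"] = product_enc.transform(out["product"])
    out["Brand"] = brand_enc.transform(out["Brand"])
    return out


dtc = DecisionTreeClassifier(random_state=0).fit(encode(train), train["price"])

# New raw input with the original strings, as in the question.
test = pd.DataFrame(
    [[128, 6, 50, "mobile_phone", "Samsung", 6000],
     [512, 8, 65, "mobile_phone", "Huawei", 5000]],
    columns=features,
)
answer = dtc.predict(encode(test))
print(answer[0])
print(answer[1])
```

Note that `transform` raises a ValueError for a label the encoder did not see during `fit`, so new rows can only use brands and products that were present in the training data.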