Python: how to get predictions from string data in sklearn

Posted by cbwuti44 on 2023-03-28 in Python

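When I move my data from a pandas DataFrame into sklearn to make predictions, the string columns become a problem. So I used LabelEncoder, but it seems to force me to pass the encoded values rather than the original strings.
In sklearn's predict method, I want to predict on this input: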

learn_to_machine=dtc.fit(X,Y)
test=[
    [128, 6 ,50, 'mobile_phone', 'Samsung', 6000],
    [512, 8, 65, 'mobile_phone', 'Huawei',5000]
        ]
answer=learn_to_machine.predict(test)
print(answer[0])
print(answer[1])
# 11399000
# 15304000

instead of this:

learn_to_machine=dtc.fit(X,Y)
test=[
    [128, 6 ,50, 0, 2, 6000],
    [512, 8, 65, 0, 3,5000]
        ]
answer=learn_to_machine.predict(test)
print(answer[0])
print(answer[1])
# 11399000
# 15304000

In case it helps, here is all of my code:

import sqlalchemy
import pandas as pd
read_engine=sqlalchemy.create_engine('mysql+mysqlconnector://root:@localhost/six')
conn = read_engine.connect()
df_new=pd.read_sql_table('mobile1' ,con= conn )
df_new['price']=df_new['price'].astype(int)
df_new['ram']=df_new['ram'].astype(int)
df_new['battery']=df_new['battery'].astype(int)
df_new['size']=df_new['size'].astype(float)
df_new['camera']=df_new['camera'].mask(df_new['camera'] == '')
df_new['camera']=df_new['camera'].mask(df_new['camera'] == ' ')
df_new['camera']=df_new['camera'].mask(df_new['camera'] == '  ')
df_new['camera']=df_new['camera'].fillna(0)
df_new['camera']=df_new['camera'].astype(float)

X=df_new[['ram','size','camera','product','Brand','battery']].values
Y=df_new[['price']].values

from sklearn import preprocessing
product_enc=preprocessing.LabelEncoder()
product_enc.fit([char for char in X[:,4]])
X[:,4]=product_enc.transform(X[:,4])
product_enc.fit([ char for char in X[:,3]])
X[:,3]=product_enc.transform(X[:,3])
from sklearn import tree
dtc=tree.DecisionTreeClassifier()
learn_to_machine=dtc.fit(X,Y)

# when I run it with this input, it works
test=[
    [128, 6 ,50, 0, 2, 6000],
    [512, 8, 65, 0, 3,5000]
        ]

answer=learn_to_machine.predict(test)
print(answer[0])
print(answer[1])
# 11399000
# 15304000

When I try to run it with this input:

test=[
    [128, 6 ,50, 'mobile_phone', 'Samsung', 6000],
    [512, 8, 65, 'mobile_phone', 'Huawei',5000]
        ]

this error is raised: ValueError: could not convert string to float: 'mobile_phone'

hmtdttj4 #1

First, you should give your two LabelEncoders two different names, so that fitting the second one does not overwrite the first:

product_enc=preprocessing.LabelEncoder()
product_enc.fit([char for char in X[:,3]])
X[:,3]=product_enc.transform(X[:,3])

company_enc=preprocessing.LabelEncoder()
company_enc.fit([ char for char in X[:,4]])
X[:,4]=company_enc.transform(X[:,4])
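
To see why the separate names matter, here is a minimal sketch of what happens when one LabelEncoder is fitted twice. The brand and product lists are made-up sample values, not the asker's data, and shared_enc is a hypothetical name used only for this illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample values standing in for the 'Brand' and 'product' columns.
brands = ["Samsung", "Huawei", "Apple", "Samsung"]
products = ["mobile_phone", "tablet", "mobile_phone", "mobile_phone"]

# Reusing one encoder: the second fit() replaces the classes learned by the first.
shared_enc = LabelEncoder()
shared_enc.fit(brands)
shared_enc.fit(products)
shared_classes = list(shared_enc.classes_)  # only the product labels remain

# The shared encoder can no longer translate a brand name.
try:
    shared_enc.transform(["Samsung"])
    brand_still_known = True
except ValueError:
    brand_still_known = False

# One encoder per column keeps both mappings available for new data.
product_enc = LabelEncoder().fit(products)
company_enc = LabelEncoder().fit(brands)
product_codes = product_enc.transform(["mobile_phone"])
company_codes = company_enc.transform(["Samsung", "Huawei"])

print(shared_classes, brand_still_known, product_codes, company_codes)
```

LabelEncoder assigns codes in sorted label order, so the code a brand gets depends on which brands were present when the encoder was fitted.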

Then you can transform new raw data with the fitted encoders before predicting:

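import numpy as np

# A plain list does not support [:, 3] indexing, so build an object array
# that can hold both the numeric and the string columns.
test = np.array([
    [128, 6, 50, 'mobile_phone', 'Samsung', 6000],
    [512, 8, 65, 'mobile_phone', 'Huawei', 5000],
], dtype=object)
# Copy so the raw strings in test are left untouched.
test_transform = test.copy()
test_transform[:, 3] = product_enc.transform(test[:, 3])
test_transform[:, 4] = company_enc.transform(test[:, 4])

answer = learn_to_machine.predict(test_transform)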
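
As an alternative, the encoding step can be bundled with the model so that predict accepts the raw strings directly. The following is only a sketch under assumptions: it swaps LabelEncoder for OrdinalEncoder inside a ColumnTransformer and Pipeline, and it uses a tiny made-up training set. The third row and its price are invented for illustration, and the names encode_strings and model are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

columns = ["ram", "size", "camera", "product", "Brand", "battery"]

# Made-up training rows; the real data would come from the 'mobile1' table.
train = pd.DataFrame(
    [
        [128, 6.0, 50.0, "mobile_phone", "Samsung", 6000],
        [512, 8.0, 65.0, "mobile_phone", "Huawei", 5000],
        [64, 5.5, 12.0, "mobile_phone", "Apple", 3000],
    ],
    columns=columns,
)
prices = [11399000, 15304000, 9000000]

# Encode only the two string columns; pass the numeric columns through unchanged.
encode_strings = ColumnTransformer(
    [("categories", OrdinalEncoder(), ["product", "Brand"])],
    remainder="passthrough",
)
model = Pipeline(
    [
        ("encode", encode_strings),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ]
)
model.fit(train, prices)

# New data keeps its raw strings; the pipeline applies the fitted encoding.
test = pd.DataFrame(
    [
        [128, 6, 50, "mobile_phone", "Samsung", 6000],
        [512, 8, 65, "mobile_phone", "Huawei", 5000],
    ],
    columns=columns,
)
answer = model.predict(test)
print(answer[0])
print(answer[1])
```

With this layout the fitted encoders travel with the model, so there is no separate transform step to remember at prediction time. A brand that was not seen during fit still raises an error by default; in recent scikit-learn versions, OrdinalEncoder's handle_unknown="use_encoded_value" together with unknown_value is one way to handle that case.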
