How to merge predicted values into the original pandas test DataFrame when X_test was transformed with CountVectorizer before the split

Posted by ztmd8pv5 on 2023-02-02 in Other

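I want to merge the predictions for my test data back into my X_test. I can merge them with y_test, but because my X_test is a corpus, I'm not sure how to identify the indices to merge on. My code is below: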

def lr_model(df):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    import numpy as np  # required by np.concatenate below
    import pandas as pd

    # Create corpus as a list
    corpus = df['text'].tolist()
    cv = CountVectorizer()
    X = cv.fit_transform(corpus).toarray()
    y = df.iloc[:, -1].values

    # Splitting to testing and training
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

    # Train Logistic Regression on Training set
    classifier = LogisticRegression(random_state = 0)
    classifier.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = classifier.predict(X_test)

    # Merge true vs predicted labels
    true_vs_pred = pd.DataFrame(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

    return true_vs_pred

This gives me y_test and y_pred, but I'm not sure how to add X_test to it as the original DataFrame (the ids of X_test). Any guidance is much appreciated. Thanks.

pandas

Source: https://stackoverflow.com/questions/75282891/how-to-merge-predicted-values-to-original-pandas-test-data-frame-where-x-test-ha

1 Answer

e7arh2l61#

Using a pipeline can help you link the original X_test with the predictions:

def lr_model(df):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    import numpy as np  # required by np.concatenate below
    import pandas as pd
    from sklearn.pipeline import Pipeline

    # Defining X and y
    cv = CountVectorizer()
    X = df['text']
    y = df.iloc[:, -1].values

    # Splitting to testing and training
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

    # Create a pipeline
    pipeline = Pipeline([
        ('CountVectorizer', cv),
        ('LogisticRegression', LogisticRegression(random_state = 0)),
    ])

    # Train pipeline on Training set
    pipeline.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = pipeline.predict(X_test)

    # Merge true vs predicted labels
    true_vs_pred = pd.DataFrame(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

    return true_vs_pred, X_test
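
Because the pipeline version splits the `df['text']` Series rather than a NumPy array, `X_test` keeps the original row labels, and those labels can be used to pull the matching rows out of `df`. The following is a minimal sketch of that idea, not the answer author's code. The function name `lr_model_with_rows`, the `text_col` parameter, the `y_pred` column name and the toy data are all made up for illustration, and it assumes the index of `df` is unique and the label is the last column, as in the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def lr_model_with_rows(df, text_col="text"):
    """Return the test rows of df with a predicted-label column attached.

    Assumptions (hypothetical helper): the label is the last column of df
    and df has a unique index.
    """
    X = df[text_col]    # pandas Series, so the original index is preserved
    y = df.iloc[:, -1]  # kept as a Series (no .values), same index as X

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    pipeline = Pipeline([
        ("vectorizer", CountVectorizer()),
        ("classifier", LogisticRegression(random_state=0)),
    ])
    pipeline.fit(X_train, y_train)

    # X_test.index holds the original row labels of the test samples,
    # so df.loc[...] recovers the full original rows in the same order
    # as the predictions.
    result = df.loc[X_test.index].copy()
    result["y_pred"] = pipeline.predict(X_test)
    return result


# Tiny made-up dataset, for illustration only.
toy = pd.DataFrame({
    "id": range(100, 110),
    "text": ["good movie", "bad movie", "great film", "awful film",
             "good film", "bad plot", "great plot", "awful movie",
             "good plot", "bad film"],
    "label": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})
out = lr_model_with_rows(toy)
print(out)
```

If the text has to be vectorized before the split, as in the question, the same effect can likely be had by passing `df.index` as an extra array to `train_test_split`, which accepts any number of arrays and splits them consistently.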

Posted on 2023-02-02
