How do I build a DataFrame from a ColumnTransformer made up of a LeaveOneOutEncoder, a OneHotEncoder, and an OrdinalEncoder?

Posted by rpppsulh on 2022-10-23 in Other
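Since the "native" categorical encoders in scikit-learn have deprecated get_feature_names() (it was replaced by get_feature_names_out()), how can I build a DataFrame whose transformed columns carry proper names, given that some encoders in the ColumnTransformer respond to get_feature_names_out() while the others respond only to get_feature_names()? Here is the setup: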

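import pandas as pd
import category_encoders as ce
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

features_pipe = make_column_transformer(
    (OneHotEncoder(handle_unknown='ignore', sparse=False), ['Gender', 'Race']),
    (OrdinalEncoder(), ['Age', 'Overall Work Exp.', 'Fieldwork Exp.', 'Level of Education']),
    (ce.LeaveOneOutEncoder(), ['State (US)'])
).fit(X_train, y_train)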

X_train_encoded = features_pipe.transform(X_train)
X_test_encoded = features_pipe.transform(X_test)

X_train_encoded_df = pd.DataFrame(X_train_encoded, columns= features_pipe.get_features_names_out())
X_train_encoded_df.head()
I got this error: AttributeError: 'ColumnTransformer' object has no attribute 'get_features_names_out'

This is because LeaveOneOutEncoder does not support get_feature_names_out(). It supports get_feature_names().
How can I work around this and print the DataFrame correctly?

zour9fqk #1

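I have run into the same problem in the past.
If you don't mind using a subclass of ColumnTransformer, you can create one and modify it to call get_feature_names() whenever get_feature_names_out() is not available.
In that case, declare the class like this: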

from sklearn.compose import ColumnTransformer
# Private helper: not part of scikit-learn's public API, so it may change between releases.
from sklearn.compose._column_transformer import _is_empty_column_selection

class MyColumnTransformer(ColumnTransformer):
    # The parent constructor is inherited unchanged, so get_params() and clone() keep working.

    def _get_feature_name_out_for_transformer(
        self, name, trans, column, feature_names_in
    ):
        # Overrides a private ColumnTransformer method; its signature may differ across versions.
        column_indices = self._transformer_to_input_indices[name]
        names = feature_names_in[column_indices]
        if trans == "drop" or _is_empty_column_selection(column):
            return
        elif trans == "passthrough":
            return names

        # Fall back to the older get_feature_names() when get_feature_names_out() is missing.
        if not hasattr(trans, "get_feature_names_out"):
            return trans.get_feature_names()
        return trans.get_feature_names_out(names)
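
The fallback at the heart of that override can be tried in isolation. What follows is only a minimal sketch, not scikit-learn or category_encoders code: NewStyleEncoder, OldStyleEncoder, and resolve_feature_names are hypothetical stand-ins invented for illustration, and the "_encoded" suffix is made up.

```python
class NewStyleEncoder:
    """Hypothetical stand-in for an encoder that implements get_feature_names_out()."""

    def get_feature_names_out(self, input_features=None):
        return [f"{col}_encoded" for col in input_features]


class OldStyleEncoder:
    """Hypothetical stand-in for an encoder that only implements get_feature_names()."""

    def __init__(self, columns):
        self.columns = list(columns)

    def get_feature_names(self):
        return list(self.columns)


def resolve_feature_names(trans, input_names):
    """Prefer get_feature_names_out(); fall back to get_feature_names()."""
    if hasattr(trans, "get_feature_names_out"):
        return list(trans.get_feature_names_out(input_names))
    return list(trans.get_feature_names())


new_names = resolve_feature_names(NewStyleEncoder(), ["Age"])
old_names = resolve_feature_names(OldStyleEncoder(["State (US)"]), ["State (US)"])
print(new_names)
print(old_names)
```

The subclass applies the same check, only in the opposite order: it tests for the missing method first and otherwise defers to the regular code path.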

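Using ColumnTransformer directly is not as simple as make_column_transformer, but it is easier to customize.
In this case you also have to give each transformer a name, following this pattern:

  • (name, transformer, columns)
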
features_pipe = MyColumnTransformer(transformers=
    [
      ('OHE', OneHotEncoder(handle_unknown = 'ignore', sparse=False), ['Gender', 'Race']),
      ('OE', OrdinalEncoder(), ['Age', 'Overall Work Exp.', 'Fieldwork Exp.', 'Level of Education']),
      ('LOOE', ce.LeaveOneOutEncoder(), ['State (US)'])
    ])
features_pipe.fit(X_train, y_train)

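Finally, continue with the rest of your code as you intended, making sure the method is spelled get_feature_names_out(). The call in the question is spelled get_features_names_out, which by itself raises the AttributeError shown there.
If you don't want the transformer name prefixed to each feature name, just pass verbose_feature_names_out=False when initializing MyColumnTransformer.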
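
To make the effect of verbose_feature_names_out concrete, here is a rough sketch of the naming convention, assuming the usual "name__feature" prefix format. combine_feature_names is a hypothetical helper written for this illustration, not a scikit-learn function, and the category names "Gender_F" and "Gender_M" are made up.

```python
def combine_feature_names(blocks, verbose_feature_names_out=True):
    """Join per-transformer feature names in output order.

    blocks is a list of (transformer_name, feature_names) pairs.
    """
    if verbose_feature_names_out:
        # Prefix every feature with the name of the transformer that produced it.
        return [f"{name}__{feat}" for name, feats in blocks for feat in feats]

    names = [feat for _, feats in blocks for feat in feats]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        # Without prefixes, two transformers could emit the same name.
        raise ValueError(f"Feature names are not unique without prefixes: {duplicates}")
    return names


blocks = [
    ("OHE", ["Gender_F", "Gender_M"]),  # made-up one-hot categories
    ("OE", ["Age"]),
    ("LOOE", ["State (US)"]),
]
prefixed = combine_feature_names(blocks)
plain = combine_feature_names(blocks, verbose_feature_names_out=False)
print(prefixed)
print(plain)
```

The prefixed form is longer but cannot collide. The plain form is easier to read in a DataFrame header, as long as no two transformers produce the same column name.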
