How do I build a DataFrame from a ColumnTransformer composed of a LOO encoder, OHE, and an ordinal encoder?

rpppsulh  posted on 2022-10-23  in Other

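Since the get_feature_names() method of the "native" categorical encoders in Sklearn is deprecated (in fact, it has been replaced by get_feature_names_out()), how can I build a DataFrame in which the transformed variables have proper names, given that some encoders in the ColumnTransformer respond to get_feature_names_out() while others respond to get_feature_names()? Here is the situation: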

    import category_encoders as ce
    import pandas as pd
    from sklearn.compose import make_column_transformer
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

    features_pipe = make_column_transformer(
        (OneHotEncoder(handle_unknown='ignore', sparse=False), ['Gender', 'Race']),
        (OrdinalEncoder(), ['Age', 'Overall Work Exp.', 'Fieldwork Exp.', 'Level of Education']),
        (ce.LeaveOneOutEncoder(), ['State (US)']),
    ).fit(X_train, y_train)

    X_train_encoded = features_pipe.transform(X_train)
    X_test_encoded = features_pipe.transform(X_test)

    X_train_encoded_df = pd.DataFrame(X_train_encoded, columns=features_pipe.get_features_names_out())
    X_train_encoded_df.head()

I got this error: AttributeError: 'ColumnTransformer' object has no attribute 'get_features_names_out'

This happens because LeaveOneOutEncoder does not support get_feature_names_out(). It supports get_feature_names().
How can I overcome this and print the DataFrame correctly?

Answer #1 (zour9fqk)

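I had the same problem in the past.
If you don't mind using a subclass of ColumnTransformer, you can create one and modify it so that it calls get_feature_names() when get_feature_names_out() is not available.
In that case, you would declare the class: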

    from sklearn.compose import ColumnTransformer
    # Private helper: its import path and the signature of the overridden
    # method below may differ between scikit-learn versions.
    from sklearn.compose._column_transformer import _is_empty_column_selection


    class MyColumnTransformer(ColumnTransformer):
        def __init__(self, transformers, **kwargs):
            super().__init__(transformers=transformers, **kwargs)

        def _get_feature_name_out_for_transformer(
            self, name, trans, column, feature_names_in
        ):
            column_indices = self._transformer_to_input_indices[name]
            names = feature_names_in[column_indices]
            if trans == "drop" or _is_empty_column_selection(column):
                return
            elif trans == "passthrough":
                return names
            # Fall back to the older API when the newer one is missing.
            if not hasattr(trans, "get_feature_names_out"):
                return trans.get_feature_names()
            return trans.get_feature_names_out(names)
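
The core of this subclass is the `hasattr` fallback. A minimal, library-free sketch of that dispatch is shown below. `NewStyleEncoder`, `LegacyEncoder`, and `resolve_feature_names` are hypothetical stand-ins invented for illustration; they are not part of scikit-learn or category_encoders.

```python
class NewStyleEncoder:
    """Hypothetical stand-in for an encoder exposing the newer API."""

    def get_feature_names_out(self, input_features=None):
        return [f"{name}_encoded" for name in input_features]


class LegacyEncoder:
    """Hypothetical stand-in for an encoder exposing only the older API."""

    def __init__(self, columns):
        self.columns = list(columns)

    def get_feature_names(self):
        return list(self.columns)


def resolve_feature_names(trans, input_names):
    """Prefer get_feature_names_out(); fall back to get_feature_names()."""
    if hasattr(trans, "get_feature_names_out"):
        return list(trans.get_feature_names_out(input_names))
    return list(trans.get_feature_names())


new_names = resolve_feature_names(NewStyleEncoder(), ["Age"])
old_names = resolve_feature_names(LegacyEncoder(["State (US)"]), ["State (US)"])
print(new_names)
print(old_names)
```

The subclass above applies the same check inside ColumnTransformer's per-transformer name lookup, so encoders of both kinds can sit side by side.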

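Although ColumnTransformer is not as simple to use as make_column_transformer, it is easier to customize.
In this case you also have to give each transformer a name, using the following pattern:

  • (name, transformer, columns)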
    features_pipe = MyColumnTransformer(
        transformers=[
            # sparse=False is kept from the original post; scikit-learn 1.2+
            # names this parameter sparse_output.
            ('OHE', OneHotEncoder(handle_unknown='ignore', sparse=False), ['Gender', 'Race']),
            ('OE', OrdinalEncoder(), ['Age', 'Overall Work Exp.', 'Fieldwork Exp.', 'Level of Education']),
            ('LOOE', ce.LeaveOneOutEncoder(), ['State (US)']),
        ]
    )
    features_pipe.fit(X_train, y_train)

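Finally, continue with the rest of your code as you proposed.
If you don't want the transformer name prefixed to the feature names, just pass verbose_feature_names_out=False when initializing MyColumnTransformer.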
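
To illustrate what the `verbose_feature_names_out` flag changes, here is a rough pure-Python sketch of the naming rule. With verbose names on, each output column is prefixed with the transformer name and a double underscore. With it off, the bare names are used and must be unique across transformers. `combine_feature_names` is a hypothetical helper written for this illustration, not a scikit-learn function, and the per-encoder feature names passed to it are made up.

```python
def combine_feature_names(named_features, verbose=True):
    """Mimic how output names are combined across transformers.

    named_features: list of (transformer_name, feature_names) pairs.
    """
    if verbose:
        # Prefix every feature with "<transformer name>__".
        return [f"{name}__{feat}" for name, feats in named_features for feat in feats]
    # Without prefixes, a name shared by two transformers would be ambiguous.
    flat = [feat for _, feats in named_features for feat in feats]
    duplicates = sorted({feat for feat in flat if flat.count(feat) > 1})
    if duplicates:
        raise ValueError(f"Output feature names are not unique: {duplicates}")
    return flat


# Illustrative (assumed) per-encoder output names.
example = [
    ("OHE", ["Gender_F", "Gender_M"]),
    ("OE", ["Age"]),
    ("LOOE", ["State (US)"]),
]
prefixed = combine_feature_names(example, verbose=True)
plain = combine_feature_names(example, verbose=False)
print(prefixed)
print(plain)
```

If two encoders could emit the same column name, keeping the prefixes is the safer choice.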
