matplotlib: Plot number formatting in XGBoost plot_importance()

wlsrxk51 posted on 2023-11-22 in Other


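I have trained an XGBoost model and used plot_importance() to plot the most important features of the trained model. However, the numbers in the plot have several decimal places, which clutter the plot and do not fit inside it.
I have searched for plot formatting options, but I only found how to format the axes (I tried formatting the X axis, hoping that would also format the corresponding labels).
I am working in a Jupyter Notebook (if that makes any difference). The code is as follows: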

import xgboost as xgb

# X_train, y_train and X_test are assumed to be defined already
xg_reg = xgb.XGBClassifier(
    objective='binary:logistic',
    colsample_bytree=0.4,
    learning_rate=0.01,
    max_depth=15,
    alpha=0.1,
    n_estimators=5,
    subsample=0.5,
    scale_pos_weight=4
)
xg_reg.fit(X_train, y_train)
preds = xg_reg.predict(X_test)

# Plot the three most important features by gain, with value labels
ax = xgb.plot_importance(xg_reg, max_num_features=3, importance_type='gain', show_values=True)

fig = ax.figure
fig.set_size_inches(10, 3)

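Am I missing something? Is there a formatting function or parameter to pass?
I would like to be able to format the feature importance scores, or at least drop the decimal part (e.g. "25" instead of "25.66521").
[Figure: xgboost feature importance scores]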

matplotlib

Source: https://stackoverflow.com/questions/56061712/plot-number-formatting-in-xgboost-plot-importance

4 Answers


6rqinv9w1#

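You don't need to edit the xgboost plotting function to get the result you want. The plotting function accepts a dictionary of importances as its first argument, which you can create directly from your xgboost model and then edit. This is also handy if you want friendlier labels for the feature names.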

# Get the booster from the xgbmodel
booster = xg_reg.get_booster()

# Get the importance dictionary (by gain) from the booster
importance = booster.get_score(importance_type="gain")

# make your changes
for key in importance.keys():
    importance[key] = round(importance[key],2)

# provide the importance dictionary to the plotting function
ax = xgb.plot_importance(importance, max_num_features=3, importance_type='gain', show_values=True)

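As a rough sketch of the dictionary-editing step on its own, the snippet below uses a hand-written dictionary with made-up scores in place of the result of `booster.get_score(...)`. The feature names `f0`, `f1`, `f2`, their values, the `friendly_names` mapping, and the `tidy_importance` helper are all hypothetical. It rounds each score and swaps in more readable labels; the resulting dictionary is the kind of object that would then be handed to the plotting function.

```python
# Hypothetical stand-in for booster.get_score(importance_type="gain");
# the feature names and scores below are invented for illustration.
importance = {"f0": 25.66521, "f1": 3.14159, "f2": 0.00712}

# Hypothetical mapping from raw feature names to friendlier labels.
friendly_names = {"f0": "age", "f1": "income", "f2": "tenure"}


def tidy_importance(scores, labels=None, ndigits=2):
    """Return a new dict with rounded scores and optionally renamed keys."""
    labels = labels or {}
    return {labels.get(name, name): round(value, ndigits)
            for name, value in scores.items()}


tidy = tidy_importance(importance, friendly_names)
print(tidy)  # {'age': 25.67, 'income': 3.14, 'tenure': 0.01}
```

Building a new dictionary rather than modifying `importance` in place keeps the raw scores available in case they are needed later.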


llmtgqce2#

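I just solved the same problem.
It happens simply because the numbers for 'gain' or 'cover' contain too many decimal digits, unlike the 'weight' option. Unfortunately, as far as I know, there is no option to specify the number of digits. So I modified the function myself to specify the maximum number of digits allowed. Below are the modifications I made to plotting.py in the xgboost package. If you are using the Spyder console, you can find and open the file by passing an invalid option (I'm a lazy guy), for example: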

xgb.plot_importance(xg_reg, potato=False)

Then click on the file in the error message shown in the console. The next step is to modify the function itself, as follows:

def plot_importance(booster, ax=None, height=0.2,
                    xlim=None, ylim=None, title='Feature importance',
                    xlabel='F score', ylabel='Features',
                    importance_type='weight', max_num_features=None,
                    grid=True, show_values=True, max_digits=3, **kwargs):

Then you should also add the following before the show_values condition:

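# Default to the raw values so values_displayed is always defined,
# even when max_digits is None
values_displayed = values
if max_digits is not None:
    lst = list(values)
    # Only reformat when the first value has more decimals than allowed
    if len(str(lst[0]).split('.')[-1]) > max_digits:
        values_displayed = tuple(('{:.' + str(max_digits) + 'f}').format(x) for x in lst)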

if show_values is True:
    for x, x2, y in zip(values, values_displayed, ylocs):
        ax.text(x + 1, y, x2, va='center')

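I added a condition so that the numbers are formatted only if they are longer than the specified number of digits. This avoids producing unwanted digits with, for example, the importance_type='weight' option.
Note that for 'cover' and 'gain' the text was also badly positioned for me, so I also changed the offset, replacing the 1 above with: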

if show_values is True:
    for x, x2, y in zip(values, values_displayed, ylocs):
        dx = np.max(values)/100
        ax.text(x + dx, y, x2, va='center')

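The two ideas in this answer (format a label only when it has too many decimals, and scale the label offset to the data) can be tried outside of xgboost. The following is only a sketch with made-up numbers; `format_labels` and `label_offset` are hypothetical helper names, not part of xgboost, and unlike the patch above this version checks every value rather than only the first one.

```python
def format_labels(values, max_digits=3):
    """Format each value to at most max_digits decimals, leaving short values untouched."""
    labels = []
    for v in values:
        text = str(v)
        # Digits after the decimal point, or none for integer-like values
        decimals = text.split(".")[-1] if "." in text else ""
        if max_digits is not None and len(decimals) > max_digits:
            text = "{:.{n}f}".format(v, n=max_digits)
        labels.append(text)
    return labels


def label_offset(values, fraction=0.01):
    """Horizontal gap between bar end and label, scaled to the largest value."""
    return max(values) * fraction


gain = [25.66521, 3.5, 112.0004]  # made-up 'gain'-style scores
weight = [12, 7, 3]               # made-up 'weight'-style counts

print(format_labels(gain))    # ['25.665', '3.5', '112.000']
print(format_labels(weight))  # ['12', '7', '3']
print(label_offset(gain))
```

Scaling the offset by the largest value keeps the gap visually similar whether the scores are in the tens or in the thousands, which is the reason given above for replacing the fixed shift of 1.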
Hope this helps!


mnemlml83#

Edit the code of plotting.py in the xgboost package:

ylocs = np.arange(len(values))
# Added line: round every value to 4 decimals before plotting
values = tuple([round(x, 4) for x in values])
ax.barh(ylocs, values, align='center', height=height, **kwargs)



6g8kf2rb4#

A simpler answer to this old question is based on the documentation in plotting.py for values_format:

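values_format: Format string for values. "v" will be replaced by the value of the feature importance. For example, pass "{v:.2f}" in order to limit the number of digits after the decimal point to two, for each value printed on the graph.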
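To see what such a format string produces, here is a small sketch using plain `str.format` with the keyword `v`, which is how the placeholder in the quoted documentation reads. Whether xgboost applies the string in exactly this way internally is an assumption, and the commented-out call at the end only works on an xgboost version whose plot_importance documents values_format. The sample score is the one from the question.

```python
score = 25.66521  # example value from the question

# Limit the label to two digits after the decimal point
two_decimals = "{v:.2f}".format(v=score)

# Drop the decimal part entirely (this rounds, it does not truncate)
no_decimals = "{v:.0f}".format(v=score)

print(two_decimals)  # 25.67
print(no_decimals)   # 26

# With a version of xgboost that supports values_format, the call from the
# question would then presumably become (not run here):
# xgb.plot_importance(xg_reg, max_num_features=3, importance_type='gain',
#                     show_values=True, values_format="{v:.0f}")
```

Note that "{v:.0f}" rounds to the nearest integer, so 25.66521 is shown as "26" rather than the "25" mentioned in the question.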
