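I am trying to train an XGBoost model with the xgboost4j-flink package. Training works when the Flink parallelism is set to 1, but whenever parallelism > 1 it always throws the following error: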
ml.dmlc.xgboost4j.java.XGBoostError: [11:13:50] /xgboost/src/objective/regression_obj.cu:64: Check failed: info.labels_.Size() != 0U (0 vs. 0) : label set cannot be empty
Stack trace:
[bt] (0) /tmp/libxgboost4j2929418756774401857.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x22) [0x7f91471def82]
[bt] (1) /tmp/libxgboost4j2929418756774401857.so(xgboost::obj::RegLossObj<xgboost::obj::LogisticClassification>::GetGradient(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, int, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*)+0xec) [0x7f91472e8b6c]
[bt] (2) /tmp/libxgboost4j2929418756774401857.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x43b) [0x7f9147273dfb]
[bt] (3) /tmp/libxgboost4j2929418756774401857.so(XGBoosterUpdateOneIter+0x35) [0x7f91471e5845]
[bt] (4) [0x7f91a9018407]
ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
ml.dmlc.xgboost4j.java.Booster.update(Booster.java:181)
ml.dmlc.xgboost4j.java.XGBoost.train(XGBoost.java:190)
ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:64)
ml.dmlc.xgboost4j.scala.flink.XGBoost$MapFunction.mapPartition(XGBoost.scala:60)
org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:99)
org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:503)
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
I used the following hyperparameters in every test:
Subsample=1
Eta=0.05
Colsampletree=0.6
Silent=0
Objective=binary:logistic
Evalmetric=auc
Seed=42
Maxdepth=5
Minchildweight=6
Alpha=1.5
Booster=gbtree
Basescore=0.05
Nthread=8
Scaleposweight=0.5
Maxdeltastep=5
Gamma=0.1
Rounds=40
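For reference, the list above could be written as an xgboost4j-style parameter map roughly as follows. This is only a sketch: the snake_case keys are the names used in the XGBoost parameter documentation, the mapping of "Colsampletree" to colsample_bytree is an assumption, and the class name XgbParams is hypothetical. The number of rounds is kept separate because it is an argument to training, not a booster parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of xgboost4j.
public class XgbParams {
    // "Rounds" from the list above: the number of boosting rounds.
    public static final int ROUNDS = 40;

    // Builds the booster parameter map from the values listed in the post.
    public static Map<String, Object> build() {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("booster", "gbtree");
        p.put("objective", "binary:logistic");
        p.put("eval_metric", "auc");
        p.put("eta", 0.05);
        p.put("max_depth", 5);
        p.put("min_child_weight", 6);
        p.put("gamma", 0.1);
        p.put("alpha", 1.5);
        p.put("subsample", 1.0);
        p.put("colsample_bytree", 0.6); // assumed meaning of "Colsampletree"
        p.put("max_delta_step", 5);
        p.put("scale_pos_weight", 0.5);
        p.put("base_score", 0.05);
        p.put("nthread", 8);
        p.put("seed", 42);
        p.put("silent", 0);
        return p;
    }

    public static void main(String[] args) {
        // Print the map so it can be compared against the list in the post.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
        System.out.println("rounds=" + ROUNDS);
    }
}
```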
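Could the mapPartition implementation be distributing the data in an unintended way, so that a node/worker ends up processing a "single-class dataset"?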
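One detail from the trace: the failed check reports a label vector of size 0 ("0 vs. 0"), so the failing worker may have received no labeled rows at all rather than rows of a single class. A minimal sketch for telling the two cases apart is below. It is plain Java with no Flink dependency, the class and method names are hypothetical, and the round-robin assignment is only an assumption used for illustration; Flink may distribute rows differently.

```java
// Hypothetical diagnostic; simulates splitting labels across parallel workers.
public class PartitionLabelCheck {

    // Returns one status per partition: "EMPTY", "SINGLE_CLASS", or "OK".
    public static String[] diagnose(float[] labels, int parallelism) {
        int[] count = new int[parallelism];
        int[] positives = new int[parallelism];
        for (int i = 0; i < labels.length; i++) {
            int p = i % parallelism; // assumed round-robin assignment, for illustration only
            count[p]++;
            if (labels[i] > 0.5f) {
                positives[p]++;
            }
        }
        String[] status = new String[parallelism];
        for (int p = 0; p < parallelism; p++) {
            if (count[p] == 0) {
                status[p] = "EMPTY"; // no rows: a label vector of size 0
            } else if (positives[p] == 0 || positives[p] == count[p]) {
                status[p] = "SINGLE_CLASS"; // rows present, but only one label value
            } else {
                status[p] = "OK";
            }
        }
        return status;
    }

    public static void main(String[] args) {
        float[] labels = {1f, 0f, 0f, 1f, 0f, 1f};
        String[] status = diagnose(labels, 8);
        for (int p = 0; p < status.length; p++) {
            System.out.println("partition " + p + ": " + status[p]);
        }
    }
}
```

In a real job the same counts could be collected per partition before training, to see whether any parallel task ends up with zero rows.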
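In addition, I came across parameters such as num_workers and timeout_request_workers, but they are not in the XGBoost parameter documentation and appear to be specific to xgboost4j-spark. Are there any undocumented parameters that need to be set when running XGBoost on Flink across multiple nodes?
I hope someone can help me! Thanks.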