pyspark: Logging a model to MLflow with the Feature Store API fails with TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'

wkftcu5l posted on 2023-03-17 in Spark

I am using Databricks and trying to log a model to MLflow with the Feature Store log_model function:

fs.log_model(
    model,
    artifact_path="fs_model",
    flavor=mlflow.sklearn,
    training_set=fs_training_set,
)

The script runs in a workflow on a job cluster running 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12).
Here is the log:

TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
     62 if __name__ == "__main__":
     63     job = ModelTrainJob()
---> 64     job.launch()

/tmp/tmp51ge7k75.py in launch(self)
     56             env_vars=self.env_vars,
     57         )
---> 58         ModelTrain(cfg).run()
     59         _logger.info("ModelTrainJob job finished!")
     60 

/databricks/python/lib/python3.8/site-packages/customer_churn/objects/model_train.py in run(self)
    215             # Log model using Feature Store API
    216             _logger.info("Logging model to MLflow using Feature Store API")
--> 217             fs.log_model(
    218                 model,
    219                 artifact_path="fs_model",

/databricks/.python_edge_libs/databricks/feature_store/client.py in log_model(self, model, artifact_path, flavor, training_set, registered_model_name, await_registration_for, **kwargs)
   2106             # the databricks-feature-store package is not available via conda or pip.
   2107             conda_file = raw_mlflow_model.flavors["python_function"][mlflow.pyfunc.ENV]
-> 2108             conda_env = read_yaml(raw_model_path, conda_file)
   2109 
   2110             # Get the pip package string for the databricks-feature-lookup client

/databricks/python/lib/python3.8/site-packages/mlflow/utils/file_utils.py in read_yaml(root, file_name)
    210         )
    211 
--> 212     file_path = os.path.join(root, file_name)
    213     if not exists(file_path):
    214         raise MissingConfigException("Yaml file '%s' does not exist." % file_path)

/usr/lib/python3.8/posixpath.py in join(a, *p)
     88                 path += sep + b
     89     except (TypeError, AttributeError, BytesWarning):
---> 90         genericpath._check_arg_types('join', a, *p)
     91         raise
     92     return path

/usr/lib/python3.8/genericpath.py in _check_arg_types(funcname, *args)
    150             hasbytes = True
    151         else:
--> 152             raise TypeError(f'{funcname}() argument must be str, bytes, or '
    153                             f'os.PathLike object, not {s.__class__.__name__!r}') from None
    154     if hasstr and hasbytes:

TypeError: join() argument must be str, bytes, or os.PathLike object, not 'dict'

I found this post, which suggests the problem is in

file_path = os.path.join(root, file_name)

but that line is several layers down inside the MLflow code, not in my own script.
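To illustrate what the traceback is reporting, here is a minimal sketch that reproduces the same TypeError outside Databricks. It assumes (this is not confirmed by the post) that the value read from `flavors["python_function"][mlflow.pyfunc.ENV]` is a dict of environment files rather than a single file name. The directory, the dict contents, and the helper `conda_file_name` are hypothetical and only serve the illustration.

```python
import os

root = "/tmp/model"  # hypothetical model directory

# Layout the calling code appears to expect (assumption): a plain file name.
env_as_str = "conda.yaml"
print(os.path.join(root, env_as_str))

# Layout that would trigger the error (assumption): a mapping from
# environment manager to file name instead of a single string.
env_as_dict = {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}

try:
    os.path.join(root, env_as_dict)
except TypeError as exc:
    # os.path.join only accepts str, bytes, or os.PathLike arguments.
    error_message = str(exc)
    print(error_message)


def conda_file_name(env_entry):
    """Return the conda file name from either layout (hypothetical helper)."""
    if isinstance(env_entry, dict):
        return env_entry["conda"]
    return env_entry


print(os.path.join(root, conda_file_name(env_as_dict)))
```

Because the failing call sits inside the Feature Store client rather than in user code, a helper like this cannot be applied from the calling script. It only shows why the two layouts are incompatible.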

w8ntj3qf #1

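I figured out the answer to my own question, so I am posting it in case someone else runs into the same problem.
The error was caused by the Databricks Runtime version I was using, 10.4 LTS ML.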
When I upgraded to 12.1 LTS ML, the error went away.
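A quick way to see which runtime a job is actually using is to read the runtime version from the environment before calling log_model. The sketch below assumes the cluster exposes the `DATABRICKS_RUNTIME_VERSION` environment variable in a "major.minor" form. The function name `runtime_version` and the (12, 1) threshold are illustrative, taken only from the versions mentioned in this answer; nothing here says which intermediate versions are affected.

```python
import os


def runtime_version(environ=os.environ):
    """Return (major, minor) of the Databricks Runtime, or None if unknown."""
    raw = environ.get("DATABRICKS_RUNTIME_VERSION")
    if not raw:
        return None
    parts = raw.split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Unexpected format: report it as unknown rather than guessing.
        return None


version = runtime_version()
if version is None:
    print("Runtime version not available (not on a cluster, or variable not set).")
elif version < (12, 1):
    # Threshold taken from this answer only; other versions were not tested here.
    print(f"Runtime {version[0]}.{version[1]} is older than the one that worked in this answer.")
else:
    print(f"Runtime {version[0]}.{version[1]}")
```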
