Loading a pyspark.ml model from S3 with Pipeline

elcex8rz  posted on 2021-07-13 in Spark

I am trying to save a trained model to S3 storage, and then load it and run predictions through the Pipeline class in pyspark.ml. Here is an example of how I save the model.

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    # stage_1 to stage_4 are basic transformations on the data (one-hot encoding, etc.)
    # define stage 5: logistic regression model
    stage_5 = LogisticRegression(featuresCol='features', labelCol='label')
    # set up the pipeline
    regression_pipeline = Pipeline(stages=[stage_1, stage_2, stage_3, stage_4, stage_5])
    # fit the pipeline on the training data
    model = regression_pipeline.fit(dataFrame1)
    model_path = "s3://s3-dummy_path-orch/dummy models/pipeline_testing_1.model"
    model.save(model_path)

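The model saved successfully, and two folders, stages and metadata, were created at the model path above.
However, when I try to load the model, I get the following error.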

    Traceback (most recent call last):
      File "/tmp/pythonScript_85ff2462_e087_4805_9f50_0c75fc4302e2958379757178872310.py", line 75, in <module>
        pipelineModel = Pipeline.load(model_path)
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 362, in load
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/pipeline.py", line 207, in load
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/ml/util.py", line 300, in load
      File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
    pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.Pipeline but found class name org.apache.spark.ml.PipelineModel'

This is how I am trying to load the model:

    from pyspark.ml import Pipeline
    # same path that was passed to model.save() in the snippet above
    model_path = "s3://s3-dummy_path-orch/dummy models/pipeline_testing_1.model"
    pipelineModel = Pipeline.load(model_path)

How can I fix this?

ufj5ltwl  #1

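If you saved a PipelineModel, you should load it as a PipelineModel, not as a Pipeline. The difference is that a PipelineModel has already been fitted to a DataFrame, whereas a Pipeline has not. In the question, model is the return value of regression_pipeline.fit(dataFrame1), so what was saved is a PipelineModel; that is why the metadata check fails with "Expected class name org.apache.spark.ml.Pipeline but found class name org.apache.spark.ml.PipelineModel".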

    from pyspark.ml import PipelineModel
    # model_path is the same path that was passed to model.save()
    pipelineModel = PipelineModel.load(model_path)
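
If it is not known in advance which of the two classes was saved at a path, one option is to read the class name from the saved metadata before picking a loader. The following is only a sketch under assumptions: it assumes the metadata folder holds a single JSON line with a `class` key (which is consistent with the class names quoted in the error message), and the names `LOADERS`, `loader_name_for` and `load_saved` are hypothetical helpers written for illustration, not part of PySpark.

```python
import json

# JVM class name recorded in the saved metadata -> name of the PySpark class to load with.
# Only the two classes discussed in this thread are handled (an assumption of this sketch).
LOADERS = {
    "org.apache.spark.ml.Pipeline": "Pipeline",
    "org.apache.spark.ml.PipelineModel": "PipelineModel",
}


def loader_name_for(metadata_json):
    """Return 'Pipeline' or 'PipelineModel' for one JSON line of saved metadata."""
    class_name = json.loads(metadata_json)["class"]
    if class_name not in LOADERS:
        raise ValueError("unsupported saved class: " + class_name)
    return LOADERS[class_name]


def load_saved(spark, path):
    """Load whatever was saved at `path` using the matching class (requires pyspark)."""
    import pyspark.ml  # imported lazily so loader_name_for works without Spark installed

    # Assumption: <path>/metadata is a text file whose first line is the JSON metadata.
    metadata_json = spark.sparkContext.textFile(path + "/metadata").first()
    loader = getattr(pyspark.ml, loader_name_for(metadata_json))
    return loader.load(path)
```

With an active SparkSession named spark, `load_saved(spark, model_path)` would then call `PipelineModel.load` for the path in the question. When you already know that the saved object came from `fit()`, calling `PipelineModel.load(model_path)` directly is simpler.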
