PySpark PicklingError: could not serialize object when inserting data into DynamoDB

70gysomp, posted 2021-07-12, in Spark

I am trying to insert 2 million items into DynamoDB (WCU = 40000), but when I map over the rows in Spark it throws an error.

    %livy.pyspark
    import shutil
    from typing import Text, List
    from pyspark.sql import SparkSession, DataFrame
    import boto3
    from urllib.parse import urlparse
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource('dynamodb', region_name="us-east-1")
    table = dynamodb.Table("<dynamboDB>")
    df = spark.read.parquet("s3 path").limit(10)
    df.rdd.map(lambda row: table.put_item(Item=row.asDict()))

Error:

    Traceback (most recent call last):
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 205, in __repr__
        return self._jrdd.toString()
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2532, in _jrdd
        self._jrdd_deserializer, profiler)
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2434, in _wrap_function
        pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2420, in _prepare_for_python_RDD
        pickled_command = ser.dumps(command)
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 607, in dumps
        raise pickle.PicklingError(msg)
    _pickle.PicklingError: Could not serialize object: TypeError: can't pickle SSLContext objects
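A likely explanation for the traceback above: Spark pickles everything a mapped lambda closes over and ships it to the executors, and the `table` object created on the driver holds an `ssl.SSLContext` inside its boto3 HTTP session, which pickle cannot serialize (the error surfaces in `__repr__` because `map` is lazy and the notebook triggers serialization when it tries to display the RDD). The sketch below first demonstrates the root cause with only the standard library, then outlines a common fix pattern: construct the boto3 resource *inside* the function that runs on each partition, so nothing unpicklable is captured. The table name `"my-table"` and the region are placeholder assumptions, not values from the question.

```python
import pickle
import ssl

def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False

# Root cause: an SSLContext (as held by boto3 sessions) is not picklable.
assert not is_picklable(ssl.create_default_context())

# Fix pattern (sketch, assuming a table "my-table" in us-east-1): create
# the client on the executor, inside the per-partition function, so the
# closure Spark pickles contains no SSLContext. foreachPartition is also
# an action, so the writes actually execute (map alone never runs).
def write_partition(rows):
    import boto3  # imported on the executor, not pickled from the driver
    table = boto3.resource("dynamodb", region_name="us-east-1").Table("my-table")
    with table.batch_writer() as batch:  # batches put_item calls per partition
        for row in rows:
            batch.put_item(Item=row.asDict())

# Usage on the question's DataFrame:
# df.rdd.foreachPartition(write_partition)
```

Using `foreachPartition` rather than `map` means one client and one `batch_writer` per partition instead of per row, which matters at 2 million items.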

No answers yet.
