PySpark: PicklingError: Could not serialize object when inserting data into DynamoDB

70gysomp · posted 2021-07-12 in Spark

I am trying to insert 2 million items into DynamoDB (WCU = 40,000), but when I use a Spark `map` it throws an error.

```
%livy.pyspark
import boto3

# A module-level boto3 resource -- the client holds a live SSLContext.
dynamodb = boto3.resource('dynamodb', region_name="us-east-1")
table = dynamodb.Table("<dynamboDB>")

df = spark.read.parquet("s3 path").limit(10)

# The lambda closes over the module-level `table`, so Spark must
# pickle the boto3 client together with the function.
df.rdd.map(lambda row: table.put_item(Item=row.asDict()))
```

The error (raised when the notebook evaluates the RDD's `__repr__`, since `map` itself is lazy):

```
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 205, in __repr__
    return self._jrdd.toString()
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2532, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2434, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2420, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 607, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: can't pickle SSLContext objects
```
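
The traceback points at the closure, not at DynamoDB itself: the lambda passed to `map` captures the module-level `table` object, so Spark has to pickle the boto3 resource, and its SSLContext, to ship the function to the executors, which is exactly what pickle cannot do. A common workaround is to construct the boto3 client inside the function that runs on the executors, and to use `foreachPartition` (an action, so the writes actually execute) together with `batch_writer` to batch the puts. A minimal sketch, reusing the placeholder table name and S3 path from the question; the helper name `write_partition` is illustrative:

```
import boto3

def write_partition(rows):
    # Constructed on the executor, so the client is never pickled.
    dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
    table = dynamodb.Table("<dynamboDB>")
    # batch_writer buffers puts and retries unprocessed items,
    # which also smooths the load against the table's WCU.
    with table.batch_writer() as writer:
        for row in rows:
            writer.put_item(Item=row.asDict())

df = spark.read.parquet("s3 path").limit(10)
df.rdd.foreachPartition(write_partition)
```

For the full 2 million items you would drop the `.limit(10)` and may want to repartition the DataFrame so that the number of parallel writers stays within the table's 40,000 provisioned WCU.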
