pyspark/hive: partition is "marked as deleted" error when running a DROP PARTITIONS query

Asked by pn9klfpd on 2021-05-31 in Hadoop

I am trying to drop some partitions with a pyspark script, with several jobs running in parallel on an AWS EMR cluster. The code is:

spark.sql("USE db_stage_dev")

drop_query = "ALTER TABLE bt_table DROP IF EXISTS PARTITION (dia='2019-11-18', hora='09')".format(table_stage, partition)

spark.sql(drop_query).show()

Sometimes it fails with the following error:

Traceback (most recent call last):
  File "/home/hadoop/code/processing_data_default/spark_submit_drop_partition.py", line 96, in <module>
    main()
  File "/home/hadoop/code/processing_data_default/spark_submit_drop_partition.py", line 83, in main
    spark.sql(drop_query).show()
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u"org.apache.hadoop.hive.ql.metadata.HiveException: File 'bt_table/dia=2019-11-18' is marked as deleted in the metadata;"

The partition data is still present in storage.
When I run the same statement manually in a pyspark shell it always works. However, it fails when running in the cluster context with several jobs in parallel.
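No fix was posted in the thread, but since the error only appears when several jobs hit the metastore at once, one common workaround is to retry the statement when this specific AnalysisException shows up. Below is a minimal sketch, assuming the failure is a transient race between concurrent metastore updates; the helper drop_partition_with_retry and its parameters are hypothetical, not part of the original script:

import time

from pyspark.sql.utils import AnalysisException

def drop_partition_with_retry(spark, query, attempts=3, backoff_seconds=5):
    # Retry the DROP PARTITION statement a few times: concurrent jobs can
    # leave a partition briefly "marked as deleted" in the Hive metastore.
    for attempt in range(attempts):
        try:
            spark.sql(query)
            return
        except AnalysisException as e:
            # Re-raise anything that is not the transient metadata error,
            # or give up once the attempts are exhausted.
            if "marked as deleted" not in str(e) or attempt == attempts - 1:
                raise
            time.sleep(backoff_seconds * (attempt + 1))

drop_partition_with_retry(spark, drop_query)

Serializing the drops instead (for example, issuing them all from a single job) would avoid the race entirely rather than retrying around it.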

No answers yet.

