How do I INSERT OVERWRITE a Hive table without failing with org.apache.spark.sql.AnalysisException: Can only write data to relations with a single path?

um6iljoc  posted on 2021-05-31 in Hadoop

I have a Hive table that I want to overwrite with INSERT OVERWRITE; an example query is shown below:

spark.sql("INSERT OVERWRITE TABLE my_database.my_table VALUES (221221, 'DUMMY_Record_Pav', 21233, 'SPACE')")
-- Output of SHOW CREATE TABLE
CREATE TABLE `my_database.my_table`(
  `player_id` string,
  `player_type` string,
  `position_id` string,
  `position_location` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='hdfs://path/hive/data/my_database.db/my_table')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://path/hive/data/my_database.db/my_table'
TBLPROPERTIES (
 --Redacted
 )

The Spark SQL query fails with the error below:

org.apache.spark.sql.AnalysisException: Can only write data to relations with a single path.;
  at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:188)
  at org.apache.spark.sql.execution.datasources.DataSourceAnalysis$$anonfun$apply$1.applyOrElse(DataSourceStrategy.scala:134)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:107)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:106)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsDown(AnalysisHelper.scala:106)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperators(AnalysisHelper.scala:73)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
  at org.apache.spark.sql.execution.datasources.DataSourceAnalysis.apply(DataSourceStrategy.scala:134)
  at org.apache.spark.sql.execution.datasources.DataSourceAnalysis.apply(DataSourceStrategy.scala:52)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:127)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:121)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651)
  ... 49 elided

I am able to overwrite a similar table that does not have the following serde property:

WITH SERDEPROPERTIES (  
  'path'='hdfs://path/hive/data/my_database.db/my_table')
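
For comparison, a Parquet table created with plain Hive-style DDL usually has no 'path' entry in its serde properties (that entry tends to appear when Spark itself creates the table as a data source table). A rough sketch of such a table, reusing the columns from the SHOW CREATE TABLE output above (the table name here is made up for illustration):

-- Hive-style DDL; no WITH SERDEPROPERTIES ('path'=...) is added
CREATE TABLE my_database.my_table_no_path (
  player_id string,
  player_type string,
  position_id string,
  position_location string)
STORED AS PARQUET;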

Is there a way to remove this serde property from the table?
I tried setting the path to '' as shown below, but Spark SQL then fails with an empty-path error:

ALTER TABLE my_database.my_table SET SERDEPROPERTIES('path'='');

Removing the serde property would let the Spark SQL query run.
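
Whether the property actually changed can be checked from the table definition itself; both statements below are standard Hive/Spark SQL and display the serde/storage properties:

-- Print the full DDL, including the WITH SERDEPROPERTIES clause
SHOW CREATE TABLE my_database.my_table;
-- Or list the table's storage and serde information
DESCRIBE FORMATTED my_database.my_table;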


pu3pd22g #1

You can create an intermediate table without that serde property, copy the data into it, and then swap the names:

-- Create an intermediate table with the question's columns, but without the 'path' serde property
CREATE TABLE mytable2 (player_id string, player_type string, position_id string, position_location string)
STORED AS PARQUET;

-- Copy the existing data into it
INSERT INTO mytable2 SELECT * FROM mytable;

-- Swap the tables by renaming
ALTER TABLE mytable RENAME TO mytable1;
ALTER TABLE mytable2 RENAME TO mytable;
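
After the swap, mytable (my_database.my_table in the question's naming) is the copy created without the 'path' serde property, so the original statement from the question should no longer hit the single-path check; the old data stays available under mytable1 until you drop it:

INSERT OVERWRITE TABLE my_database.my_table VALUES (221221, 'DUMMY_Record_Pav', 21233, 'SPACE');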
