Hive: unable to create an external Hive table from an existing Hive table in the Spark shell

oalqel3c · asked 2023-08-04 · tagged Hive · 2 answers · 198 views

I am trying to create an external table from an existing Hive table in the Spark shell. (This works fine from beeline / the Hive shell, but fails in the Spark shell.)

spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS new_db.new_table LIKE old_db.old_table")

This fails with a ParseException:

== SQL ==
CREATE EXTERNAL TABLE IF NOT EXISTS new_db.new_table LIKE old_db.old_table
----------------------------------------------------------------------------------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
  ... 49 elided

I even tried going through a HiveContext, as follows:
import org.apache.spark.sql.hive.HiveContext;
val sqlContext = new HiveContext(sc)
sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS new_db.new_table LIKE old_db.old_table")
This fails with the same ParseException:

== SQL ==
CREATE EXTERNAL TABLE IF NOT EXISTS new_db.new_table LIKE old_db.old_table
----------------------------------------------------------------------------------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
  at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
  ... 49 elided


If I instead create a regular internal (managed) table and then try to flip it to external by changing the table property, like this:
spark.sql("ALTER TABLE new_db.new_table SET TBLPROPERTIES('EXTERNAL'='TRUE')")
then I run into the following error:

org.apache.spark.sql.AnalysisException: Cannot set or change the preserved property key: 'EXTERNAL';
  at org.apache.spark.sql.hive.HiveExternalCatalog.verifyTableProperties(HiveExternalCatalog.scala:136)
  at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTable$1(HiveExternalCatalog.scala:567)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
  at org.apache.spark.sql.hive.HiveExternalCatalog.alterTable(HiveExternalCatalog.scala:563)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.alterTable(ExternalCatalogWithListener.scala:118)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTable(SessionCatalog.scala:358)
  at org.apache.spark.sql.execution.command.AlterTableSetPropertiesCommand.run(ddl.scala:251)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3369)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:80)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
  ... 49 elided


Any suggestions for cloning an existing Hive table as an external table from the Spark shell are appreciated.

pqwbnv8z (answer 1)

spark.sql("ALTER TABLE new_db.new_table SET TBLPROPERTIES('EXTERNAL'='TRUE')")

only works if new_db.new_table is a non-transactional internal (managed) table. Alternatively, you can copy the data files from the Hive data location to wherever you plan to keep the external table's data, then drop the table and recreate it as an external table.
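A minimal sketch of the drop-and-recreate route, run from the Spark shell. The column list, storage format, and location below are placeholders, not the real table definition; capture the actual DDL with SHOW CREATE TABLE first.

```scala
// Hypothetical sketch: recreate the table as external after copying its files.
// Schema, format, and path are assumed placeholders.
spark.sql("SHOW CREATE TABLE new_db.new_table").show(false) // capture current DDL
spark.sql("DROP TABLE new_db.new_table")                    // drops the managed table
spark.sql("""
  CREATE EXTERNAL TABLE new_db.new_table (
    id INT,
    name STRING
  )
  STORED AS PARQUET
  LOCATION '/path/to/copied/files'
""")
```

Dropping a managed table also deletes its data, which is why the files must be copied out before the DROP.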

pkmbmrz7 (answer 2)

Correct: Spark does not support the CREATE TABLE ... LIKE syntax in combination with EXTERNAL.

I currently see two workarounds:
1. Run the statement through a JDBC/Python Hive driver (i.e. directly against HiveServer2, where the syntax works).
2. Use the Spark code below:

val df = spark.read.table("old_db.old_table").limit(0)
df.write.format("whatever").option("path", "thenewpath").saveAsTable("new_db.new_table")

This registers the table's DDL in the Hive metastore; because of limit(0), only the schema is copied, not the data. You can also make the new table partitioned if needed.
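For the partitioned variant mentioned above, a sketch of the same schema-only clone; the storage format, path, and partition column name are assumptions, not values from the original tables:

```scala
// Hypothetical sketch: schema-only clone written out as a partitioned table
// at an explicit external path. "dt" and the path are assumed placeholders.
val df = spark.read.table("old_db.old_table").limit(0)   // schema only, no rows
df.write
  .format("parquet")
  .option("path", "/warehouse/external/new_table")       // assumed external location
  .partitionBy("dt")                                     // assumed partition column
  .saveAsTable("new_db.new_table")
```

Setting an explicit path via option("path", ...) is what makes saveAsTable register the table as external rather than managed.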
