I am trying to create an external table from an existing Hive table using the Spark shell. (This works fine in beeline/the Hive shell, but I can't do it from the Spark shell.)
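The statement, as echoed back in the error below, is:

spark.sql("CREATE EXTERNAL TABLE IF NOT EXISTS new_db.new_table LIKE old_db.old_table")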
This fails with an org.apache.spark.sql.catalyst.parser.ParseException:
== SQL ==
CREATE EXTERNAL TABLE IF NOT EXISTS new_db.new_table LIKE old_db.old_table
----------------------------------------------------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
... 49 elided
I even tried HiveContext, as shown below:

import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS new_db.new_table LIKE old_db.old_table")

It fails with the same ParseException:
== SQL ==
CREATE EXTERNAL TABLE IF NOT EXISTS new_db.new_table LIKE old_db.old_table
----------------------------------------------------------------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
... 49 elided
If I instead create a regular internal (managed) table and then try to make it external by changing its table properties:

spark.sql("ALTER TABLE new_db.new_table SET TBLPROPERTIES('EXTERNAL'='TRUE')")

then I get the following error:
org.apache.spark.sql.AnalysisException: Cannot set or change the preserved property key: 'EXTERNAL';
at org.apache.spark.sql.hive.HiveExternalCatalog.verifyTableProperties(HiveExternalCatalog.scala:136)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTable$1(HiveExternalCatalog.scala:567)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at org.apache.spark.sql.hive.HiveExternalCatalog.alterTable(HiveExternalCatalog.scala:563)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.alterTable(ExternalCatalogWithListener.scala:118)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTable(SessionCatalog.scala:358)
at org.apache.spark.sql.execution.command.AlterTableSetPropertiesCommand.run(ddl.scala:251)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194)
at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3369)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:80)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
... 49 elided
Any suggestions for cloning an existing Hive table as an external table from the Spark shell would be appreciated.
2 Answers
pqwbnv8z1#
This only works if new_db.new_table is a non-transactional internal/managed table. You can copy the data files from the Hive data location to the location where you plan to store the external table, then drop the table and recreate it as an external table. A sketch of that flow is below.
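A minimal sketch of that copy-drop-recreate flow in the Spark shell (the warehouse path, external location, columns, and file format here are placeholder assumptions, not from the original answer):

// Assumes new_db.new_table is currently a non-transactional managed table.
// 1) Copy its data files out of the warehouse first, e.g. from a shell:
//      hdfs dfs -cp /user/hive/warehouse/new_db.db/new_table /data/external/new_table
// 2) Drop the managed table (dropping it deletes its warehouse files, hence the copy first):
spark.sql("DROP TABLE IF EXISTS new_db.new_table")
// 3) Recreate it as an external table over the copied files (placeholder columns and format):
spark.sql(
  """CREATE EXTERNAL TABLE new_db.new_table (id INT, name STRING)
    |STORED AS PARQUET
    |LOCATION '/data/external/new_table'""".stripMargin)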
pkmbmrz72#
Yes, Spark does not support the CREATE ... LIKE syntax combined with EXTERNAL.
Right now I see two solutions:
1. Run the query through a JDBC/Python Hive driver.
2. Use Spark code like the sketch below.
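A minimal sketch of such Spark code, assuming a Parquet table and an external location you choose yourself (both are assumptions): it builds the external-table DDL from the existing table's schema and executes it.

// Build a CREATE EXTERNAL TABLE statement from the existing table's schema
// and register it in the Hive metastore by executing it.
val cols = spark.table("old_db.old_table").schema.fields
  .map(f => s"`${f.name}` ${f.dataType.catalogString}")   // Hive-compatible type names
  .mkString(",\n  ")

val ddl =
  s"""CREATE EXTERNAL TABLE IF NOT EXISTS new_db.new_table (
     |  $cols
     |) STORED AS PARQUET
     |LOCATION '/data/external/new_table'""".stripMargin   // placeholder format and path

spark.sql(ddl)

Note that if old_db.old_table is partitioned, its partition columns also appear in the schema, so move them into a PARTITIONED BY clause instead of the column list.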
This creates the table DDL in the Hive metastore. You can also partition the table if needed.