pyspark 2.4 cannot create a table from a SQL command: "Hive support is required to CREATE Hive TABLE"

Posted by neekobn8 on 2021-06-24 in Hive

I am using pyspark 2.4 and have already enabled Hive support:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("spark").enableHiveSupport().getOrCreate()

But when I run:

spark.sql("""
CREATE TABLE reporting.sport_ads AS
SELECT *
, 'Home' as HomeOrAway
, HomeTeam as TeamName
FROM adwords_ads_brand
UNION
SELECT *
, 'Away' as HomeOrAway
, AwayTeam as TeamName
FROM adwords_ads_brand
""")

I get this error:

pyspark.sql.utils.AnalysisException: "Hive support is required to CREATE Hive TABLE (AS SELECT);;\n'CreateTable `reporting`.`sport_ads`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, ErrorIfExists\n+- Distinct\n   +- Union\n      :-
....

This makes no sense to me. Am I doing something wrong?
PS: I should add that this code works perfectly well on Databricks and in Spark with Scala.

ffx8fchx #1

Please check the following configuration value in your `pyspark` session:

spark.sparkContext.getConf().get("spark.sql.catalogImplementation")

If the property value is not set to `hive`, try passing the conf below when starting the pyspark shell:

--conf spark.sql.catalogImplementation=hive
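The check above can be wrapped in a small helper so that a misconfigured session fails early instead of at the first `CREATE TABLE`. This is only a minimal sketch, assuming an already created `SparkSession` is passed in; the names `uses_hive_catalog` and `require_hive_catalog` are hypothetical and not part of PySpark.

```python
def uses_hive_catalog(conf_value):
    """Return True only if the catalog implementation is exactly 'hive'."""
    return conf_value == "hive"


def require_hive_catalog(spark):
    """Raise early if the given session is not backed by the Hive catalog.

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    """
    # Same lookup as in the answer; SparkConf.get returns None if the key is unset.
    value = spark.sparkContext.getConf().get("spark.sql.catalogImplementation")
    if not uses_hive_catalog(value):
        raise RuntimeError(
            "spark.sql.catalogImplementation is %r, expected 'hive'; "
            "try restarting with --conf spark.sql.catalogImplementation=hive"
            % (value,)
        )
    return value
```

Calling `require_hive_catalog(spark)` right after `getOrCreate()` would surface the problem before any SQL statement runs.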

Then run the code again. `UPDATE:` Create a `dataframe` from the union query:

df = spark.sql("""SELECT *
, 'Home' as HomeOrAway
, HomeTeam as TeamName
FROM adwords_ads_brand
UNION
SELECT *
, 'Away' as HomeOrAway
, AwayTeam as TeamName
FROM adwords_ads_brand""")

Then use the `.saveAsTable` function:

df.write.format("<parquet,orc..etc>").saveAsTable("<table_name>")
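Put together, the workaround might look like the sketch below in PySpark. The helper names `build_union_query` and `save_union_as_table`, the default `parquet` format, and the table names in the usage line are illustrative assumptions; only `spark.sql`, `DataFrame.write`, `format`, and `saveAsTable` are actual PySpark calls.

```python
def build_union_query(source_table):
    """Build the UNION query from the question for the given source table."""
    return (
        "SELECT *, 'Home' AS HomeOrAway, HomeTeam AS TeamName FROM {t} "
        "UNION "
        "SELECT *, 'Away' AS HomeOrAway, AwayTeam AS TeamName FROM {t}"
    ).format(t=source_table)


def save_union_as_table(spark, source_table, target_table, fmt="parquet"):
    """Run the union query and persist the result with saveAsTable.

    `spark` is assumed to be an existing SparkSession; `fmt` is an
    assumed default, any format supported by the cluster could be used.
    """
    df = spark.sql(build_union_query(source_table))
    # The writer is reached through df.write, not through the DataFrame itself.
    df.write.format(fmt).saveAsTable(target_table)
    return df
```

With the tables from the question, this would be called as `save_union_as_table(spark, "adwords_ads_brand", "reporting.sport_ads")`.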
