I'm using PySpark 2.4 and have already enabled Hive support:
spark = SparkSession.builder.appName("spark").enableHiveSupport().getOrCreate()
But when I run:
spark.sql("""
CREATE TABLE reporting.sport_ads AS
SELECT
*
, 'Home' as HomeOrAway
, HomeTeam as TeamName
FROM adwords_ads_brand
UNION
SELECT
*
, 'Away' as HomeOrAway
, AwayTeam as TeamName
FROM adwords_ads_brand
""")
I get this error:
pyspark.sql.utils.AnalysisException: "Hive support is required to CREATE Hive TABLE (AS SELECT);;\n'CreateTable `reporting`.`sport_ads`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, ErrorIfExists\n+- Distinct\n +- Union\n :-
....
This makes no sense to me. Am I doing something wrong?
PS: I should add that this code runs perfectly well on Databricks and in Spark with Scala.
1 Answer
ffx8fchx1#
Please check that the following configuration value is set when you launch PySpark:
pyspark --conf spark.sql.catalogImplementation=hive
```
val df = spark.sql("""SELECT
*
, 'Home' as HomeOrAway
, HomeTeam as TeamName
FROM adwords_ads_brand
UNION
SELECT
*
, 'Away' as HomeOrAway
, AwayTeam as TeamName
FROM adwords_ads_brand""")
// Put a real format (parquet, orc, etc.) and table name in place of the placeholders.
df.write.format("<parquet,orc..etc>").saveAsTable("<table_name>")
```
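Since the question is about PySpark, here is a rough Python sketch of the same idea. It first checks which catalog the running session actually uses, then writes the union through the DataFrame writer instead of `CREATE TABLE ... AS SELECT`. One common cause of this error (an assumption here, not something confirmed in the question) is that `getOrCreate()` returned a session that already existed without Hive support, so `enableHiveSupport()` never took effect. The helper names `union_sql`, `hive_enabled`, `write_union_table`, and `build_session`, and the `parquet` default, are hypothetical and chosen only for illustration.

```python
def union_sql(source_table: str) -> str:
    """Build the Home/Away UNION query from the question for one source table."""
    return f"""
        SELECT *, 'Home' AS HomeOrAway, HomeTeam AS TeamName FROM {source_table}
        UNION
        SELECT *, 'Away' AS HomeOrAway, AwayTeam AS TeamName FROM {source_table}
    """


def hive_enabled(spark) -> bool:
    """Return True when the session reports a Hive-backed catalog."""
    # Falls back to "in-memory" if the key is not set on this session.
    value = spark.conf.get("spark.sql.catalogImplementation", "in-memory")
    return value == "hive"


def write_union_table(spark, source_table="adwords_ads_brand",
                      target_table="reporting.sport_ads", fmt="parquet"):
    """Run the UNION query and save the result as a table via the DataFrame writer."""
    df = spark.sql(union_sql(source_table))
    df.write.format(fmt).saveAsTable(target_table)


def build_session():
    """Create (or reuse) a session with the Hive catalog requested explicitly."""
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark.sql import SparkSession
    return (
        SparkSession.builder.appName("spark")
        .config("spark.sql.catalogImplementation", "hive")
        .enableHiveSupport()
        .getOrCreate()
    )


# Typical use, assuming a working Spark installation with Hive classes available:
#   spark = build_session()
#   print(hive_enabled(spark))   # False suggests an older non-Hive session was reused
#   write_union_table(spark)
```

If `hive_enabled(spark)` returns `False` even after calling `enableHiveSupport()`, it may help to stop the existing session or context and start a fresh process with the `--conf` flag shown above.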