Hive HQL to Spark SQL conversion

ckocjqey · posted 2021-05-27 in Spark

I need to convert HQL to Spark SQL. I used the approach below, but I don't see any change in performance. If anyone has a better suggestion, please let me know.
The Hive version:

create table temp1 as select * from Table1 T1 join (select id , min(activity_date) as dt from Table1 group by id) T2 on T1.id=T2.id and T1.activity_date=T2.dt ;
create table temp2 as select * from temp1 join diff_table

I have about 70 such intermediate Hive temp tables. The source table Table1 has roughly 1.8 billion rows, no partitions, and 200 HDFS files.
Spark code, run with 20 executors, 5 executor cores, 10g executor memory, yarn-client mode, and a 4g driver:
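
For reference, those resources correspond to a spark-submit invocation roughly like the one below; the class name and jar path are placeholders, not part of my actual job.

spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 20 \
  --executor-cores 5 \
  --executor-memory 10g \
  --driver-memory 4g \
  --class Test \
  test.jar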

import org.apache.spark.sql.{Row, SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("test")
  .config("spark.sql.warehouse.dir", "/usr/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()

import spark.implicits._
import spark.sql

// Earliest activity_date per id
val id_df = sql("select id, min(activity_date) as dt from Table1 group by id")

// Full source table
val all_id_df = sql("select * from Table1")

id_df.createOrReplaceTempView("min_id_table")
all_id_df.createOrReplaceTempView("all_id_table")

// Keep only the rows whose activity_date matches the per-id minimum
val temp1_df = sql("select * from all_id_table T1 join min_id_table T2 on T1.id = T2.id and T1.activity_date = T2.dt")

temp1_df.createOrReplaceTempView("temp2")

// Materialize the joined result as a table
sql("create or replace table temp as select * from temp2")
