spark dataframe:hivemax(case)语句的可选scala代码

de90aj5v  于 2021-06-28  发布在  Hive
关注(0)|答案(2)|浏览(479)

我有四列数据框(id int,name string,mobile string,phone string)
我需要另一种方法来实现配置单元查询到scala代码中的逻辑。
配置单元查询是:

  1. SELECT id AS member_id
  2. ,max(CASE WHEN name = 'Mrs.' THEN mobile ELSE NULL END) AS mobile
  3. ,max(CASE WHEN name = 'Dr.' THEN phone ELSE NULL END) AS phone
  4. from temp1
  5. group by id;

谢谢。

kcrjzv8t

kcrjzv8t1#

你可以写:

  1. dataFrame.registerTempTable("temp1")
  2. val result = sqlContext.sql (here put same SQL as in question)

或在spark 2.0中:

  1. dataset.createTempView("temp1")
  2. val result = sparkSession.sql(here put same SQL as in question)

或者,可以使用数据集api:

  1. val mobileUDF = udf {
  2. (name : String, mobile : String) => if (name == "Mrs.") mobile else null;
  3. }
  4. val phoneUDF = udf {
  5. (name : String, phone: String) => if (name == "Mrs.") phone else null;
  6. }
  7. dataset.withColumn("newMobile", mobileUDF($"name", $"mobile"))
  8. .withColumn("newPhone", phoneUDF($"name", $"phone"))
  9. .groupBy($"id")
  10. .agg(max(col("newMobile")), max(col("newPhone")))
展开查看全部
lc8prwob

lc8prwob2#

尝试:

  1. df.groupBy('id).agg(
  2. max(when('name === "Mrs.", 'mobile)).alias("mobile"),
  3. max(when('name === "Dr.", 'phone)).alias("phone")
  4. )

相关问题