sparksql:在select子句中使用别名列时出错

hm2xizp9  于 2021-05-27  发布在  Spark
关注(0)|答案(1)|浏览(543)

我有 ORDERS 以及 SELLERS 具有以下架构的表:

scala> spark.sql("desc ORDERS").show()
+---------------+---------+-------+
|       col_name|data_type|comment|
+---------------+---------+-------+
|       order_id|   string|   null|
|     product_id|   string|   null|
|      seller_id|   string|   null|
|           date|   string|   null|
|num_pieces_sold|   string|   null|
|  bill_raw_text|   string|   null|
+---------------+---------+-------+

scala> spark.sql("desc SELLERS").show()
+------------+---------+-------+
|    col_name|data_type|comment|
+------------+---------+-------+
|   seller_id|   string|   null|
| seller_name|   string|   null|
|daily_target|   string|   null|
+------------+---------+-------+

以下查询出现错误。我知道我们可以使用子查询(或 withColumn )为了避免这种情况,我想知道是否有其他的方法。

scala> 
q3: String =
"
 SELECT
    o.seller_id,
    o.num_pieces_sold/s.daily_target AS ratio,
    avg(ratio) AS percent_contribution
 FROM orders o
 JOIN sellers s ON o.seller_id = s.seller_id
 GROUP BY o.seller_id
 "

scala> spark.sql(q3).show()
org.apache.spark.sql.AnalysisException: cannot resolve '`ratio`' given input columns: [o.bill_raw_text, s.seller_id, o.date, o.seller_id, o.order_id, o.product_id, s.seller_name, o.num_pieces_sold, s.daily_target]; line 5 pos 6;
'Aggregate [seller_id#8], [seller_id#8, (cast(num_pieces_sold#10 as double) / cast(daily_target#20 as double)) AS ratio#421, 'avg('ratio) AS percent_contribution#422]
+- Join Inner, (seller_id#8 = seller_id#18)
   :- SubqueryAlias `o`
   :  +- SubqueryAlias `orders`
   :     +- Relation[order_id#6,product_id#7,seller_id#8,date#9,num_pieces_sold#10,bill_raw_text#11] parquet
   +- SubqueryAlias `s`
      +- SubqueryAlias `sellers`
         +- Relation[seller_id#18,seller_name#19,daily_target#20] parquet
643ylb08

643ylb081#

似乎catalyst没有(通过设计)识别中间表达式 ratio 它又一次被用在了另一个列中,比如normalsql
直接使用 avg 函数的工作方式如下,没有对比率进行混叠

package examples

import org.apache.log4j.Level
import org.apache.spark.sql.{SaveMode, SparkSession}

object Ratio extends App {
  val logger = org.apache.log4j.Logger.getLogger("org")
  logger.setLevel(Level.WARN)

  val spark: SparkSession = SparkSession.builder.appName(getClass.getName).enableHiveSupport()
    .master("local[*]").getOrCreate

  import spark._
  import spark.implicits._

  sql(
    """
      |CREATE TABLE IF NOT EXISTS orders
      |(
      |  order_id string
      |, product_id String
      |, seller_id String
      |, date int
      |, num_pieces_sold String
      |, bill_raw_text String
      |)
    """.stripMargin)

  sql(
    """
      |CREATE TABLE IF NOT EXISTS sellers
      |(
      |  seller_id string
      |, seller_name String
      |, daily_target String )
    """.stripMargin)

  sql("SHOW tables ").show

  val df1 = Seq(
    ("1", "1", "1", 20200520, 10, "rawtext"),
    ("2", "2", "2", 20200521, 11, "rawtext1")
  ).toDF("order_id", "product_id", "seller_id", "date", "num_pieces_sold", "bill_raw_text")

  df1.write.mode(SaveMode.Overwrite).saveAsTable("orders")
  df1.show

  val df2 = Seq(
    ("1", "seller_name1", "11"),
    ("2", "seller_name2", "22")
  ).toDF("seller_id", "seller_name", "daily_target")

  df2.write.mode(SaveMode.Overwrite).saveAsTable("sellers")
  df2.show

  val q3 =
    """
      | SELECT
      |    o.seller_id,
      |    avg(o.num_pieces_sold/ s.daily_target) AS percent_contribution
      | FROM orders o
      | JOIN sellers s ON o.seller_id = s.seller_id
      |  GROUP BY o.seller_id
    """.stripMargin

  spark.sql(q3).show

}

结果:

+--------+-------------+-----------+
|database|    tableName|isTemporary|
+--------+-------------+-----------+
 default|       orders|      false|
| default|      sellers|      false|
+--------+-------------+-----------+

20/05/18 21:08:29 INFO log: Updating table stats fast for orders
20/05/18 21:08:29 INFO log: Updated size of table orders to 2949
+--------+----------+---------+--------+---------------+-------------+
|order_id|product_id|seller_id|    date|num_pieces_sold|bill_raw_text|
+--------+----------+---------+--------+---------------+-------------+
|       1|         1|        1|20200520|             10|      rawtext|
|       2|         2|        2|20200521|             11|     rawtext1|
+--------+----------+---------+--------+---------------+-------------+

+---------+------------+------------+
|seller_id| seller_name|daily_target|
+---------+------------+------------+
|        1|seller_name1|          11|
|        2|seller_name2|          22|
+---------+------------+------------+

+---------+--------------------+
|seller_id|percent_contribution|
+---------+--------------------+
|        1|  0.9090909090909091|
|        2|                 0.5|
+---------+--------------------+

相关问题