I have ORDERS and SELLERS tables with the following schemas:
scala> spark.sql("desc ORDERS").show()
+---------------+---------+-------+
| col_name|data_type|comment|
+---------------+---------+-------+
| order_id| string| null|
| product_id| string| null|
| seller_id| string| null|
| date| string| null|
|num_pieces_sold| string| null|
| bill_raw_text| string| null|
+---------------+---------+-------+
scala> spark.sql("desc SELLERS").show()
+------------+---------+-------+
| col_name|data_type|comment|
+------------+---------+-------+
| seller_id| string| null|
| seller_name| string| null|
|daily_target| string| null|
+------------+---------+-------+
The following query fails. I know we can use a subquery (or withColumn) to avoid this; I'm wondering whether there is another way.
scala> val q3 = """
SELECT
    o.seller_id,
    o.num_pieces_sold/s.daily_target AS ratio,
    avg(ratio) AS percent_contribution
FROM orders o
JOIN sellers s ON o.seller_id = s.seller_id
GROUP BY o.seller_id
"""
scala> spark.sql(q3).show()
org.apache.spark.sql.AnalysisException: cannot resolve '`ratio`' given input columns: [o.bill_raw_text, s.seller_id, o.date, o.seller_id, o.order_id, o.product_id, s.seller_name, o.num_pieces_sold, s.daily_target]; line 5 pos 6;
'Aggregate [seller_id#8], [seller_id#8, (cast(num_pieces_sold#10 as double) / cast(daily_target#20 as double)) AS ratio#421, 'avg('ratio) AS percent_contribution#422]
+- Join Inner, (seller_id#8 = seller_id#18)
:- SubqueryAlias `o`
: +- SubqueryAlias `orders`
: +- Relation[order_id#6,product_id#7,seller_id#8,date#9,num_pieces_sold#10,bill_raw_text#11] parquet
+- SubqueryAlias `s`
+- SubqueryAlias `sellers`
+- Relation[seller_id#18,seller_name#19,daily_target#20] parquet
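For reference, the subquery workaround I mentioned would look roughly like this (a sketch; q3sub is just a name I made up here, and the inner query computes ratio so the outer aggregate can reference it by name):

scala> val q3sub = """
SELECT
    t.seller_id,
    avg(t.ratio) AS percent_contribution
FROM (
    SELECT
        o.seller_id,
        o.num_pieces_sold / s.daily_target AS ratio
    FROM orders o
    JOIN sellers s ON o.seller_id = s.seller_id
) t
GROUP BY t.seller_id
"""

scala> spark.sql(q3sub).show()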
1 Answer
It seems Catalyst does not (by design) resolve an alias such as ratio defined in the same SELECT list when it is reused in another expression, the way some SQL dialects allow. Applying avg directly to the expression works, with no aliasing of the ratio result:
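A sketch of that direct form (q3fixed is a hypothetical name; the aggregate is applied to the full expression, so no intermediate alias is needed):

scala> val q3fixed = """
SELECT
    o.seller_id,
    avg(o.num_pieces_sold / s.daily_target) AS percent_contribution
FROM orders o
JOIN sellers s ON o.seller_id = s.seller_id
GROUP BY o.seller_id
"""

scala> spark.sql(q3fixed).show()

Note that the per-row ratio column is dropped from the SELECT list: it is not in the GROUP BY, so it could not appear alongside an aggregate in the output anyway.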