sql-max()值无效

hvvq6cgz 于 2021-08-01 发布在 Java

关注(0)|答案(2)|浏览(513)

图1：样本数据

图2：输出不正确

图3：所需输出

查询：我正在尝试按类名和客户（image1示例数据）从列（中位数百分比）中查询最大值
问题：查询显示的是所有客户，而不是具有最大中值的客户（image2结果不正确）。它正确地计算了max（），但是查询将所有客户的值放入class\u name中，而不是具有该max值的客户
我需要的是只有类名称有最大（中位数百分比）和显示客户(图像3（期望输出）

Select  
       distinct  
        C.Class_Name,
        C.Customer,
        C.Max_Median_Percentage

FROM (

    SELECT 

        B.Class_Name,

        case (when B.Median_Percentage =  Max(B.Median_Percentage) OVER(PARTITION By B.Class_Name ORDER BY B.Median_Percentage desc  )
            then B.Customer
        end as Customer,

        Max(B.Median_Percentage) OVER(PARTITION By B.Class_Name ORDER BY B.Median_Percentage desc  ) as Max_Median_Percentage

    FROM (

        SELECT 

            A.Class_Name,
            A.Customer,
            A.Date_Time

            A.Median_Percentage

        From table1 as A

    ) as B

) as C

sql apache-spark-sql window-functions Max

来源：https://stackoverflow.com/questions/62648809/sql-max-value-from-the-group-is-not-working

2条答案

按热度按时间

neskvpey1#

你可以用 percentile_cont() 如果您的数据库不直接支持“中值”函数：

select t.*,
       boot_time / percentile_cont(0.5) within group (order by boot_time) over (partition by classid)
from t;

如果你的数据库没有 percentile_cont() 或者 percentile_disc() 函数，使用一个简单的 ntile() :

select t.*,
       boot_time / max(case when tile = 1 then boot_time end) over (partition by classid)
from (select t.*,
             ntile(2) over (partition by classid order by boot_time) as tile
      from t
     ) t

如果表中的行数为奇数，则此方法完全有效 classid . 对于偶数，它是由1关闭。你可以很容易地处理，但更复杂的是：

select t.*,
       (boot_time /
        (( max(case when tile_asc = 1 then boot_time end) over (partition by classid) / 2 +
           max(case when tile_desc = 1 then boot_time end) over (partition by classid)
         ) / 2
        )
       )
from (select t.*,
             ntile(2) over (partition by classid order by boot_time) as tile_asc,
             ntile(2) over (partition by classid order by boot_time desc) as tile_desc
      from t
     ) t

赞(0）回复(0）举报 2021-08-01

s8vozzvw2#

也许这是有用的-

加载提供的测试数据

val df = spark.sql(
      """
        |select Class_Name, Customer, Date_Time, Median_Percentage
        |from values
        |   ('ClassA', 'A', '6/13/20', 64550),
        |   ('ClassA', 'B', '6/6/20', 40200),
        |   ('ClassB', 'F', '6/20/20', 26800),
        |   ('ClassB', 'G', '6/20/20', 18100)
        |  T(Class_Name, Customer, Date_Time, Median_Percentage)
      """.stripMargin)
    df.show(false)
    df.printSchema()

    /**
      * +----------+--------+---------+-----------------+
      * |Class_Name|Customer|Date_Time|Median_Percentage|
      * +----------+--------+---------+-----------------+
      * |ClassA    |A       |6/13/20  |64550            |
      * |ClassA    |B       |6/6/20   |40200            |
      * |ClassB    |F       |6/20/20  |26800            |
      * |ClassB    |G       |6/20/20  |18100            |
      * +----------+--------+---------+-----------------+
      *
      * root
      * |-- Class_Name: string (nullable = false)
      * |-- Customer: string (nullable = false)
      * |-- Date_Time: string (nullable = false)
      * |-- Median_Percentage: integer (nullable = false)
      */

按类名查找最大中位数百分比行

df.groupBy("Class_Name")
      .agg(max(struct($"Median_Percentage", $"Date_Time", $"Customer")).as("struct"))
      .selectExpr("Class_Name", "struct.Customer", "struct.Date_Time", "struct.Median_Percentage")
      .show(false)

    /**
      * +----------+--------+---------+-----------------+
      * |Class_Name|Customer|Date_Time|Median_Percentage|
      * +----------+--------+---------+-----------------+
      * |ClassA    |A       |6/13/20  |64550            |
      * |ClassB    |F       |6/20/20  |26800            |
      * +----------+--------+---------+-----------------+
      */

赞(0）回复(0）举报 2021-08-01

我来回答

sql-max()值无效

2条答案

加载提供的测试数据

按类名查找最大中位数百分比行

相关问题

热门标签

最新问答