pyspark DataFrame sql

odopli94 · published 2021-05-27 in Spark

I want to convert the following SQL statement into a DataFrame select statement:

    Select
        YY,
        PP,
        YYYY,
        PPPP,
        Min(ID) as MinId,
        Max(ID) as MaxID
    from LoadTable

I tried the following, but it doesn't seem to work:

    df.select(df.ID, df.YY, df.PP, df.YYYY, df.PPPPP).agg({"ID": "max", "ID": "min"}).toPandas().to_csv(outputFile, sep="|", header=True, index=False)

j0pj023g1#

You are probably missing a GROUP BY clause here, since you are applying aggregate functions. If so, your SQL statement would be:

    SELECT YY, PP, YYYY, PPPP, Min(ID) as MinId, Max(ID) as MaxID
    FROM LoadTable
    GROUP BY YY, PP, YYYY, PPPP

The corresponding PySpark DataFrame statement would be:

    from pyspark.sql import functions as F
    df.groupBy(df.YY, df.PP, df.YYYY, df.PPPP).agg(F.min(df.ID), F.max(df.ID))

