Incorrect mean calculation in Spark with Scala

vmjh9lq9 · posted 2021-05-27 in Spark

I want to calculate the average profit.
Spark 2.4.4
The DataFrame looks like:

+-------------+-------------+------+
|     Customer|CustomerCount|profit|
+-------------+-------------+------+
| Customer_162|            8|  0.28|
|Customer_2634|            1|  0.31|
|Customer_1482|            8|  0.28|
+-------------+-------------+------+

Code:

newdf.select("Customer", "CustomerCount", "profit")
  .agg(sum("profit").alias("sum"),
       count("CustomerCount").alias("count"))
  .withColumn("Mean", round(col("sum") / sum("count").over(), 2))
  .show()

The current output looks like this (the aggregation collapses everything into a single summary row, so the original columns are lost):

+----------------+-----+----+
|             sum|count|Mean|
+----------------+-----+----+

But I want output like this:

+-------------+-------------+------+----+
|     Customer|CustomerCount|profit|Mean|
+-------------+-------------+------+----+
| Customer_162|            8|  0.28|0.29|
|Customer_2634|            1|  0.31|0.29|
|Customer_1482|            8|  0.28|0.29|
+-------------+-------------+------+----+

Regards

o75abkj4 · 1#

The following code may help.

// Compute the global mean once, then cross-join that single-row
// result onto every row of the original DataFrame.
val df1 = df.select(round(mean($"profit"), 2).alias("mean"))

df.join(df1).show()

/*
+-------------+-------------+------+----+
|     Customer|CustomerCount|profit|mean|
+-------------+-------------+------+----+
| Customer_162|            8|  0.28|0.29|
|Customer_2634|            1|  0.31|0.29|
|Customer_1482|            8|  0.28|0.29|
+-------------+-------------+------+----+

*/
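An alternative that avoids the separate aggregation and cross join is to compute the mean with a window function over the whole DataFrame, so each row gets the global average directly. This is a minimal self-contained sketch; the `SparkSession` setup and the sample data are assumptions reconstructed from the question's table, not part of the original posts. Note that an empty window moves all rows to a single partition, which is fine for small data but something to keep in mind at scale.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, round}

object GlobalMeanExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session; the question's DataFrame is called newdf/df.
    val spark = SparkSession.builder
      .appName("global-mean")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows taken from the question's table.
    val df = Seq(
      ("Customer_162", 8, 0.28),
      ("Customer_2634", 1, 0.31),
      ("Customer_1482", 8, 0.28)
    ).toDF("Customer", "CustomerCount", "profit")

    // avg(...).over() with no window spec computes the mean over the
    // entire DataFrame and attaches it to every row.
    val withMean = df.withColumn("Mean", round(avg($"profit").over(), 2))
    withMean.show()

    spark.stop()
  }
}
```

Because the mean is attached per row rather than aggregated away, the `Customer`, `CustomerCount`, and `profit` columns survive, giving the shape the question asks for.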
