Calculating row percentages in Scala

cdmah0mi  posted on 2021-07-09  in  Spark

I'm still new to Scala. I'm trying to calculate percentages across each row of a Spark DataFrame. Consider the following df:

val df = Seq(("word1", 25, 75),("word2", 15, 15),("word3", 10, 30)).toDF("word", "author1", "author2")

df.show

+-----+-------+-------+
| word|author1|author2|
+-----+-------+-------+
|word1|     25|     75|
|word2|     15|     15|
|word3|     10|     30|
+-----+-------+-------+

I know I can use code like the following and get the expected output, but I'd like to know whether there is a better way:

val df_2 = df
  .withColumn("total", $"author1" + $"author2")
  .withColumn("author1 pct", $"author1"/$"total")
  .withColumn("author2 pct", $"author2"/$"total")
  .select("word", "author1 pct", "author2 pct")

df_2.show

+-----+-----------+-----------+
| word|author1 pct|author2 pct|
+-----+-----------+-----------+
|word1|       0.25|       0.75|
|word2|        0.5|        0.5|
|word3|       0.25|       0.75|
+-----+-----------+-----------+

Bonus points for formatting the result as a percentage with a "%" sign and no decimals. Thank you!

p1iqtdky #1

Perhaps you can compute and select the percentages directly instead of using .withColumn, and use concat to append a % sign at the end:

val df2 = df.select(
    $"word", 
    concat(($"author1"*100/($"author1" + $"author2")).cast("int"), lit("%")).as("author1 pct"), 
    concat(($"author2"*100/($"author1" + $"author2")).cast("int"), lit("%")).as("author2 pct")
)

df2.show
+-----+-----------+-----------+
| word|author1 pct|author2 pct|
+-----+-----------+-----------+
|word1|        25%|        75%|
|word2|        50%|        50%|
|word3|        25%|        75%|
+-----+-----------+-----------+
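
One caveat about the snippet above: cast("int") truncates toward zero rather than rounding, and the sample data happens to divide evenly, which hides the difference. The plain-Scala sketch below needs no Spark and only mirrors the arithmetic; PercentFormat, truncatedPct and roundedPct are hypothetical names introduced for illustration.

```scala
object PercentFormat {
  // Mirrors the answer's expression: part * 100 / total, then truncate to an Int.
  def truncatedPct(part: Int, total: Int): String =
    s"${(part * 100.0 / total).toInt}%"

  // Alternative: round to the nearest whole percent before formatting.
  def roundedPct(part: Int, total: Int): String =
    s"${math.round(part * 100.0 / total)}%"
}

object PercentFormatDemo extends App {
  // With counts 1 and 2 (total 3), the second share is 66.67 percent.
  println(PercentFormat.truncatedPct(2, 3)) // truncation drops the fraction
  println(PercentFormat.roundedPct(2, 3))   // rounding goes to the nearest integer
}
```

In Spark, wrapping the percentage expression in round(...) from org.apache.spark.sql.functions before the cast would be one way to get the rounded behaviour, if that is what you want.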

If you want to keep a numeric data type, you can do this instead:

val df2 = df.select(
    $"word", 
    ($"author1"*100/($"author1" + $"author2")).cast("int").as("author1 pct"), 
    ($"author2"*100/($"author1" + $"author2")).cast("int").as("author2 pct")
)
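
If there are more than two author columns, repeating the sum by hand gets tedious. The following is a minimal sketch of one way to generalize the same idea, not part of the original answer: it builds the row total by reducing over a list of column names and rounds before casting. The object RowPercentages, the function withRowPercentages and its parameters are hypothetical names; the Spark calls used (col, concat, lit, round, Column.cast) are from org.apache.spark.sql.functions and the Column API.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat, lit, round}

object RowPercentages {
  // idCol: the column to keep as-is (e.g. "word").
  // valueCols: the numeric columns whose share of the row total is wanted.
  def withRowPercentages(df: DataFrame, idCol: String, valueCols: Seq[String]): DataFrame = {
    // Row total as a single Column expression: author1 + author2 + ...
    val total = valueCols.map(col).reduce(_ + _)

    // One "<name> pct" column per input column, formatted like "25%".
    val pctCols = valueCols.map { c =>
      concat(round(col(c) * 100 / total).cast("int"), lit("%")).as(s"$c pct")
    }

    df.select(col(idCol) +: pctCols: _*)
  }
}
```

With the df from the question, RowPercentages.withRowPercentages(df, "word", Seq("author1", "author2")).show should print the same table as the concat version above, since all three rows divide evenly.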
