How to update/transform/replace Spark DataFrame column values using a HashMap

Posted by ogq8wdun on 2021-05-29 in Spark

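I want to replace the values of a given DataFrame column using a HashMap, but I am struggling with the syntax. Can anyone point me in the right direction or to an existing example? I have searched but could not find anything that addresses this exact problem.
Edit:
Consider a DataFrame like the following: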

+-----------+--------+-----------+
|       Noun| Pronoun|  Adjective|
+-----------+--------+-----------+
|      Homer| Simpson|BeerDrinker|
|      Marge| Simpson|  Housewife|
|       Bart| Simpson|        Son|
|       Lisa| Simpson|   Daughter|
|TheSimpsons|Simpsons|     Family|
+-----------+--------+-----------+

I have a Map of key-value pairs, as shown below:

type ValueMap = scala.collection.mutable.HashMap [String,String]
  var mymap = new ValueMap ()
  mymap += ("Simpson" -> "Surname")

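I want to perform an operation (which one is not yet clear to me) that produces the result below. Essentially, in the Pronoun column, every value equal to Simpson is replaced with the corresponding value from the map mymap, which is Surname:
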
+-----------+--------+-----------+
|       Noun| Pronoun|  Adjective|
+-----------+--------+-----------+
|      Homer| Surname|BeerDrinker|
|      Marge| Surname|  Housewife|
|       Bart| Surname|        Son|
|       Lisa| Surname|   Daughter|
|TheSimpsons|Simpsons|     Family|
+-----------+--------+-----------+

omtl5h9j #1

Try this approach with a UDF:

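// Assumes `spark` is an existing SparkSession; spark-shell brings these imports into scope automatically.
import org.apache.spark.sql.functions._
import spark.implicits._

val myMap = Map("Simpson" -> "Surname")
val df = Seq(("Homer","Simpson","BeerDrinker"),("Marge","Simpson","Housewife"),("Bart","Simpson","Son"),("Lisa","Simpson","Daughter"),("TheSimpsons","Simpsons","Family")).toDF("Noun","Pronoun","Adjective")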

df.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Simpson |BeerDrinker|
|Marge      |Simpson |Housewife  |
|Bart       |Simpson |Son        |
|Lisa       |Simpson |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+

val getVal = udf((x: String) => myMap.getOrElse(x, x))
val resDF = df.withColumn("Pronoun", getVal($"Pronoun"))

resDF.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Surname |BeerDrinker|
|Marge      |Surname |Housewife  |
|Bart       |Surname |Son        |
|Lisa       |Surname |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+
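
The question uses a mutable HashMap while the UDF above closes over an immutable Map, so here is a minimal plain-Scala sketch of how the two might be bridged and of what `getOrElse(x, x)` does for matched and unmatched keys. The names `lookup` and `replaceOrKeep` are hypothetical and only for illustration; taking an immutable snapshot with `.toMap` is an assumption about what is wanted, not something the question requires.

```scala
import scala.collection.mutable

// The mutable map from the question.
val mymap = mutable.HashMap[String, String]()
mymap += ("Simpson" -> "Surname")

// Assumption: take an immutable snapshot so later mutation of `mymap`
// cannot change what a closure (such as a UDF body) sees.
val lookup: Map[String, String] = mymap.toMap

// Same logic as the UDF body: return the mapped value if the key exists,
// otherwise return the input unchanged. (`replaceOrKeep` is a hypothetical name.)
def replaceOrKeep(x: String): String = lookup.getOrElse(x, x)

val replaced = Seq("Simpson", "Simpsons").map(replaceOrKeep)
println(replaced) // List(Surname, Simpsons)
```

With this in place, `udf(replaceOrKeep _)` should behave like `getVal` above, since only exact key matches are replaced.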

Let me know if this helps.
Update:
Without a UDF,
add the Map to the df as another column:

val df1 = df.withColumn("map", typedLit(myMap))
val df2 = df1.withColumn("Pronoun", when($"map"($"Pronoun").isNotNull, $"map"($"Pronoun")).otherwise($"Pronoun") ).drop("map")
df2.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Surname |BeerDrinker|
|Marge      |Surname |Housewife  |
|Bart       |Surname |Son        |
|Lisa       |Surname |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+

Another, simpler way that does not add a new column:

val colMap = typedLit(myMap)
val df3 = df.withColumn("Pronoun", when(colMap($"Pronoun").isNotNull, colMap($"Pronoun")).otherwise($"Pronoun") )
df3.show(false)
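
Two further variants may be worth trying; this is a sketch under assumptions, not part of the original answer. The first shortens the `when(...).otherwise(...)` expression with `coalesce`, and relies on a missing-key lookup yielding null, just as the `isNotNull` check above does. The second uses `DataFrameNaFunctions.replace`, which accepts a Scala Map directly. The local SparkSession, the shortened two-row dataset, and the names `viaCoalesce` and `viaNaReplace` are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, typedLit}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("map-replace-sketch")
  .getOrCreate()
import spark.implicits._

val myMap = Map("Simpson" -> "Surname")

// A shortened version of the example data: one row that matches, one that does not.
val df = Seq(
  ("Homer", "Simpson", "BeerDrinker"),
  ("TheSimpsons", "Simpsons", "Family")
).toDF("Noun", "Pronoun", "Adjective")

// Variant 1: coalesce returns the first non-null argument, so the mapped value
// is used when the key exists and the original value otherwise.
val colMap = typedLit(myMap)
val viaCoalesce = df.withColumn("Pronoun", coalesce(colMap($"Pronoun"), $"Pronoun"))

// Variant 2: na.replace swaps values in the named column that match a map key
// and leaves all other values untouched.
val viaNaReplace = df.na.replace("Pronoun", myMap)

viaCoalesce.show(false)
viaNaReplace.show(false)
```

For a small, fixed mapping like this one, `na.replace` is probably the least code; the `coalesce` form keeps the map available as a column expression if it needs to be combined with other logic.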
