我想根据以下条件在我的Dataframe中创建一个新列。
我的Dataframe是这样的:
my_string
2020 test
2020 prod
2020 dev
我的状况:
value1=subtract string after space from my_string
value2=subtract first four digit from my_string
If value 1 contains string 'test' then new_col=value2+"01"
If value 1 contains string 'prod' then new_col=value2+"kk"
If value 1 contains string 'dev' then new_col=value2+"ff"
我需要这样的结果:
my_string | new_col
2020 test | 202001
2020 prod | 2020kk
2020 dev | 2020ff
有人能帮帮我吗?
1条答案
按热度按时间7z5jn7bk1#
使用
row_number
窗口函数monotonically_increasing_id()
```from pyspark.sql import *
from pyspark.sql.functions import *
w = Window.orderBy(monotonically_increasing_id())
df.withColumn("new_col",concat(split(col("my_string")," ")[0], lpad(row_number().over(w),2,"0"))).show()
+---------+-------+
|my_string|new_col|
+---------+-------+
|2020 test| 202001|
|2020 prod| 202002|
| 2020 dev| 202003|
+---------+-------+
```
UPDATE:
使用when+otherwise
声明。