Conditionally fill null values in a column based on another column in PySpark

nkhmeac6 · posted 2021-07-09 in Spark

Data:

col1       result 

 good       positive 
 bad        null
 excellent  null
 good       null        
 good       null

Desired output:

col1       result 

 good       positive 
 bad        positive
 excellent  null
 good       negative        
 good       negative

I have the following two conditions and would like to combine them with .fillna, so that they are applied only to the rows where result is null:

from pyspark.sql.functions import col, when

df = df.withColumn('result', when(col('col1') == 'good', 'negative').otherwise(df["result"]))
df = df.withColumn('result', when(col('col1') == 'bad', 'positive').otherwise(df["result"]))
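
For reference, a minimal sketch (not part of the original post) to reproduce the sample data above, assuming a local SparkSession and the column names shown in the tables:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "null" cells in the table are represented by None
df = spark.createDataFrame(
    [("good", "positive"),
     ("bad", None),
     ("excellent", None),
     ("good", None),
     ("good", None)],
    ["col1", "result"],
)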

qij5mzcb (answer 1)

You can use coalesce to replace the null values as needed: coalesce returns the first non-null value, so existing result values are kept and only the nulls are filled from the when expression:

import pyspark.sql.functions as F

df2 = df.withColumn(
    'result',
    F.coalesce(
        F.col('result'),                              # keep existing non-null values
        F.when(F.col('col1') == 'good', 'negative')   # fill nulls based on col1
         .when(F.col('col1') == 'bad', 'positive')    # rows matching no condition stay null
    )
)

df2.show()
+---------+--------+
|     col1|  result|
+---------+--------+
|     good|positive|
|      bad|positive|
|excellent|    null|
|     good|negative|
|     good|negative|
+---------+--------+
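
If you prefer to keep the when/otherwise style of the original attempt, an alternative sketch (not from the original answer) is to gate each condition on result being null, so non-null values are never overwritten:

import pyspark.sql.functions as F

df2 = df.withColumn(
    'result',
    F.when(F.col('result').isNull() & (F.col('col1') == 'good'), 'negative')
     .when(F.col('result').isNull() & (F.col('col1') == 'bad'), 'positive')
     .otherwise(F.col('result'))   # keep the existing value for all other rows
)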
