fillna values for specific columns and specific rows

ckx4rj1h  posted on 2021-07-13  in Spark

I have the following PySpark DataFrame:

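import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType

# Reuse the active session if one exists (e.g. in a notebook or the pyspark shell)
spark = SparkSession.builder.getOrCreate()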

schema = StructType([
    StructField('user', StringType(), True),
    StructField('created', IntegerType(), True),
    StructField('month_1', FloatType(), True),
    StructField('month_2', FloatType(), True),
    StructField('month_3', FloatType(), True),
    StructField('month_4', FloatType(), True),
  ])

data = [['tom', 2, np.nan,1.0,1.0,1.0], 
        ['nick', 1,1.0, np.nan, np.nan, np.nan], 
        ['jack', 3,np.nan,np.nan,1.0,1.0],
        ['jason', 2,np.nan,1.0,1.0,np.nan]]

df=spark.createDataFrame(data,schema)

df.show()

+-----+-------+-------+-------+-------+-------+
| user|created|month_1|month_2|month_3|month_4|
+-----+-------+-------+-------+-------+-------+
|  tom|      2|    NaN|    1.0|    1.0|    1.0|
| nick|      1|    1.0|    NaN|    NaN|    NaN|
| jack|      3|    NaN|    NaN|    1.0|    1.0|
|jason|      2|    NaN|    1.0|    1.0|    NaN|
+-----+-------+-------+-------+-------+-------+

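I want to fill the NaN values based on the value in the created column.
If the number of the month column is greater than or equal to created, fill with 1.0.
If the number of the month column is less than created, fill with 0.0.
The desired output should be: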

+-----+-------+-------+-------+-------+-------+
| user|created|month_1|month_2|month_3|month_4|
+-----+-------+-------+-------+-------+-------+
|  tom|      2|    0.0|    1.0|    1.0|    1.0|
| nick|      1|    1.0|    1.0|    1.0|    1.0|
| jack|      3|    0.0|    0.0|    1.0|    1.0|
|jason|      2|    0.0|    1.0|    1.0|    1.0|
+-----+-------+-------+-------+-------+-------+
rekjcdws  1#

You can use nanvl to replace each NaN with a conditional value built with when:

import pyspark.sql.functions as F

df2 = df.select(
    'user', 'created',
    *[
        F.nanvl(
            F.col(f'month_{c}'),
            F.when(F.col('created') <= c, 1).otherwise(0)
        ).alias(f'month_{c}')
        for c in range(1,5)
    ]
)

df2.show()
+-----+-------+-------+-------+-------+-------+
| user|created|month_1|month_2|month_3|month_4|
+-----+-------+-------+-------+-------+-------+
|  tom|      2|    0.0|    1.0|    1.0|    1.0|
| nick|      1|    1.0|    1.0|    1.0|    1.0|
| jack|      3|    0.0|    0.0|    1.0|    1.0|
|jason|      2|    0.0|    1.0|    1.0|    1.0|
+-----+-------+-------+-------+-------+-------+
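
To sanity-check the fill rule independently of Spark, here is a minimal plain-Python sketch of the same logic. The helper name fill_months and the rows dictionary are made up for illustration and are not part of the original question or answer. It only mirrors the rule (NaN in month i becomes 1.0 if i >= created, otherwise 0.0) and is not a replacement for the nanvl approach above.

```python
import math

def fill_months(created, months):
    """Hypothetical helper: replace NaN in month i (1-based) with
    1.0 if i >= created, else 0.0. Non-NaN values are kept as-is."""
    return [
        (1.0 if i >= created else 0.0) if math.isnan(v) else v
        for i, v in enumerate(months, start=1)
    ]

nan = float("nan")

# Same sample data as in the question: user -> (created, [month_1..month_4])
rows = {
    "tom": (2, [nan, 1.0, 1.0, 1.0]),
    "nick": (1, [1.0, nan, nan, nan]),
    "jack": (3, [nan, nan, 1.0, 1.0]),
    "jason": (2, [nan, 1.0, 1.0, nan]),
}

filled = {
    user: fill_months(created, months)
    for user, (created, months) in rows.items()
}

for user, values in filled.items():
    print(user, values)
```

Note that nanvl only replaces NaN. If the month columns could also contain nulls, those would need separate handling (for example with coalesce), which this sketch does not cover.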
