I have the following PySpark DataFrame:
import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import *

# spark is available by default in the pyspark shell; create it otherwise.
spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField('user', StringType(), True),
    StructField('created', IntegerType(), True),
    StructField('month_1', FloatType(), True),
    StructField('month_2', FloatType(), True),
    StructField('month_3', FloatType(), True),
    StructField('month_4', FloatType(), True),
])

data = [['tom', 2, np.nan, 1.0, 1.0, 1.0],
        ['nick', 1, 1.0, np.nan, np.nan, np.nan],
        ['jack', 3, np.nan, np.nan, 1.0, 1.0],
        ['jason', 2, np.nan, 1.0, 1.0, np.nan]]

df = spark.createDataFrame(data, schema)
df.show()
+-----+-------+-------+-------+-------+-------+
| user|created|month_1|month_2|month_3|month_4|
+-----+-------+-------+-------+-------+-------+
| tom| 2| NaN| 1.0| 1.0| 1.0|
| nick| 1| 1.0| NaN| NaN| NaN|
| jack| 3| NaN| NaN| 1.0| 1.0|
|jason| 2| NaN| 1.0| 1.0| NaN|
+-----+-------+-------+-------+-------+-------+
I want to fill the NaN values based on the value of the created column:
If the month column's number (the suffix of the month_ column name) is greater than or equal to created, the value should be 1.0.
If the month column's number is less than created, the value should be 0.0.
The desired output should be:
+-----+-------+-------+-------+-------+-------+
| user|created|month_1|month_2|month_3|month_4|
+-----+-------+-------+-------+-------+-------+
| tom| 2| 0.0| 1.0| 1.0| 1.0|
| nick| 1| 1.0| 1.0| 1.0| 1.0|
| jack| 3| 0.0| 0.0| 1.0| 1.0|
|jason| 2| 0.0| 1.0| 1.0| 1.0|
+-----+-------+-------+-------+-------+-------+
1 Answer
rekjcdws1#
You can replace the NaN values with nanvl and create the conditional value with when:
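The answer's code block did not survive extraction, so the following is a minimal sketch of that nanvl/when approach, assuming the month number is taken from the month_N column-name suffix and compared against created; the month_cols and filled names are illustrative:

from pyspark.sql import functions as F

# Month columns to fill; the month number is the numeric suffix of each name.
month_cols = ['month_1', 'month_2', 'month_3', 'month_4']

filled = df.select(
    'user',
    'created',
    *[
        # nanvl keeps the existing value where it is not NaN; where it is NaN
        # it falls back to the when expression: 1.0 if the month number is
        # >= created, otherwise 0.0 (cast to float to match the column type).
        F.nanvl(
            F.col(c),
            F.when(F.col('created') <= int(c.split('_')[1]), 1.0)
             .otherwise(0.0)
             .cast('float')
        ).alias(c)
        for c in month_cols
    ]
)
filled.show()

With the example DataFrame above, this reproduces the desired output.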