PySpark: filter the values of a column to assign a new column

Posted by iaqfqrcu on 2022-11-21 in Spark

In Python, you can write a filter and assign a value to a new column with **df.loc[df["A"].isin([1, 2, 3]), "newColumn"] = "numberType"**.

pyspark

Source: https://stackoverflow.com/questions/74515285/pyspark-filter-the-value-of-a-column-to-assign-a-new-column

2 Answers

r1zhe5dt1#

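FYI, there is no such thing as a DataFrame in plain Python. The code shown above is pandas syntax, and pandas is a Python library written for data analysis and manipulation.
For your problem, you can use when, lit and col from pyspark.sql.functions to achieve this.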

from pyspark.sql.functions import when, lit, col

# Label rows whose A is in the list; every other row gets the fallback label.
df1 = df.withColumn("newColumn",
    when(col("A").isin([1, 2, 3]),
        lit("numberType")).otherwise(lit("notNumberType")))

df1.show(truncate=False)

2022-11-21

7nbnzgx92#

Use the when function to filter rows and the isin function to check whether a value is in a list:

import pandas as pd

# pandas version: only rows where A is in the list are overwritten
pdf = pd.DataFrame(data=[[1,""],[2,""],[3,""],[4,""],[5,""]], columns=["A", "newColumn"])
pdf.loc[pdf["A"].isin([1,2,3]), "newColumn"] = "numberType"
print(pdf)

   A   newColumn
0  1  numberType
1  2  numberType
2  3  numberType
3  4            
4  5            

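from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
sdf = spark.createDataFrame(data=[[1,""],[2,""],[3,""],[4,""],[5,""]], schema=["A", "newColumn"])
# No otherwise() here, so rows that do not match become null
sdf = sdf.withColumn("newColumn", F.when(F.col("A").isin([1,2,3]), F.lit("numberType")))
sdf.show()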

+---+----------+
|  A| newColumn|
+---+----------+
|  1|numberType|
|  2|numberType|
|  3|numberType|
|  4|      null|
|  5|      null|
+---+----------+

2022-11-21
