PySpark: is it possible to create a dynamic number of DataFrames based on non-null values?

Posted by kyxcudwk on 2021-07-13 in Spark


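I have a PySpark DataFrame:

Name   Age   UserName  Password
Joe    34    Null      Null
Alice  21    Null      Null
Null   Null  User1     Pass1
Null   Null  User2     Pass2

From the DataFrame above, I want to create 2 DataFrames like the following by somehow detecting which columns are null:

Name   Age
Joe    34
Alice  21

UserName  Password
User1     Pass1
User2     Pass2

Is there a way to do this?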
Sample JSON files in the "source" directory:

{
 "name": "joe",
 "age": 31
}

{
 "name": "alica",
 "age": 21
}

{
 "username": "user1",
 "password": "pass1"
}

{
 "username": "user2",
 "password": "pass2"
}

Code:

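from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setMaster("local").setAppName("Test")
spark = SparkSession \
        .builder \
        .config(conf=conf) \
        .getOrCreate()

# All files under "source" are read into one DataFrame; keys missing from a record become null.
# Note: spark.read.json expects one JSON object per line by default;
# pretty-printed files need .option("multiLine", True).
json_data = spark.read.json("source")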

apache-spark pyspark apache-spark-sql pyspark-dataframes

Source: https://stackoverflow.com/questions/66206182/pyspark-is-there-any-possibility-of-creating-dynamic-number-of-dataframe-based

2 Answers


#1 rm5edbpk

If you always have the same fixed set of columns, I would simply cover all the cases:

import pyspark.sql.functions as f

# df is the DataFrame from the question (json_data there).
# Each result keeps the rows matching one specific null / non-null pattern.
df2 = df.where(f.col("name").isNotNull() & f.col("age").isNotNull() & f.col("username").isNotNull() & f.col("password").isNull())

df3 = df.where(f.col("name").isNotNull() & f.col("age").isNotNull() & f.col("username").isNull() & f.col("password").isNull())

# Renamed from a second "df3", which would have overwritten the previous result.
df4 = df.where(f.col("name").isNotNull() & f.col("age").isNull() & f.col("username").isNull() & f.col("password").isNull())

df5 = df.where(f.col("name").isNull() & f.col("age").isNotNull() & f.col("username").isNotNull() & f.col("password").isNotNull())

df6 = df.where(f.col("name").isNull() & f.col("age").isNull() & f.col("username").isNotNull() & f.col("password").isNotNull())

# ... and so on for the remaining combinations
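Writing every combination by hand gets tedious quickly, so the same idea can be generated in a loop. The following is only a minimal sketch: the helper names `null_pattern_filters` and `split_by_null_pattern` are hypothetical, and it relies on `DataFrame.where` accepting a SQL expression string. Note that it produces 2^n conditions for n columns and runs one small Spark job per condition, so it is only reasonable for a handful of columns.

```python
from itertools import product


def null_pattern_filters(columns):
    """Yield (pattern, sql_condition) for every null / non-null combination.

    pattern is a tuple of booleans, True meaning "this column is not null".
    """
    for pattern in product([True, False], repeat=len(columns)):
        condition = " AND ".join(
            f"`{col}` IS NOT NULL" if present else f"`{col}` IS NULL"
            for col, present in zip(columns, pattern)
        )
        yield pattern, condition


def split_by_null_pattern(df, columns):
    """Return {pattern: DataFrame}, keeping only patterns that match rows."""
    result = {}
    for pattern, condition in null_pattern_filters(columns):
        subset = df.where(condition)  # where() also accepts a SQL string
        if subset.take(1):  # skip combinations that match no rows
            result[pattern] = subset
    return result
```

With the question's data, `split_by_null_pattern(df, ["name", "age", "username", "password"])` would be expected to keep only the two patterns that actually occur.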

Posted on 2021-07-13

#2 h79rfbju

You can simply use select + dropna():

df1 = df.select("name", "age").dropna()
df1.show()
# +-----+---+
# | name|age|
# +-----+---+
# |  joe| 34|
# |alice| 21|
# +-----+---+

df2 = df.select("username", "password").dropna()
df2.show()
# +--------+--------+
# |username|password|
# +--------+--------+
# |   user1|   pass1|
# |   user2|   pass2|
# +--------+--------+

Posted on 2021-07-13
