Add a column with random numbers within a range in PySpark

xxhby3vn · posted 2023-05-16 in Spark

I want to generate a column of random numbers, like this:

df = df.withColumn("random_col", random.randint(100000, 1000000))

The above gives me an error:
AssertionError: col should be Column
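
The error happens because withColumn expects a Column expression, while random.randint returns a plain Python int. A minimal sketch of one workaround, assuming the goal is only to satisfy the type check (note that it writes the same value to every row, because randint runs once on the driver):

from pyspark.sql import functions as F
import random

# lit() wraps the Python int in a Column, which satisfies withColumn,
# but the value is drawn once on the driver, so every row gets the same number
df = df.withColumn("random_col", F.lit(random.randint(100000, 1000000)))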

5anewei6 1#

First, I would make sure you've typed everything correctly...
Try this import: from pyspark.sql.functions import rand
Then try something like this line of code:

df1 = df.withColumn("random_col", (rand() * (1000000 - 100000) + 100000).cast("int"))

You also could check out this resource. It looks like it may be helpful for what you are doing.
Hope this helps!
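
A minimal end-to-end sketch of this approach (the three-row dataframe here is an illustrative placeholder, not from the thread): rand() is uniform on [0.0, 1.0), so scaling by the width of the range and shifting by the lower bound maps it onto [100000, 1000000):

from pyspark.sql import SparkSession
from pyspark.sql.functions import rand

spark = SparkSession.builder.getOrCreate()

# Illustrative dataframe; any existing df works the same way
df = spark.createDataFrame([(1,), (2,), (3,)], ["id"])

# rand() is uniform on [0.0, 1.0); scale and shift it into [100000, 1000000),
# then cast to int for whole numbers (pass a seed for reproducible draws)
df1 = df.withColumn("random_col", (rand(seed=42) * (1000000 - 100000) + 100000).cast("int"))
df1.show()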

8nuwlpux 2#

Ran into this problem, couldn't find anything specific, and eventually figured it out; hope this helps anyone who's stuck:

# To add a column with values from a range of random values, first create the column in a new Spark dataframe.

# import libraries
import random
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Define the new df schema
# ("id" is nullable here because it only gets filled in afterwards)
schema = StructType(
    [
        StructField("id", StringType(), nullable=True),
        StructField("random_value", IntegerType(), nullable=False),
    ]
)

# create an empty list and fill it with rows of random values
data = []
for i in range(0, 200):  # adjust the row count as you wish
    data.append(
        {
            "random_value": random.randint(500, 10000)  # adjust the range as you wish
        }
    )

# Create the Spark dataframe (assumes an active SparkSession named `spark`)
df = spark.createDataFrame(data, schema)

# Add id ordering
df1 = df.withColumn("id", F.monotonically_increasing_id())
• Then you need to add a matching id column to the other dataframe, join on the corresponding id columns, and append the "random_value" column, as sketched below. For more on creating id columns on pre-existing dataframes and joining, see this great example.
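
A hedged sketch of that join step; existing_df and its contents below are illustrative stand-ins for the pre-existing dataframe, not part of the original answer:

# Illustrative pre-existing dataframe; replace with your own
existing_df = spark.createDataFrame([("a",), ("b",), ("c",)], ["name"])

# Give it an id column built the same way, so the two dataframes can be joined
existing_df = existing_df.withColumn("id", F.monotonically_increasing_id())

# Join on id and append the random_value column
result = existing_df.join(df1.select("id", "random_value"), on="id", how="inner")
result.show()

Note that monotonically_increasing_id() only guarantees unique, increasing ids, not identical ids across dataframes; on multi-partition data, a row_number() over a Window is the more reliable way to get matching consecutive ids on both sides.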
