# To add a column with values from a range of random values, first create the column in a new Spark dataframe.
# import libraries
import random
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Define new df schema ("id" is nullable here because it is only filled in below)
schema = StructType(
    [
        StructField("id", StringType(), nullable=True),
        StructField("random_value", IntegerType(), nullable=False),
    ]
)

# create a list of rows, each holding one random value
data = list()
for i in range(0, 200):  # adjust the row count as you wish
    data.append(
        {
            "random_value": random.randint(500, 10000)  # adjust the range as you wish
        }
    )

# Create the Spark dataframe
df = spark.createDataFrame(data, schema)

# Add an increasing (not necessarily consecutive) id for ordering and joining
df1 = df.withColumn("id", F.monotonically_increasing_id())
Then you need to add a matching id column to your other DataFrame, join on the two id columns, and append the "random_value" column; a sketch of that join is shown below. For more information on creating an id column on a pre-existing DataFrame and joining, see this great example.
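As a minimal sketch of that join, assuming a hypothetical pre-existing DataFrame named other_df: row_number is used on both sides because monotonically_increasing_id is increasing but not consecutive, and the two sides need matching consecutive ids to pair up 1:1.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# row_number over the increasing id yields consecutive ids 1..N on each side
w = Window.orderBy(F.monotonically_increasing_id())
other_with_id = other_df.withColumn("id", F.row_number().over(w))  # other_df is a hypothetical stand-in
random_with_id = df1.withColumn("id", F.row_number().over(w))

# Append "random_value" to the pre-existing DataFrame by joining on id
result = other_with_id.join(random_with_id.select("id", "random_value"), on="id")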
2 Answers
5anewei61#
First, I would make sure you are importing the right thing...
Try importing: from pyspark.sql.functions import rand
Then try something like this line of code:
df1 = df.withColumn("random_col", rand() * (1000000 - 100000) + 100000)
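If you need integers in that range, one possible variation (keeping the same bounds and column name, with a fixed seed for reproducibility) would be:

from pyspark.sql.functions import rand
# rand() returns a uniform double in [0, 1); scale and shift it into [100000, 1000000), then truncate
df1 = df.withColumn("random_col", (rand(seed=42) * (1000000 - 100000) + 100000).cast("int"))
df1.select("random_col").show(5)  # quick sanity check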
You could also check out this resource; it looks like it may be helpful for what you are doing.
Hope this helps!
8nuwlpux2#
I ran into this problem, couldn't find anything specific, and eventually figured it out. Hope this helps anyone who is stuck:
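A condensed, self-contained sketch of the approach described at the top of the page (creating the SparkSession explicitly; tuples are used here instead of dicts so no schema object is needed):

import random
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 200 rows, each with one random integer in [500, 10000] (adjust as you wish)
rows = [(random.randint(500, 10000),) for _ in range(200)]
df = spark.createDataFrame(rows, ["random_value"])

# attach an increasing id to join on later
df1 = df.withColumn("id", F.monotonically_increasing_id())
df1.show(5)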