from pyspark.sql import SparkSession

# Create the Spark session
spark = SparkSession.builder.master("local[*]").getOrCreate()

data = [("Banana", 1000, "USA", "1"), ("Carrots", 1500, "USA", "2"), ("Beans", 1600, "USA", "3"),
        ("Orange", 2000, "USA", "4"), ("Banana", 400, "China", "5"),
        ("Carrots", 1200, "China", "1"), ("Beans", 1500, "China", "2"), ("Orange", 4000, "China", "3"),
        ("Banana", 2000, "Canada", "4"), ("Carrots", 2000, "Canada", "5"), ("Beans", 2000, "Mexico", "6"), ("Orange", 2000, "USA", "7")]
columns = ["Product", "Amount", "Country", "Id"]
df = spark.createDataFrame(data=data, schema=columns)

# Transformations: lazy, they only build the query plan
df = df.drop("Id")
df = df.withColumnRenamed("Product", "Veggies")

# Actions: these actually run Spark jobs
df.write.csv("Output.csv")
df.show(truncate=False)
Expected time taken by individual transformations in Spark
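You can get Spark job execution details from the Spark Master Web UI. Otherwise, you need to implement your own method. A simple approach is shown below **(Java code snippet; adapt it to Python as your requirements dictate)**: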
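The Java snippet the answer refers to is not included here. As a hedged substitute (not the answerer's code), this is a minimal Python sketch that records wall-clock time around a block using the standard `time.perf_counter` and `contextlib.contextmanager`. The helper name `timed` and the `timings` dictionary are hypothetical. One caveat matters for the question: Spark transformations such as `drop()` and `withColumnRenamed()` are lazy, so timing them on their own measures only query-plan construction; the real work happens when an action such as `count()`, `show()` or `write` runs.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Store the wall-clock seconds spent inside the with-block in timings[label].

    Hypothetical helper; the time is recorded even if the block raises.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Hypothetical usage with the DataFrame from the question (requires pyspark):
#
# timings = {}
# with timed("drop", timings):
#     df = df.drop("Id")            # lazy: measures plan construction only
# with timed("write", timings):
#     df.write.csv("Output.csv")    # action: runs the whole job
# print(timings)
```

To approximate the cost of each step separately, one option is to force an action after every transformation (for example `df = df.drop("Id").cache()` followed by `df.count()`), though caching itself changes what is being measured; the per-stage durations shown in the Spark Web UI remain the more precise source.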