pyspark: How can we get the completion time of individual transformations (such as drop and rename) in Spark?

Posted by ehxuflar on 2023-02-18 in Spark
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

data = [("Banana", 1000, "USA", "1"), ("Carrots", 1500, "USA", "2"), ("Beans", 1600, "USA", "3"),
        ("Orange", 2000, "USA", "4"), ("Banana", 400, "China", "5"),
        ("Carrots", 1200, "China", "1"), ("Beans", 1500, "China", "2"), ("Orange", 4000, "China", "3"),
        ("Banana", 2000, "Canada", "4"), ("Carrots", 2000, "Canada", "5"), ("Beans", 2000, "Mexico", "6"), ("Orange", 2000, "USA", "7")]
columns = ["Product", "Amount", "Country", "Id"]
# Create spark session
spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame(data=data, schema=columns)
df = df.drop("Id")
df = df.withColumnRenamed("Product", "Veggies")
df.write.csv("Output.csv")
df.show(truncate=False)

Expected output: the time taken by each individual transformation in Spark.

edqdpe6u1#

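You can get the execution details of a Spark job from the Spark Master Web UI. Otherwise, you need to implement your own approach. A simple one is shown below **(Java snippet; adapt it to Python as needed)**: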

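long t0 = System.currentTimeMillis();
// ... 1. create the session ...
long t1 = System.currentTimeMillis();
// ... 2. load the initial dataset ...
long t2 = System.currentTimeMillis();
// ... 3. apply the transformations ...
long t3 = System.currentTimeMillis();
// ... 4. run the final action ...
long t4 = System.currentTimeMillis();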

System.out.println("1. Creating a session ........... " + (t1 - t0));
System.out.println("2. Loading initial dataset ...... " + (t2 - t1));
System.out.println("3. Transformations  ............. " + (t3 - t2));
System.out.println("4. Final action ................. " + (t4 - t3));
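
A rough Python adaptation of the Java snippet above might look like the sketch below. `StageTimer` and `time_pipeline` are hypothetical names introduced here for illustration; they are not part of PySpark. The sample rows are two of the rows from the question. Keep in mind that transformations such as `drop` and `withColumnRenamed` are lazy, so the time measured for stage 3 only covers building the query plan; the real work is done, and paid for, when the action in stage 4 runs.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: collects wall-clock durations (seconds) per named stage."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            self.durations[name] = time.perf_counter() - start

    def report(self):
        # One line per stage, in the order the stages were run.
        return [f"{name}: {secs * 1000:.1f} ms" for name, secs in self.durations.items()]


def time_pipeline(timer):
    # Imported here so that StageTimer itself has no PySpark dependency.
    from pyspark.sql import SparkSession

    with timer.stage("1. Creating a session"):
        spark = SparkSession.builder.master("local[*]").getOrCreate()

    with timer.stage("2. Loading initial dataset"):
        df = spark.createDataFrame(
            [("Banana", 1000, "USA", "1"), ("Carrots", 1500, "USA", "2")],
            ["Product", "Amount", "Country", "Id"],
        )

    # Lazy: this only measures how long it takes to extend the query plan.
    with timer.stage("3. Transformations"):
        df = df.drop("Id").withColumnRenamed("Product", "Veggies")

    # The action triggers execution, including the drop and the rename.
    with timer.stage("4. Final action"):
        df.show(truncate=False)

    return timer
```

With PySpark installed, calling `print("\n".join(time_pipeline(StageTimer()).report()))` prints one timing line per stage. Because Spark runs all pending transformations together when the action fires, this gives per-stage wall-clock times rather than a true per-transformation breakdown.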
