I tried using the pandas API on Spark (pyspark.pandas) over Spark Connect, but I get an assertion error:
assert isinstance(spark_frame, SparkDataFrame)
AssertionError
If I use the Spark DataFrame API instead, I don't get any error. Does Spark Connect support the pandas API on Spark?
Below is the code I am running.
import pyspark.pandas as pd
from pyspark.sql import Row
from pyspark.sql import SparkSession

# Stop the regular (local) Spark session before trying the Spark Connect functionality
SparkSession.builder.master("local[*]").getOrCreate().stop()

# The Spark Connect server was started with:
# ./start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0

# Start a Spark session that points at the Spark Connect server (localhost)
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(d)
print(df.head())
# The Spark DataFrame API version below works fine over Spark Connect:
'''
df = spark.createDataFrame([
    Row(a=1, b=2., c='string1'),
    Row(a=2, b=3., c='string2'),
    Row(a=4, b=5., c='string3')
])
df.show()
'''
1 Answer

o2rvlv0m1 · 1#
Below is a corrected version of the code:
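A minimal sketch of a corrected version, assuming a Spark 3.5+ client and server (where the pandas API on Spark is supported over Spark Connect): create the remote session first and build the pandas-on-Spark DataFrame against it. The ps alias and the sample data are illustrative, not from the original answer.

import pyspark.pandas as ps
from pyspark.sql import SparkSession

# Connect to the already-running Spark Connect server; no local session is
# created, so there is nothing to stop beforehand.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# pandas-on-Spark operations now execute through the remote session.
d = {'col1': [1, 2], 'col2': [3, 4]}
psdf = ps.DataFrame(d)
print(psdf.head())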
Note: before attempting the remote connection, make sure your Spark cluster and the Spark Connect server are properly configured and running.
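As a quick smoke test, assuming the server was started with the start-connect-server.sh command from the question and is listening on the default port 15002, a plain DataFrame round trip confirms the connection before trying the pandas API:

from pyspark.sql import SparkSession

# Server start command from the question (adjust the package version to your Spark build):
#   ./start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# If this prints the numbers 0..4, the Spark Connect server is reachable.
spark.range(5).show()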