How to create a Spark DataFrame from a list of data and a schema

mepcadol · posted 2021-05-24 in Spark

I am trying to create a DataFrame from a list of data and apply a schema to it. In the Spark Scala docs I found this `createDataFrame` signature, which takes a list of rows and a schema as a `StructType`:

def createDataFrame(rows: List[Row], schema: StructType): DataFrame

Below is the sample code I am trying:

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
val simpleData = List(Row("James", "Sales", 3000),
  Row("Michael", "Sales", 4600),
  Row("Robert", "Sales", 4100),
  Row("Maria", "Finance", 3000)
)

val schema = StructType(Array(
  StructField("name", StringType, false),
  StructField("department", StringType, false),
  StructField("salary", IntegerType, false)))

val df = spark.createDataFrame(simpleData,schema)

But I am getting the error below:

command-3391230614683259:15: error: overloaded method value createDataFrame with alternatives:
  (data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
 cannot be applied to (List[org.apache.spark.sql.Row], org.apache.spark.sql.types.StructType)
val df = spark.createDataFrame(simpleData,schema)

Please tell me what I am doing wrong.

bzzcjhmw1#

The error is telling you that it expects a Java list, not a Scala list:

import scala.jdk.CollectionConverters._

val df = spark.createDataFrame(simpleData.asJava, schema)

See this question about `CollectionConverters` if you are using a Scala version older than 2.13 (where the equivalent lives in `scala.collection.JavaConverters`).
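The conversion itself can be sketched without Spark at all. This is a minimal, illustrative example (the object name `AsJavaDemo` is just for the sketch) showing that `asJava` from `scala.jdk.CollectionConverters` wraps a Scala `List` as a `java.util.List`, which is why it satisfies the `createDataFrame(rows: java.util.List[Row], schema: StructType)` overload:

```scala
import scala.jdk.CollectionConverters._

object AsJavaDemo {
  def main(args: Array[String]): Unit = {
    // A Scala List is not a java.util.List, which is why the overload
    // taking java.util.List[Row] does not match a plain Scala List.
    val scalaList: List[String] = List("James", "Michael", "Robert")

    // .asJava wraps the Scala list in a java.util.List view (no copy).
    val javaList: java.util.List[String] = scalaList.asJava

    println(javaList.isInstanceOf[java.util.List[_]]) // true
    println(javaList.size())                          // 3
    println(javaList.get(0))                          // James
  }
}
```

The wrapper is a live view of the original list, so the conversion is cheap even for large row lists.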
Another option is to pass an RDD:

val df = spark.createDataFrame(sc.parallelize(simpleData), schema)
where `sc` is the SparkContext object.
