How to convert a text file into a multi-column DataFrame/Dataset

ca1c2owp  asked on 2021-05-17  in  Spark

I am trying to read a text file and convert it into a DataFrame.

val inputDf: DataFrame = spark.read.text(filePath.get.concat("/").concat(fileName.get))
  .map((row) => row.toString().split(","))
  .map(attributes => {
    Row(attributes(0), attributes(1), attributes(2), attributes(3), attributes(4))
  }).as[Row]

When I run df.printSchema, I get only a single column:

root
 |-- value: binary (nullable = true)

How can I convert this text file into a DataFrame/Dataset with a multi-column schema?
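A side note on the `.map((row) => row.toString().split(","))` step: `Row.toString()` renders a Row as `[field1,field2,...]`, so the surrounding brackets survive the split and end up in the data. A minimal plain-Scala sketch of the effect (no Spark needed; `"a,b,c"` stands in for one line of the file):

```scala
// The single "value" column of one Row, as read by spark.read.text
val line = "a,b,c"

// What row.toString() produces for that Row: brackets are added
val rowString = s"[$line]"

// Splitting the toString output leaves "[" on the first token and "]" on the last
val leaky = rowString.split(",")   // Array("[a", "b", "c]")

// Splitting the raw field value (row.getString(0)) gives clean tokens
val clean = line.split(",")        // Array("a", "b", "c")
```

Using `row.getString(0)` instead of `row.toString()` avoids the stray brackets.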


muk1a3rh1#

Solved:

import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val inputSchema: StructType = StructType(
  List(
    StructField("1", StringType, true),
    StructField("2", StringType, true),
    StructField("3", StringType, true),
    StructField("4", StringType, true),
    StructField("5", StringType, true)
  )
)

val encoder = RowEncoder(inputSchema)

// import spark.implicits._ supplies the Encoder[Array[String]] the first map needs
val inputDf: DataFrame = spark.read.text(filePath.get.concat("/").concat(fileName.get))
  // getString(0) returns the raw line; row.toString() would wrap it in "[...]"
  .map(row => row.getString(0).split(","))
  .map(attributes => {
    Row(attributes(0), attributes(1), attributes(2), attributes(3), "BUY")
  })(encoder)
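An alternative worth considering (a sketch, assuming the file is plain comma-separated text; `spark`, `filePath`, and `fileName` are the same values as in the question): Spark's CSV reader can apply the schema directly, which avoids the manual split and the explicit RowEncoder entirely:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// The same five string columns as inputSchema above, built programmatically
val csvSchema = StructType(
  (1 to 5).map(i => StructField(i.toString, StringType, nullable = true))
)

// No header row is assumed; each comma-separated line becomes one Row
val inputDf: DataFrame = spark.read
  .schema(csvSchema)
  .csv(filePath.get + "/" + fileName.get)
```

This keeps the DataFrame fully schema-aware from the start, and options like `.option("delimiter", ...)` are available if the separator ever changes.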
