DROPMALFORMED does not return the correct result in Apache Spark

Posted by wn9m85ua on 2021-05-29 in Spark

I need to load a text file into a Spark DataFrame. I tried to skip the header and footer lines by using DROPMALFORMED mode, but the mode is not honored.
Code:

val df1 = sparkSession.read.format("csv")
  .option("header", "true")
  .option("mode", "DROPMALFORMED")
  .option("delimiter", ";")
  .load("/xxx/xxx/xxxx/test.txt")
// df1.show(false)

Contents of test.txt:

04/11/2020

name;age;id
asdildsh;12;1
ram;13;2
oma;23;3
radahea;14;4
hellohow

Expected output:

+--------+----+---+
|name    |age |id |
+--------+----+---+
|asdildsh| 12 | 1 |
|ram     | 13 | 2 |
|oma     | 23 | 3 |
|radahea | 14 | 4 |
+--------+----+---+

Answer #1 (polkgigr)

Perhaps this is helpful:

import org.apache.spark.sql.Encoders

val path = getClass.getResource("/header_footer_file.txt").getPath
    /**
      * File content - header_footer_file.txt
      * ---------------------------------------
      * 04/11/2020
      *
      * name;age;id
      * asdildsh;12;1
      * ram;13;2
      * oma;23;3
      * radahea;14;4
      * hellohow
      */
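    // Read the file as plain lines and keep only those that contain the delimiter,
    // which drops the date line, the blank line, and the footer line.
    val stringDS = spark.read.text(path).as(Encoders.STRING)
      .filter(s => s.contains(";"))
    stringDS.show(false)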
    /**
      * +-------------+
      * |value        |
      * +-------------+
      * |name;age;id  |
      * |asdildsh;12;1|
      * |ram;13;2     |
      * |oma;23;3     |
      * |radahea;14;4 |
      * +-------------+
      */
    // Parse the filtered lines as CSV; the first remaining line is the column header.
    val df = spark.read
      .option("sep", ";")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("nullValue", "null")
      .csv(stringDS)

    df.show(false)
    df.printSchema()

    /**
      * +--------+---+---+
      * |name    |age|id |
      * +--------+---+---+
      * |asdildsh|12 |1  |
      * |ram     |13 |2  |
      * |oma     |23 |3  |
      * |radahea |14 |4  |
      * +--------+---+---+
      *
      * root
      * |-- name: string (nullable = true)
      * |-- age: integer (nullable = true)
      * |-- id: integer (nullable = true)
      */
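
The filter above keeps any line that contains a ";", so a footer that happens to contain the delimiter would slip through. A slightly stricter variant, sketched below, keeps only lines that split into the expected number of fields. The object name HeaderFooterCsv, the helper hasFieldCount, the function load, and the default of three fields are all hypothetical names and values chosen for illustration; they are not part of Spark. The sketch assumes Spark 2.2 or later, where DataFrameReader.csv accepts a Dataset[String]. It is also likely relevant that with "header" set to "true" Spark takes the header from the first line it reads, which in test.txt is the date line rather than name;age;id, so filtering before parsing avoids that problem.

```scala
import java.util.regex.Pattern
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical helper object; the names are illustrative only.
object HeaderFooterCsv {

  // True when `line` splits into exactly `expected` fields on the literal separator `sep`.
  // Pattern.quote makes the separator literal, and the -1 limit keeps trailing empty fields.
  def hasFieldCount(line: String, sep: String, expected: Int): Boolean =
    line.split(Pattern.quote(sep), -1).length == expected

  // Reads `path` as plain text, drops lines with the wrong field count,
  // then parses the remaining lines as CSV with the first one as the header.
  def load(spark: SparkSession, path: String, sep: String = ";", expected: Int = 3): DataFrame = {
    val lines: Dataset[String] = spark.read.textFile(path)
    val body: Dataset[String] = lines.filter((line: String) => hasFieldCount(line, sep, expected))
    spark.read
      .option("sep", sep)
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(body)
  }
}
```

Calling HeaderFooterCsv.load(spark, path).show(false) on the sample file should give the same four rows as above, since only name;age;id and the four data lines have three fields. A data row that is genuinely short or long would be dropped silently by this approach, so it only fits files where every valid row has the same number of fields.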
