JSON schema showing directory names along with the file schema

Posted by ttcibm8c on 2021-06-26 in Hive


I have a directory like this one, which contains JSON files:

/user/myuser/check/database=helloworld/table=program/proc_dt=2017-04-04/part-00000

The contents of the JSON file look like this:

hadoop fs -cat /user/myuser/check/database=helloworld/table=program/proc_dt=2017-04-04/part-00000

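{"job_type":"producer","person_id":"7d422349554","order":"1","entity_id":"123"}
{"job_type":"producer","person_id":"af7dc39bc","order":"3","entity_id":"f2323"}

When I try to read the schema from the JSON file with the code below, I also get the directory names in the schema.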

import scala.collection.mutable.ArrayBuffer

// Leaf partition directory that holds the JSON part files
val flattenedDatasetPath = "/user/myuser/check/database=helloworld/table=program/proc_dt=2017-04-04/"
val flattenedFileSchemaList = ArrayBuffer[String]()
val flattenedDataSetDF = sqlContext.read.json(flattenedDatasetPath)
// Collect and print every field name in the inferred schema
val fieldNamesArr = flattenedDataSetDF.schema.fields
for (f <- fieldNamesArr) {
    println(f.name)
    flattenedFileSchemaList += f.name
}

This is the output I get:

entity_id
job_type
order
person_id
database
table
proc_dt

Why do the directory names appear as part of the schema?

Hive scala apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/42445180/json-schema-showing-directory-names-along-with-file-schema

1 Answer


rbpvctlc1#

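The directory names show up because Spark's partition discovery treats key=value directories in the path as partition columns and adds them to the schema. On top of that, spark.sql.sources.partitionColumnTypeInference.enabled is set to true by default in Spark, so the types of those columns are inferred automatically. Note that turning the setting off, as below, only changes the partition columns to string type; it does not remove them from the schema: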

sqlContext.setConf("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

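The data types of the partitioning columns are automatically inferred. Currently, numeric data types and string type are supported. Sometimes users may not want to automatically infer the data types of the partitioning columns. For these use cases, the automatic type inference can be configured by spark.sql.sources.partitionColumnTypeInference.enabled, which defaults to true. When type inference is disabled, string type will be used for the partitioning columns.
(Apache Spark documentation)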
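
If the goal is just a list of the field names that come from the JSON files themselves, one option is to filter out the names that match key=value segments of the path. The following is only a minimal sketch in plain Scala, not anything Spark provides: the object PartitionKeys and its methods fromPath and fileOnlyFields are hypothetical names made up for illustration, and the sketch assumes no field inside the JSON shares a name with a directory key.

```scala
// Hypothetical helper, not part of Spark. It mimics the naming convention
// that partition discovery relies on: directory segments shaped like key=value.
object PartitionKeys {

  // Returns the keys of all key=value segments found in the path, in order.
  def fromPath(path: String): Seq[String] =
    path.split("/").toSeq
      .filter(_.contains("="))
      .map(segment => segment.takeWhile(_ != '='))
      .filter(_.nonEmpty)

  // Keeps only the field names that do not come from the directory layout.
  // Assumption: no real JSON field has the same name as a directory key.
  def fileOnlyFields(allFields: Seq[String], path: String): Seq[String] = {
    val keys = fromPath(path).toSet
    allFields.filterNot(keys.contains)
  }

  def main(args: Array[String]): Unit = {
    val path = "/user/myuser/check/database=helloworld/table=program/proc_dt=2017-04-04/"
    // Field names exactly as printed in the question
    val fields = Seq("entity_id", "job_type", "order", "person_id", "database", "table", "proc_dt")

    println(fromPath(path).mkString(", "))               // database, table, proc_dt
    println(fileOnlyFields(fields, path).mkString(", ")) // entity_id, job_type, order, person_id
  }
}
```

With the variables from the question, this could be called as PartitionKeys.fileOnlyFields(flattenedDataSetDF.schema.fields.map(_.name).toSeq, flattenedDatasetPath). Filtering by name keeps the read itself unchanged, so it should not depend on how a particular Spark version resolves the base path for partition discovery.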

Answered on 2021-06-26
