I'm having trouble loading Parquet files into a Hive table. I do my data processing with Spark on an Amazon EMR cluster, but I need to read the output Parquet files back to validate my transformations. My Parquet files have the following schema:
root
|-- ATTR_YEAR: long (nullable = true)
|-- afil: struct (nullable = true)
| |-- clm: struct (nullable = true)
| | |-- amb: struct (nullable = true)
| | | |-- L: string (nullable = true)
| | | |-- cdTransRsn: string (nullable = true)
| | | |-- dist: struct (nullable = true)
| | | | |-- T: string (nullable = true)
| | | | |-- content: double (nullable = true)
| | | |-- dscStrchPurp: string (nullable = true)
| | |-- amt: struct (nullable = true)
| | | |-- L: string (nullable = true)
| | | |-- T: string (nullable = true)
| | | |-- content: double (nullable = true)
| | |-- amtTotChrg: double (nullable = true)
| | |-- cdAccState: string (nullable = true)
| | |-- cdCause: string (nullable = true)
How can I create a Hive external table with this schema and load the Parquet files into that Hive table for analysis?
1 Answer
You can use `Catalog.createExternalTable` (Spark before 2.2) or `Catalog.createTable` (Spark 2.2 and later). The `Catalog` instance is accessible as `SparkSession.catalog`; the session must be initialized with Hive support enabled.