CREATE EXTERNAL TABLE my_fact_orc
(
mycol STRING,
mystring INT
)
PARTITIONED BY (dt string)
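-- NOTE: some_id is not declared in the column list above; CLUSTERED BY needs a column of this table
CLUSTERED BY (some_id) INTO 64 BUCKETS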
STORED AS ORC
LOCATION 's3://dev/my_fact_orc'
TBLPROPERTIES ('orc.compress'='SNAPPY');
ALTER TABLE my_fact_orc ADD IF NOT EXISTS PARTITION (dt='2017-09-07') LOCATION 's3://dev/my_fact_orc/dt=2017-09-07';
ALTER TABLE my_fact_orc PARTITION (dt='2017-09-07') SET FILEFORMAT ORC;
SELECT * FROM my_fact_orc WHERE dt='2017-09-07' LIMIT 5;
2 Answers
4c8rllxm1#
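You can do this with some kind of conversion step, such as a bucketing step, that writes ORC files into a target directory. After bucketing, mount a Hive table with the same schema on top of that directory, as in the CREATE EXTERNAL TABLE and ALTER TABLE statements above.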
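The conversion step itself could look roughly like the sketch below. It is only an illustration under several assumptions: the staging table name `my_fact_json` and its S3 path are hypothetical, the raw files hold one JSON object per line, the `hive-hcatalog-core` jar is on the classpath so that `org.apache.hive.hcatalog.data.JsonSerDe` can be loaded, and `my_fact_orc` already exists with the columns declared above. The staging column types mirror that table and have to match what the JSON fields actually contain.

```sql
-- Hypothetical staging table that reads the raw JSON text files.
-- The table name and LOCATION are assumptions, not taken from the question.
CREATE EXTERNAL TABLE IF NOT EXISTS my_fact_json
(
  mycol STRING,
  mystring INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 's3://dev/my_fact_json/dt=2017-09-07';

-- On Hive 1.x, bucketed inserts also need: SET hive.enforce.bucketing=true;
-- Rewrite the JSON rows as ORC files under the target partition.
INSERT OVERWRITE TABLE my_fact_orc PARTITION (dt='2017-09-07')
SELECT mycol, mystring
FROM my_fact_json;
```

The point of the two-table layout is that each table reads files in the format it declares: the JSON SerDe only ever sees text files, and the ORC table only ever sees ORC files written by Hive.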
vshtjzan2#
Apparently not.
Record:
{"mycol":123,"mystring":"hello"}
myint mystring
Failed with exception java.io.IOException:java.lang.ClassCastException:
org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.Text
JsonSerDe.java file: