Hive JSON SerDe with the ORC or RC file format

pbwdgjma  asked on 2021-06-26  in Hive

Can a JSON SerDe be used together with the RC or ORC file format? I am trying to insert into a Hive table whose file format is ORC, and to store the data on Azure Blob as serialized JSON.


4c8rllxm1#

You can achieve this with some kind of conversion step, e.g. a bucketing step that produces ORC files in the target directory, after which you mount a Hive table with the same schema on that directory. Like below.

CREATE EXTERNAL TABLE my_fact_orc
(
  mycol STRING,
  mystring INT,
  some_id STRING  -- the CLUSTERED BY column must exist in the table definition
)
PARTITIONED BY (dt string)
CLUSTERED BY (some_id) INTO 64 BUCKETS
STORED AS ORC
LOCATION 's3://dev/my_fact_orc'
TBLPROPERTIES ('orc.compress'='SNAPPY');

ALTER TABLE my_fact_orc ADD IF NOT EXISTS PARTITION (dt='2017-09-07') LOCATION 's3://dev/my_fact_orc/dt=2017-09-07';

ALTER TABLE my_fact_orc PARTITION (dt='2017-09-07') SET FILEFORMAT ORC;

SELECT * FROM my_fact_orc WHERE dt='2017-09-07' LIMIT 5;
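A common way to realize the conversion step mentioned above is to mount the raw JSON files on a plain-text staging table that uses the JSON SerDe, then INSERT ... SELECT into the ORC table; Hive parses the JSON on read and rewrites the rows as ORC on write. A sketch, where the staging table name and location are made up for illustration:

```sql
-- Hypothetical staging table: JsonSerDe over text files, one JSON object per line
CREATE EXTERNAL TABLE my_fact_json
(
  mycol STRING,
  mystring INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 's3://dev/my_fact_json';

-- Conversion: read JSON via the SerDe, write ORC into the target table
INSERT OVERWRITE TABLE my_fact_orc PARTITION (dt='2017-09-07')
SELECT mycol, mystring
FROM my_fact_json;
```

With dynamic partitioning enabled, the partition value could instead be selected from a column rather than fixed in the PARTITION clause.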

vshtjzan2#

Apparently not.

insert overwrite local directory '/home/cloudera/local/mytable' 
stored as orc 
select '{"mycol":123,"mystring","Hello"}'
;

create external table verify_data (rec string) 
stored as orc 
location 'file:////home/cloudera/local/mytable'
;

select * from verify_data
;

rec
{"mycol":123,"mystring","Hello"}

create external table mytable (myint int,mystring string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' 
stored as orc
location 'file:///home/cloudera/local/mytable'
;

myint mystring
Failed with exception java.io.IOException: java.lang.ClassCastException:
org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.Text
From JsonSerDe.java:

...
import org.apache.hadoop.io.Text;
...

  @Override
  public Object deserialize(Writable blob) throws SerDeException {

    Text t = (Text) blob;
  ...
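The cast above is why the combination fails: JsonSerDe's deserialize expects each record to arrive as an org.apache.hadoop.io.Text line, but the ORC reader hands it an OrcStruct, hence the ClassCastException. The pairing that does work is the JSON SerDe over plain text files; a sketch, with the table name and location illustrative:

```sql
-- JsonSerDe parses one JSON object per line of a text file,
-- so the records reach deserialize() as Text, as the code above requires
CREATE EXTERNAL TABLE mytable_json (myint INT, mystring STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 'file:///home/cloudera/local/mytable_json';
```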
