使用特殊字符从txt创建表配置单元

7hiiyaii  于 2021-06-28  发布在  Hive
关注(0)|答案(1)|浏览(342)

嗨!
我有一个包含以下内容的txt:

$ hdfs dfs -cat result/
[5,AA,ABE,US,AGU,MX,DNE0M0Z1,99991231,20160421,MX13,706,1,,33,,BOX,,,60,INNJ,31,2419221]
[5,AA,ABE,US,AGU,MX,DNE0M0Z1,99991231,20160421,MX13,706,1,,33,,BOX,,,60,INNJ,31,2419244]
[5,AA,ABE,US,AGU,MX,DNE0M0Z1,99991231,20160421,MX13,706,1,,33,,BOX,,,60,INNJ,31,2419319]

这个文件是由spark在hdfs中生成的。我想要的是创建一个从该文件读取的表配置单元,以便在表中显示结果。问题是记录以[]开始和结束。因为我可以做到这一点,而不改变文本,因为它是自动生成的?
现在我的table是:

DROP TABLE IF EXISTS RESULT_LATAM;

CREATE EXTERNAL TABLE IF NOT EXISTS RESULT_LATAM
(
  FARDET_NUM_RULE_TARIFF      BIGINT,
  FARDET_CD_CARRIER           VARCHAR(3),
  FARDET_CD_ORIGIN_CITY       VARCHAR(5),
  FARDET_CD_ORIGIN_COUNTRY    VARCHAR(2),
  FARDET_CD_DEST_CITY         VARCHAR(5),
  FARDET_CD_DEST_COUNTRY      VARCHAR(2),
  FARDET_CD_FARE_BASIS        VARCHAR(8),
  .
  .
  .
 )
 STORED AS TEXTFILE
 LOCATION '/user/ubuntu/result/';
vlf7wbxs

vlf7wbxs1#

没有直接的方法来实现这一点,但为了演示解决方案,我使用了较少的列数,但您会得到一个想法。你必须养成习惯 EDW 在主表中加载数据并执行转换/清理的解决方案:
样本数据:

[5,A1]
[6,A2]
[7,A3]

创建临时表

create external table table_stg(x string,y string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n';

创建主表

create external table table_main(x int,y VARCHAR(10)) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n';

在暂存表中加载数据

LOAD DATA INPATH '/user/cloudera/result.txt' INTO TABLE table_stg;

hive> select * from table_stg;
OK
[5  A1]
[6  A2]
[7  A3]
Time taken: 0.086 seconds, Fetched: 3 row(s)

在主表中加载干净的数据

insert into table table_main 
select regexp_replace(x, '\\[',''), regexp_replace(y, '\\]','')
from table_stg;

最终输出

hive> select * from table_main;
OK
5   A1
6   A2
7   A3
Time taken: 0.155 seconds, Fetched: 3 row(s)

相关问题