How to load CSV files with different delimiters into a single Hadoop table

Posted by 4zcjmb1e on 2021-06-01 in Hadoop

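I want to populate a Hive table from multiple CSV files. The problem is that not all of the files use the same delimiter, and when creating the table I can specify only one delimiter, for example ~: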

  create table status (type string, ...)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
  with serdeproperties ("separatorChar" = "~")
  STORED AS TEXTFILE

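Does Hive have a built-in feature that allows multiple CSV delimiters? I know the files could be normalized by a Hadoop job before loading, or, based on https://stackoverflow.com/a/26356592/2207078, I could do it with Pig, but I am looking for something built in. Ideally, I would create the status table without specifying a delimiter and tell Hive how to split the columns at load time.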

kqhtkvqz #1

Demo

Data files
comma.txt

  |Now|,I've,heard,there,was
  a,secret,chord;,That,David
  played,||and||,it,,pleased
  the,,,Lord;,

semicolon.txt

  But;;you;don't;really
  |care|;for;music;do;||||| you |||||?

pipeline.txt

  ,It,|,goes,|,like,|,this,|,the,
  fourth|the|fifth|The|;minor
  fall|the|;major|lift|The
  baffled|king||composing|hallelujah

DDL

  create external table mytable
      (c1 string,c2 string,c3 string,c4 string,c5 string)
      partitioned by (delim string)
  ;

  -- Before each partition is added, the table-level field.delim is set
  -- to the delimiter used by that partition's files.
  alter table mytable set serdeproperties ('field.delim'=',');
  alter table mytable add partition (delim='comma');

  alter table mytable set serdeproperties ('field.delim'=';');
  alter table mytable add partition (delim='semicolon');

  alter table mytable set serdeproperties ('field.delim'='|');
  alter table mytable add partition (delim='pipeline');
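The DDL appears to rely on each partition keeping the SerDe properties that were set on the table at the moment the partition was added. As a minimal sketch, assuming a Hive version that accepts a PARTITION clause on ALTER TABLE ... SET SERDEPROPERTIES, the delimiter could also be set on an existing partition directly and then inspected:

```sql
-- Hedged alternative: set the delimiter on one existing partition directly,
-- instead of changing the table-level value before each "add partition".
alter table mytable partition (delim='comma')
    set serdeproperties ('field.delim'=',');

-- Inspect the partition's storage parameters; field.delim should appear
-- among them with the value configured for this partition.
describe formatted mytable partition (delim='comma');
```

Running the describe statement for each of the three partitions is a quick way to confirm that they really carry different delimiters before any data is queried.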

Place the files in the matching directories

  mytable
  ├── delim=comma
  │   └── comma.txt
  ├── delim=pipeline
  │   └── pipeline.txt
  └── delim=semicolon
      └── semicolon.txt
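
One way to get the files into those directories without copying them by hand is Hive's LOAD DATA statement. This is only a sketch: /tmp/demo is a hypothetical local directory, and LOCAL is resolved on the machine where the statement is executed (the HiveServer2 host when using Beeline).

```sql
-- /tmp/demo is a hypothetical path; replace it with the real file location.
-- Each file goes into the partition whose field.delim matches its delimiter.
load data local inpath '/tmp/demo/comma.txt'
    into table mytable partition (delim='comma');

load data local inpath '/tmp/demo/semicolon.txt'
    into table mytable partition (delim='semicolon');

load data local inpath '/tmp/demo/pipeline.txt'
    into table mytable partition (delim='pipeline');
```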

  select * from mytable
  ;

  +---------+---------+--------+-----------+------------------+-----------+
  | c1      | c2      | c3     | c4        | c5               | delim     |
  +---------+---------+--------+-----------+------------------+-----------+
  | |Now|   | I've    | heard  | there     | was              | comma     |
  | a       | secret  | chord; | That      | David            | comma     |
  | played  | ||and|| | it     |           | pleased          | comma     |
  | the     |         |        | Lord;     |                  | comma     |
  | But     |         | you    | don't     | really           | semicolon |
  | |care|  | for     | music  | do        | ||||| you |||||? | semicolon |
  | ,It,    | ,goes,  | ,like, | ,this,    | ,the,            | pipeline  |
  | fourth  | the     | fifth  | The       | ;minor           | pipeline  |
  | fall    | the     | ;major | lift      | The              | pipeline  |
  | baffled | king    |        | composing | hallelujah       | pipeline  |
  +---------+---------+--------+-----------+------------------+-----------+