Loading a CSV file with an array of strings into a Hive table

li9yvcax  posted on 2021-06-02  in  Hadoop

I am trying to load a CSV file into Hive, and one of the fields is an array of strings.
Here is the CSV file:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]
99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]

I tried creating the table like this:

CREATE TABLE IF NOT EXISTS Article
(
ARTICLE_ID INT,
ARTICLE_NAME STRING,
ARTICLE_AUTHOR STRING,
ARTICLE_GENRE ARRAY<STRING>
);
LOAD DATA INPATH '/tmp/pinterest/article.csv' OVERWRITE INTO TABLE Article;
select * from Article;

Here is the result I get:

article.article_id  article.article_name    article.article_author  article.article_genre
48  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Health&Fitness"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Photo"]

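Only the first value of the last field is picked up, and it still carries the opening bracket.
Can someone point out what is going wrong here?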

nnvyjq4y  #1

To load an array of strings into a Hive table, we need to take care of the following points.

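1. When creating the Hive table, collection items should be terminated by "," ('colelction.delim'=','; the property name really is spelled this way in older Hive releases).
 2. The data in the file should look like this, with the array items separated by commas and no brackets or quotes:
  48  Snacks that Power Up Weight Loss    Aidan B. Prince Health&Fitness,Travel
You can modify the file by running the sed commands below, in the following order: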
 - sed -i 's/\[\"//g' filename
 - sed -i 's/\"\]//g' filename
 - sed -i 's/"//g' filename
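To see what those three substitutions do, here is a minimal sketch that applies them to a single value through a pipe instead of editing a file in place. The sample value is an assumption: it uses the quoted form ["...","..."] that the patterns target, which is not the form shown in the question's CSV (that one has brackets but no quotes).

```shell
# Assumed sample value in the quoted form the sed patterns are written for.
value='["Health&Fitness","Travel"]'

# Same three substitutions, in the same order: drop the opening [" pair,
# drop the closing "] pair, then drop every remaining double quote.
cleaned=$(printf '%s\n' "$value" | sed -e 's/\["//g' -e 's/"]//g' -e 's/"//g')

printf '%s\n' "$cleaned"
```

Since only the brackets and quotes are removed, the commas between the items stay in place, which is why the collection delimiter in point 1 is ",".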

sg24os4d  #2

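Two things:
You are missing the delimiter definition for collection items.
Also, I presume you expect your select * from article statement to return something like this: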

48  Snacks that Power Up Weight Loss    Aidan B. Prince ["Health&Fitness","Travel"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["Photo","Travel"]

I can give you an example, and you can play around with the rest. Here is my table definition:

create table article (
  id int,
  name string,
  author string,
  genre array<string>
)
row format delimited
fields terminated by ','
collection items terminated by '|';

The data looks like this:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,Health&Fitness|Travel
99,Snacks that Power Up Weight Loss,Aidan B. Prince,Photo|Travel

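Now load the file with: LOAD DATA local INPATH '/path' OVERWRITE INTO TABLE article; and run the select statement to check the result.
The most important points:
Define a delimiter for collection items, and do not impose the array syntax you would use in ordinary programming (brackets around the values).
Also, make the field delimiter different from the collection item delimiter, to avoid confusion and unexpected results.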
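
If you would rather convert the existing file than retype it, something along these lines might work. It is only a sketch under a few assumptions: the bracketed list is the only place a "[" appears on a line, the items themselves contain no commas or brackets, and your sed accepts -E (GNU and BSD sed both do). The temporary file just stands in for the real CSV; the two rows are the sample data from the question.

```shell
# Temporary stand-in for the real CSV file, filled with the question's rows.
input=$(mktemp)
printf '%s\n' \
  '48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]' \
  '99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]' > "$input"

# Loop: replace one comma inside the brackets with '|' per pass until none
# are left, then strip the surrounding brackets. Commas outside the brackets
# are untouched, so they keep acting as field delimiters.
converted=$(sed -E \
  -e ':a' \
  -e 's/(\[[^],]*),/\1|/' \
  -e 'ta' \
  -e 's/\[([^]]*)]/\1/' \
  "$input")

printf '%s\n' "$converted"
rm -f "$input"
```

The printed lines should then have the same shape as the data shown above (items joined with "|", no brackets), ready for the table defined with collection items terminated by '|'.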
