In Hive, when I load data from a CSV file, I only get part of the columns, not all of them

Posted by h43kikqp on 2021-06-26 in Hive

These are the columns in my data source:

BibNum  
Title   
Author  
ISBN    
PublicationYear 
Publisher   
Subjects    
ItemType    
ItemCollection  
FloatingItem    
ItemLocation    
ReportDate  
ItemCount

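I only get the columns up to Publisher. Below are the data and the query output; if you know the cause and how to fix it, please let me know. I would really appreciate it: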

Here are the actual values of the first row (I use // marks to separate the columns):

3011076// 
A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield, Frederick Gardner, Megan Petasky, and Allen Tam.     // 
O'Ryan, Ellie   // 
1481425730, 1481425749, 9781481425735, 9781481425742    // 
2014    // 
Simon Spotlight,    Musicians Fiction, Bullfighters Fiction, Best friends Fiction, Friendship Fiction, Adventure and adventurers Fiction    // 
jcbk    // 
ncrdr   // 
Floating // 
qna  // 
09/01/2017 //   
1

Here are the actual values of the second row:

2248846 //  
Naruto. Vol. 1, Uzumaki Naruto / story and art by Masashi Kishimoto ; [English adaptation by Jo Duffy]. // 
Kishimoto, Masashi, 1974- //    
1569319006  // 
2003, c1999.    // 
Viz,    Ninja Japan Comic books strips etc, Comic books strips etc Japan Translations into English, Graphic novels //   
acbk//  
nycomic//   
NA//    
lcy//   
09/01/2017//    
1
hive> select * from timesheet limit 3;
OK
NULL    Title   Author  ISBN    PublicationYear Publisher   Subjects    ItemType    ItemCollection  FloatingItem    ItemLocation    ReportDate  ItemCount
3011076 "A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield Frederick Gardner    Megan Petasky   and Allen Tam."    "O'Ryan  Ellie" "1481425730  1481425749  9781481425735   9781481425742" 2014.   "Simon Spotlight
2248846 "Naruto. Vol. 1  Uzumaki Naruto / story and art by Masashi Kishimoto ; [English adaptation by Jo Duffy]."   "Kishimoto   Masashi     1974-" 1569319006  "2003    c1999."    "Viz    "   "Ninja Japan Comic books strips etc  Comic books strips etc Japan Translations into English
Time taken: 0.149 seconds
hive> desc timesheet
    > ;
OK
bibnum  bigint  
title   string  
author  string  
isbn    string  
publication string  
publisher   string  
subjects    string  
itemtype    string  
itemcollection  string  
floatingitem    string  
itemlocation    string  
reportdate  string  
itemcount   string  
Time taken: 0.21 seconds

BibNum,Title,Author,ISBN,PublicationYear,Publisher,Subjects,ItemType,ItemCollection,FloatingItem,ItemLocation,ReportDate,ItemCount
3011076,"A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield, Frederick Gardner, Megan Petasky, and Allen Tam.","O'Ryan, Ellie","1481425730, 1481425749, 9781481425735, 9781481425742",2014.,"Simon Spotlight,","Musicians Fiction, Bullfighters Fiction, Best friends Fiction, Friendship Fiction, Adventure and adventurers Fiction",jcbk,ncrdr,Floating,qna,09/01/2017,1

Hive loading import-from-csv

Source: https://stackoverflow.com/questions/49702028/in-hive-when-i-load-data-from-csv-file-i-only-get-a-part-of-columns-not-the-w

2 answers

voj3qocg1#

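The CSV file is comma-delimited, so if the table does not declare a field delimiter, the whole line is loaded into a single string column. When creating the table, you can therefore specify that the values in each row are separated by commas.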

create table table_name (
....
) row format delimited fields terminated by ',' lines terminated by '\n';

Then load the CSV file with:

load data local inpath 'path_to_file' into table table_name;
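
As a rough sketch of how this could look for the table in the question: the column names follow the `desc timesheet` output, while the table name `timesheet_delimited` and the file path are hypothetical placeholders. The `skip.header.line.count` table property is an addition not mentioned in this answer; it keeps the CSV header line from being read as a data row (the `NULL Title Author ...` row in the query output). Note that a plain delimited table splits on every comma, including commas inside quoted values such as the title, so this form only suits files whose values contain no commas.

```sql
-- Sketch only: comma-delimited text table for the question's 13 columns.
-- Table name and path are hypothetical.
CREATE TABLE timesheet_delimited (
  bibnum          BIGINT,
  title           STRING,
  author          STRING,
  isbn            STRING,
  publication     STRING,
  publisher       STRING,
  subjects        STRING,
  itemtype        STRING,
  itemcollection  STRING,
  floatingitem    STRING,
  itemlocation    STRING,
  reportdate      STRING,
  itemcount       STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
-- Skip the first line of each file (the CSV header).
TBLPROPERTIES ("skip.header.line.count" = "1");

-- LOCAL reads the file from the client machine's filesystem.
LOAD DATA LOCAL INPATH '/tmp/library.csv' INTO TABLE timesheet_delimited;
```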

Hope this helps :)

1rhkuytd2#

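On its own, Apache Hive's plain delimited text format does not cope with CSV features such as quoted fields, but a SerDe (serializer/deserializer) can handle them.
In Hive v0.14+ the CSV SerDe is built in, and its default separator is a comma (,), so this should work for your CSV: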

create table table_name (column names data types..)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
stored as textfile;

-- then load the data
load data inpath '/path/'
into table table_name;

If any column contains unescaped quotes, you will have to go through the data manually and work out which value belongs to which column...
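
A fuller sketch of the SerDe approach for the 13 columns in the question might look like the following. The table name `timesheet_csv` and the HDFS path are hypothetical. The separator, quote, and escape characters shown are the SerDe's defaults, written out only to make them visible. OpenCSVSerde reads every column as a string, so the columns are declared as STRING here and cast at query time. The `skip.header.line.count` property is an addition to what the answer describes, used to drop the header line.

```sql
-- Sketch only: quoted-CSV table using the built-in OpenCSVSerde.
-- Table name and path are hypothetical.
CREATE TABLE timesheet_csv (
  bibnum          STRING,
  title           STRING,
  author          STRING,
  isbn            STRING,
  publicationyear STRING,
  publisher       STRING,
  subjects        STRING,
  itemtype        STRING,
  itemcollection  STRING,
  floatingitem    STRING,
  itemlocation    STRING,
  reportdate      STRING,
  itemcount       STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",   -- field separator
  "quoteChar"     = "\"",  -- commas inside quoted values stay in one field
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count" = "1");

-- Without LOCAL, the file is moved from its HDFS location into the table.
LOAD DATA INPATH '/user/hive/staging/library.csv' INTO TABLE timesheet_csv;

-- Cast the string columns to the types you need when querying.
SELECT CAST(bibnum AS BIGINT) AS bibnum,
       title,
       publisher,
       subjects,
       CAST(itemcount AS INT) AS itemcount
FROM timesheet_csv
LIMIT 3;
```

With this table definition, a value such as "O'Ryan, Ellie" should arrive as a single author field instead of being split at the comma.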
