In Hive, when I load data from a CSV file, I only get some of the columns instead of all of them

h43kikqp  posted on 2021-06-26  in Hive

These are the columns in my data source:

  1. BibNum
  2. Title
  3. Author
  4. ISBN
  5. PublicationYear
  6. Publisher
  7. Subjects
  8. ItemType
  9. ItemCollection
  10. FloatingItem
  11. ItemLocation
  12. ReportDate
  13. ItemCount

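I only get the columns up to Publisher. I have uploaded a screenshot; if you know the cause and how to fix it, please let me know. I would really appreciate it: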

Here are the actual values of the first row (I use // as a marker to separate the columns):

  1. 3011076//
  2. A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield, Frederick Gardner, Megan Petasky, and Allen Tam. //
  3. O'Ryan, Ellie //
  4. 1481425730, 1481425749, 9781481425735, 9781481425742 //
  5. 2014 //
  6. Simon Spotlight, Musicians Fiction, Bullfighters Fiction, Best friends Fiction, Friendship Fiction, Adventure and adventurers Fiction //
  7. jcbk //
  8. ncrdr //
  9. Floating //
  10. qna //
  11. 09/01/2017 //
  12. 1

Here are the actual values of the second row:

  1. 2248846 //
  2. Naruto. Vol. 1, Uzumaki Naruto / story and art by Masashi Kishimoto ; [English adaptation by Jo Duffy]. //
  3. Kishimoto, Masashi, 1974- //
  4. 1569319006 //
  5. 2003, c1999. //
  6. Viz, Ninja Japan Comic books strips etc, Comic books strips etc Japan Translations into English, Graphic novels //
  7. acbk//
  8. nycomic//
  9. NA//
  10. lcy//
  11. 09/01/2017//
  12. 1

Query output and table description from the Hive shell:

    hive> select * from timesheet limit 3;
    OK
    NULL Title Author ISBN PublicationYear Publisher Subjects ItemType ItemCollection FloatingItem ItemLocation ReportDate ItemCount
    3011076 "A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield Frederick Gardner Megan Petasky and Allen Tam." "O'Ryan Ellie" "1481425730 1481425749 9781481425735 9781481425742" 2014. "Simon Spotlight
    2248846 "Naruto. Vol. 1 Uzumaki Naruto / story and art by Masashi Kishimoto ; [English adaptation by Jo Duffy]." "Kishimoto Masashi 1974-" 1569319006 "2003 c1999." "Viz " "Ninja Japan Comic books strips etc Comic books strips etc Japan Translations into English
    Time taken: 0.149 seconds
    hive> desc timesheet
        > ;
    OK
    bibnum bigint
    title string
    author string
    isbn string
    publication string
    publisher string
    subjects string
    itemtype string
    itemcollection string
    floatingitem string
    itemlocation string
    reportdate string
    itemcount string
    Time taken: 0.21 seconds

bibnum,title,author,isbn,publicationyear,publisher,subjects,itemtype,itemcollection,floatingitem,itemlocation,reportdate,itemcount | null | null | null | null | null | null | null | null | null | null |
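3011076,"A tale of two friends / adapted by Ellie O'Ryan ; illustrated by Tom Caulfield, Frederick Gardner, Megan Petasky, and Allen Tam.","O'Ryan, Ellie","1481425730, 1481425749, 9781481425735, 9781481425742",2014.,"Simon Spotlight,","Musicians Fiction, Bullfighters Fiction, Best friends Fiction, Friendship Fiction, Adventure and adventurers Fiction",jcbk,ncrdr,Floating,qna,09/01/2017,1 | null | null | null | null | null | null | null | null | null | null |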


voj3qocg  #1

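Your CSV file is comma-separated, but unless the table declares a field delimiter, Hive falls back to its default delimiter (Ctrl-A, \001), so the whole line ends up in the first column. When you create the table, specify that the fields in each row are separated by commas: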

    create table table_name (
      ....
    ) row format delimited fields terminated by ',' lines terminated by '\n';

Then load the CSV file with:

    load data local inpath 'path_to_file' into table table_name;

Hope this helps :)
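As a rough sketch, the delimited approach applied to the 13 columns from the question might look like the following. The table name `library_inventory` and the local path are hypothetical. The `skip.header.line.count` property is added on the assumption that the file starts with a header row, which the `NULL Title Author ...` row in the question's output suggests. Note that a plain comma delimiter does not honor double quotes, so a field such as the title, which contains commas inside quotes, would still be split across several columns.

```sql
-- Hypothetical table name; the columns follow the 13-column list in the question.
create table library_inventory (
  bibnum          bigint,
  title           string,
  author          string,
  isbn            string,
  publicationyear string,   -- kept as string: sample values include "2003, c1999."
  publisher       string,
  subjects        string,
  itemtype        string,
  itemcollection  string,
  floatingitem    string,
  itemlocation    string,
  reportdate      string,
  itemcount       int
)
row format delimited
  fields terminated by ','
  lines terminated by '\n'
stored as textfile
-- Assumption: the first line of the file is a header and should not be read as data.
tblproperties ('skip.header.line.count' = '1');

-- Placeholder path; point it at wherever the CSV lives on the local filesystem.
load data local inpath '/tmp/library_inventory.csv' into table library_inventory;
```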


1rhkuytd  #2

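Apache Hive cannot parse quoted CSV data on its own, but a SerDe (serializer/deserializer) lets it handle such data.
Since Hive 0.14 the OpenCSVSerde ships with Hive, and its default separator is a comma, so this should work for your CSV: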

    create table table_name (column names data types..)
    row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    stored as textfile;

    load data inpath '/path/'
    into table table_name;
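For the data in the question, a fuller sketch could look like this. The table and view names are hypothetical, the SerDe properties shown are simply the documented defaults written out explicitly, and the header-skipping property assumes the file begins with a header row. One caveat worth knowing: OpenCSVSerde exposes every column as a string regardless of the declared type, so the sketch declares strings and casts in a view.

```sql
-- Hypothetical table name; OpenCSVSerde treats all columns as string.
create table library_inventory_csv (
  bibnum          string,
  title           string,
  author          string,
  isbn            string,
  publicationyear string,
  publisher       string,
  subjects        string,
  itemtype        string,
  itemcollection  string,
  floatingitem    string,
  itemlocation    string,
  reportdate      string,
  itemcount       string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  'separatorChar' = ',',    -- field separator
  'quoteChar'     = '"',    -- commas inside double quotes stay in one field
  'escapeChar'    = '\\'    -- a single backslash
)
stored as textfile
-- Assumption: the first line of the file is a header row.
tblproperties ('skip.header.line.count' = '1');

-- Optional typed view on top of the string columns (hypothetical name).
create view library_inventory_typed as
select
  cast(bibnum as bigint) as bibnum,
  title, author, isbn, publicationyear, publisher, subjects,
  itemtype, itemcollection, floatingitem, itemlocation, reportdate,
  cast(itemcount as int) as itemcount
from library_inventory_csv;
```

With quoting honored, the title `"A tale of two friends / ... Tom Caulfield, Frederick Gardner, Megan Petasky, and Allen Tam."` should land in a single column instead of being spread over four.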

If any column contains unescaped quotes, you will have to go through the data manually and work out which column is which...
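One possible way to narrow down such rows, rather than reading the whole file, is to look for records whose last columns do not have the expected shape. This is only a sketch: it assumes a table (hypothetically named `library_inventory_csv`) that was created with OpenCSVSerde and the question's 13 string columns. It also assumes that `itemcount` should be all digits and that `reportdate` looks like `09/01/2017`, as in the two sample rows.

```sql
-- Rows whose trailing columns look wrong were probably shifted by a stray quote or comma.
select bibnum, title, reportdate, itemcount
from library_inventory_csv
where itemcount is null
   or not (itemcount rlike '^[0-9]+$')
   or not (reportdate rlike '^[0-9]{2}/[0-9]{2}/[0-9]{4}$')
limit 20;
```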
