I downloaded the Stack Overflow users dump so that I could get used to Hive, and I have converted the XML into a CSV file. I am using the following commands:
add jar /home/cloudera/csv-serde.jar;
drop table stackoverflow_users;
CREATE external TABLE IF NOT EXISTS stackoverflow_users (CreationDate timestamp, Views BIGINT,
AccountId BIGINT, AboutMe string,
WebsiteUrl string, LastAccessDate timestamp, upvotes bigint,
ProfileImageUrl string, DisplayName string,
Id BigInt, Reputation BIGINT, DownVotes bigint,
Age int, Location String)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
location '/user/cloudera/users';
The lines in the file have the following format:
"2008-08-01T12:09:11.010","1347","14","","http://some.url","2016-01-15T01:44:05.733","369","","User name","20","6943","38","","Some location"
"2008-08-01T12:11:11.897","830","15","","http://some.url","2016-06-11T01:38:09.770","191","","User name","22","8727","5","30","Some location"
However, if I run desc stackoverflow_users, I see the following:
+------------------+------------+--------------------+--+
| col_name | data_type | comment |
+------------------+------------+--------------------+--+
| creationdate | string | from deserializer |
| views | string | from deserializer |
| accountid | string | from deserializer |
| aboutme | string | from deserializer |
| websiteurl | string | from deserializer |
| lastaccessdate | string | from deserializer |
| upvotes | string | from deserializer |
| profileimageurl | string | from deserializer |
| displayname | string | from deserializer |
| id | string | from deserializer |
| reputation | string | from deserializer |
| downvotes | string | from deserializer |
| age | string | from deserializer |
| location | string | from deserializer |
+------------------+------------+--------------------+--+
Why is every column a string?
1 Answer
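The problem is the SerDe you are using. This CSV SerDe treats every column as a string regardless of the types declared in the CREATE TABLE statement, which is why desc shows string for all columns. The same behavior has been reported elsewhere.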
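One possible workaround, sketched here under a few assumptions, is to leave the external table as it is and put a view on top of it that casts each column to the type you wanted. The view name stackoverflow_users_typed is made up for this example. The sketch also assumes your Hive version does not parse the ISO 'T' separator when casting a string to timestamp, so it replaces the 'T' with a space first.

```sql
-- Assumption: the view name stackoverflow_users_typed is hypothetical.
CREATE VIEW IF NOT EXISTS stackoverflow_users_typed AS
SELECT
  -- Swap the ISO 'T' separator for a space before casting to timestamp
  CAST(regexp_replace(creationdate, 'T', ' ') AS timestamp) AS creationdate,
  CAST(views AS bigint) AS views,
  CAST(accountid AS bigint) AS accountid,
  aboutme,
  websiteurl,
  CAST(regexp_replace(lastaccessdate, 'T', ' ') AS timestamp) AS lastaccessdate,
  CAST(upvotes AS bigint) AS upvotes,
  profileimageurl,
  displayname,
  CAST(id AS bigint) AS id,
  CAST(reputation AS bigint) AS reputation,
  CAST(downvotes AS bigint) AS downvotes,
  -- Empty strings (such as a missing age) cast to NULL
  CAST(age AS int) AS age,
  location
FROM stackoverflow_users;
```

Running desc stackoverflow_users_typed should then report the cast types, and queries can use the view instead of the raw table.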