My table is:
CREATE EXTERNAL TABLE twitter.tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<text:STRING,user:STRUCT<screen_name:STRING,name:STRING>,retweet_count:INT>,
  entities STRUCT<urls:ARRAY<STRUCT<expanded_url:STRING>>,user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,hashtags:ARRAY<STRUCT<text:STRING>>>,
  text STRING,
  user STRUCT<screen_name:STRING,name:STRING,friends_count:INT,followers_count:INT,statuses_count:INT,verified:BOOLEAN,utc_offset:INT,time_zone:STRING>,
  in_reply_to_screen_name STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/twitter';
I load data with:
LOAD DATA INPATH '/user/hue/twitter/tweets/2017/03/08/FlumeData.1489005910193' OVERWRITE INTO TABLE tweets PARTITION (datehour)
and get this error:
'Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Invalid partition key & values; keys [datehour, ], values [])'
I don't understand what value I should put in the PARTITION clause.
1 answer
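LOAD DATA INPATH only moves files; it does not read their contents, so Hive cannot work out the partition value from the data and you have to supply it yourself. If all your records belong to the same time (e.g. hour 23), use ...INTO TABLE tweets PARTITION (datehour=23). Otherwise, you will have to use another technique, such as an external table.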
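A minimal HiveQL sketch of both options, assuming every tweet in the Flume file belongs to one hour. The value 23 is only the example from the answer, and registering the Flume day directory as a partition location is an assumption about how your data is laid out; adjust both to your setup.

```sql
-- Option 1: static partition value.
-- LOAD DATA moves the file into the partition directory (by default
-- /twitter/datehour=23 for this table) without parsing it.
-- OVERWRITE replaces only the contents of partition datehour=23.
LOAD DATA INPATH '/user/hue/twitter/tweets/2017/03/08/FlumeData.1489005910193'
OVERWRITE INTO TABLE twitter.tweets PARTITION (datehour=23);

-- Option 2: external-table style. Leave the files where Flume wrote them
-- and register that directory as a partition. The directory and the
-- value 23 are assumptions, not something Hive derives for you.
ALTER TABLE twitter.tweets ADD IF NOT EXISTS
PARTITION (datehour=23) LOCATION '/user/hue/twitter/tweets/2017/03/08';

-- Either way, the partition should now be listed here.
SHOW PARTITIONS twitter.tweets;
```

Option 2 avoids moving files, which suits a table that Flume keeps appending to: you add one partition per new directory instead of reloading data.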