I am trying to upload data to AWS Elasticsearch using a Hive query. The intermediate table used in the Hive query has the following structure.
CREATE TABLE uid_taxonomy_1_day_es (
  UID STRING,
  VISITS INT,
  TAXONOMY_LABEL_1 INT,
  TAXONOMY_LABEL_2 INT,
  TAXONOMY_LABEL_3 INT,
  DAY_OF_YEAR INT
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes' = 'https://search-cust-360-view-6cj2j4mcytbtrpodzffnh3uadi.us-east-1.es.amazonaws.com',
  'es.port' = '443',
  'es.index.auto.create' = 'false',
  'es.batch.size.entries' = '1000',
  'es.batch.write.retry.count' = '10000',
  'es.batch.write.retry.wait' = '10s',
  'es.batch.write.refresh' = 'false',
  'es.nodes.discovery' = 'false',
  'es.nodes.client.only' = 'false',
  'es.resource' = 'urltaxonomy/uids',
  'es.query' = '?q=*',
  'es.nodes.wan.only' = 'true'
);
The actual load query takes about half an hour to load the data. Partway through, at seemingly random records, it throws a 403 Forbidden error.
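For context, the load is an insert from a Hive source table into the Elasticsearch-backed table defined above. The following is only a minimal sketch of that kind of query: the source table name `uid_taxonomy_1_day` is hypothetical and is assumed to have the same columns as the target table.

```sql
-- Sketch only: uid_taxonomy_1_day is a hypothetical source table,
-- assumed to have the same columns as uid_taxonomy_1_day_es.
-- Each row selected here is written to the 'urltaxonomy/uids' resource
-- through EsStorageHandler.
INSERT OVERWRITE TABLE uid_taxonomy_1_day_es
SELECT
  UID,
  VISITS,
  TAXONOMY_LABEL_1,
  TAXONOMY_LABEL_2,
  TAXONOMY_LABEL_3,
  DAY_OF_YEAR
FROM uid_taxonomy_1_day;
```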
Here is the stack trace.
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [HEAD] on [urltaxonomy] failed; server[https://search-cust-360-view-6cj2j4mcytbtrpodzffnh3uadi.us-east-1.es.amazonaws.com:443] returned [403|Forbidden:]
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:505)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:476)
at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:537)
at org.elasticsearch.hadoop.rest.RestClient.touch(RestClient.java:543)
at org.elasticsearch.hadoop.rest.RestRepository.touch(RestRepository.java:412)
at org.elasticsearch.hadoop.rest.RestService.initSingleIndex(RestService.java:606)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:594)
at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:173)
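The error occurs only intermittently: if I retry inserting the records that failed, it works fine.
I suspect this has something to do with the structure of my intermediate table.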