I am using elasticsearch-hadoop:7.7.0 to write my data from Hive into es. But I found that when the DataFrame and the number of partitions get too large, exceptions like the following are thrown:
org.elasticsearch.hadoop.rest.EsHadoopRemoteException: es_rejected_execution_exception: rejected execution of processing of [2255855262][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[rs_test_index][0]] containing [1000] requests, target allocation id: 8nqY2sdDTDmZRKU-jPKcdQ, primary term: 1 on EsThreadPoolExecutor[name = es-node/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@11b7b906[Running, pool size = 48, active threads = 48, queued tasks = 220, completed tasks = 921584808]]
This is probably caused by the sizing of the es write thread pool. I then set es.batch.write.retry.count=1 so that the RDD partitions that failed to write to es because of the exception above would be retried. To improve efficiency I also increased the number of partitions, but then I hit another exception:
Attempted to get executor loss reason for executor id 782 at RPC address XX.XX.XX.XX:47562, but got no response. Marking as slave lost.
java.io.IOException: Connection from /XX.XX.XX.XX:42842 closed
and the spark application exited without finishing all of the write work.
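For context, the write goes through the Spark SQL integration of elasticsearch-hadoop, roughly like the sketch below (simplified; the Hive table name and the es host are placeholders, and the option values are the ones I am experimenting with):

```scala
import org.apache.spark.sql.SparkSession

// Simplified sketch of the write job; host, table and some option values are placeholders.
val spark = SparkSession.builder()
  .appName("hive-to-es")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.sql("SELECT * FROM my_hive_table")   // placeholder Hive table

df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "es-node")                    // placeholder host
  .option("es.port", "9200")
  .option("es.batch.size.entries", "1000")          // matches the [1000] requests per bulk in the log
  .option("es.batch.size.bytes", "1mb")
  .option("es.batch.write.retry.count", "1")        // retry setting mentioned above
  .option("es.batch.write.retry.wait", "10s")
  .mode("append")
  .save("rs_test_index")                            // index from the exception message
```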
So I would like to know whether there is any relationship between the number of partitions, the size of the DataFrame, and the number of es shards and replicas. If there is, how can I find the best number of partitions to get the best efficiency?
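What I am currently considering, without being sure it is the right approach, is to cap the number of writer tasks at a small multiple of the primary shard count so the write thread pool queue (capacity 200 in the log) is not flooded. Something like this, where numShards and writersPerShard are just assumed placeholders and df is the DataFrame from the sketch above:

```scala
// Hypothetical mitigation: limit the number of concurrent writer tasks,
// since each Spark task sends its own bulk requests to the shards.
val numShards = 1                 // placeholder; read from the index settings of rs_test_index
val writersPerShard = 4           // assumption: a small multiple of the shard count
val limited = df.coalesce(numShards * writersPerShard)

limited.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "es-node")  // placeholder host
  .mode("append")
  .save("rs_test_index")
```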