scala: How to reindex data from one Elasticsearch cluster to another with elasticsearch-hadoop in Spark

Posted by nwlqm0z1 on 2021-05-27 in Spark

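I have two separate Elasticsearch clusters, and I want to reindex data from the first cluster into the second. However, it seems I can only set one Elasticsearch cluster in the SparkContext configuration, for example: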

import org.apache.spark.SparkConf

val sparkConf: SparkConf = new SparkConf()
  .setAppName("EsReIndex")
sparkConf.set("es.nodes", "node1.cluster1:9200")

So how can I move data between two Elasticsearch clusters with elasticsearch-hadoop in Spark, inside the same application?

scala elasticsearch apache-spark apache-spark-sql elasticsearch-hadoop

Source: https://stackoverflow.com/questions/40315512/how-to-reindex-data-from-one-elasticsearch-cluster-to-another-with-elasticsearch

1 Answer

wtzytmuj1#

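You don't need to configure the node address in SparkConf for this.
When you use the DataFrameReader or DataFrameWriter with the elasticsearch format, you can pass the node address as an option on each read or write, as follows: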

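// Read from the source cluster; es.nodes is set per read, not in SparkConf
val df = sqlContext.read
  .format("elasticsearch")
  .option("es.nodes", "node1.cluster1:9200")
  .load("your_index/your_type")

// Write to the target cluster; the format must be set here as well,
// otherwise save() falls back to Spark's default data source (Parquet)
df.write
  .format("elasticsearch")
  .option("es.nodes", "node2.cluster2:9200")
  .save("your_new_index/your_new_type")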

This should work with Spark 1.6.x and the corresponding elasticsearch-hadoop connector.
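
For Spark 2.x and later, where SparkSession replaces sqlContext, the same idea might look like the sketch below. This is a minimal sketch rather than a tested job: the object name EsReindex and the helper esOptions are hypothetical, the node addresses and index names are the placeholders from the answer, and it assumes an elasticsearch-hadoop (elasticsearch-spark) connector matching your Spark version is on the classpath. It uses org.elasticsearch.spark.sql, the data source name given in the connector's documentation, instead of the elasticsearch name used above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EsReindex {
  // Hypothetical helper: connector settings for one cluster, kept out of
  // SparkConf so that each read or write can point at a different cluster.
  def esOptions(nodes: String): Map[String, String] =
    Map("es.nodes" -> nodes)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EsReIndex")
      .getOrCreate()

    // Read from the source cluster (placeholder address and index).
    val source: DataFrame = spark.read
      .format("org.elasticsearch.spark.sql")
      .options(esOptions("node1.cluster1:9200"))
      .load("your_index/your_type")

    // Write to the target cluster; append mode adds documents to the
    // target index instead of using the default error-if-exists mode.
    source.write
      .format("org.elasticsearch.spark.sql")
      .options(esOptions("node2.cluster2:9200"))
      .mode("append")
      .save("your_new_index/your_new_type")

    spark.stop()
  }
}
```

Because the options are attached to the individual read and write calls, nothing cluster-specific needs to live in the shared Spark configuration. As far as I know, this sketch does not carry over the original document IDs; if you need them, look at the connector's es.read.metadata and es.mapping.id settings.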

Posted 2021-05-27
