How to really reindex data in Elasticsearch

Posted by kcrjzv8t on 2022-09-20 in Elasticsearch


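I have added new mappings (mainly not_analyzed versions of existing fields), and now I have to figure out how to reindex the existing data. I tried following the guide on the Elasticsearch website, but it is just too confusing. I have also tried using plugins (elasticsearch-reindex, allegro/elasticsearch-reindex-tool). I have looked at ElasticSearch - Reindexing your data with zero downtime, which is a similar question. I was hoping not to have to rely on external tools (if possible) and to use the bulk API instead (as with the original insert).

I could easily rebuild the whole index, since it is really read-only data, but that will not work in the long term if I need to add more fields once I am in production with it. Does anyone know of an easy-to-understand, easy-to-follow solution or set of steps for a relative newcomer to ES? I am on version 2 and using Windows.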

elasticsearch

Source: https://stackoverflow.com/questions/33858542/how-to-really-reindex-data-in-elasticsearch

5 Answers


bkhjykvo1#

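Reindexing means reading the data, deleting the data in Elasticsearch, and ingesting the data again. There is no such thing as "changing the mapping of existing data in place." All the reindexing tools you mentioned are just wrappers around read -> delete -> ingest.
You can always adjust the mapping for new indices and add fields later. All new fields will be indexed according to that mapping. Or, if you have no control over the new fields, use dynamic mapping.
Have a look at Change default mapping of string to "not analyzed" in Elasticsearch to see how to use dynamic mapping to get not_analyzed fields for strings.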
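
As a rough illustration of the dynamic mapping idea, the sketch below builds an index-creation body whose dynamic template maps every new string field as not_analyzed. It assumes Elasticsearch 2.x mapping syntax (the "_default_" type and "index": "not_analyzed"); the template name strings_not_analyzed is made up for the example, and later major versions use a keyword type instead.

```python
import json


def build_default_mapping():
    """Index-creation body (assumed Elasticsearch 2.x syntax) with a dynamic template."""
    return {
        "mappings": {
            # "_default_" applies to every type in a 2.x index.
            "_default_": {
                "dynamic_templates": [
                    {
                        # Hypothetical template name, chosen for this example.
                        "strings_not_analyzed": {
                            "match_mapping_type": "string",
                            "mapping": {"type": "string", "index": "not_analyzed"},
                        }
                    }
                ]
            }
        }
    }


# Print the JSON so it can be pasted into a PUT /<new-index> request.
print(json.dumps(build_default_mapping(), indent=2))
```

The printed JSON would be sent as the body when creating the new index, before any documents are ingested into it.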

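Reindexing is very expensive. A better approach is to create a new index and drop the old one. To achieve this with zero downtime, use an index alias for all your clients. Think of an index called "data-version1". In steps: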

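1. Create the index "data-version1" and give it the alias "data".
2. Use only the alias "data" in all your client applications.
3. To update your mapping: create a new index called "data-version2" (with the new mapping) and put all your data into it (you can use the _reindex API for that).
4. Switch from version 1 to version 2: remove the alias "data" from version 1 and create the alias "data" on version 2 (or create first, then remove). In the time between these two steps your clients will see no data (or duplicate data). But the time between removing and creating the alias should be so short that your clients should not notice it.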

It is good practice to always use aliases.
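
The gap in the last step can usually be avoided altogether: the _aliases endpoint accepts a list of actions in a single request and applies them atomically. A minimal Python sketch, assuming a cluster reachable at a base URL you pass in (for example http://localhost:9200) and using only the standard library; the function names are made up for the example:

```python
import json
import urllib.request


def build_alias_swap(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def swap_alias(base_url, alias, old_index, new_index):
    """Send the swap to the cluster; base_url is an assumed address."""
    body = json.dumps(build_alias_swap(alias, old_index, new_index)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/_aliases",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling swap_alias("http://localhost:9200", "data", "data-version1", "data-version2") would move the alias in one step, so clients using "data" never see both indices or neither.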


eanckbw92#

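As of version 2.3.4 a new API, _reindex, is available, and it does exactly what its name says. Basic usage is to POST the following body to /_reindex: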

{
    "source": {
        "index": "currentIndex"
    },
    "dest": {
        "index": "newIndex"
    }
}
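
For anyone scripting this rather than pasting JSON by hand, here is a small Python sketch of the same call using only the standard library. The base URL and the function names are assumptions for illustration. The destination index should already exist with the new mapping; otherwise Elasticsearch creates it with dynamic mapping.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index):
    """Build the request body for POST /_reindex."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def reindex(base_url, source_index, dest_index):
    """Copy every document from source_index into dest_index."""
    body = json.dumps(build_reindex_body(source_index, dest_index)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/_reindex",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # The response is a JSON summary of the reindex run.
        return json.load(response)
```

A call such as reindex("http://localhost:9200", "currentIndex", "newIndex") blocks until the copy finishes, which can take a while on a large index.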


rkttyhzu3#

Example of reindexing in Elasticsearch from a remote host to the local host (updated January 2020)


# show indices on this host

curl 'localhost:9200/_cat/indices?v'

# edit the elasticsearch configuration file to allow reindexing from a remote host

sudo vi /etc/elasticsearch/elasticsearch.yml

## copy the line below somewhere in the file

>>>

# --- whitelist for remote indexing ---

reindex.remote.whitelist: my-remote-machine.my-domain.com:9200
<<<

# restart elasticsearch service

sudo systemctl restart elasticsearch

# run reindex from remote machine to copy the index named filebeat-2016.12.01

curl -H 'Content-Type: application/json' -X POST '127.0.0.1:9200/_reindex?pretty' -d '{
  "source": {
    "remote": {
      "host": "http://my-remote-machine.my-domain.com:9200"
    },
    "index": "filebeat-2016.12.01"
  },
  "dest": {
    "index": "filebeat-2016.12.01"
  }
}'

# verify index has been copied

curl 'localhost:9200/_cat/indices?v'
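
Listing the indices shows that the index exists, but not that every document arrived. One way to check, sketched here in Python under the assumption that both hosts are reachable over plain HTTP without authentication, is to compare the document counts reported by the _count endpoint on each side. The function names are made up for the example.

```python
import json
import urllib.request


def parse_count(raw_json):
    """Extract the document count from a _count response body."""
    return json.loads(raw_json)["count"]


def fetch_count(base_url, index):
    """Ask one cluster how many documents an index holds."""
    with urllib.request.urlopen(f"{base_url}/{index}/_count") as response:
        return parse_count(response.read().decode("utf-8"))


def same_doc_count(source_url, dest_url, index):
    """Return True when source and destination report the same count."""
    return fetch_count(source_url, index) == fetch_count(dest_url, index)
```

For the example above this would be same_doc_count("http://my-remote-machine.my-domain.com:9200", "http://localhost:9200", "filebeat-2016.12.01"). Counts can differ briefly until the destination index has been refreshed.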


tkclm6bt4#

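If, like me, you want a straight answer to this common and basic question, which is poorly addressed by Elastic and the community as a whole, here is the code that works for me.

Assuming you are just debugging, not in a production environment, and that adding or removing fields is perfectly legitimate because you do not care at all about downtime or latency: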


# First of all: block writes on the index so that it can be cloned

PUT /my_index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

# Clone the index into a temporary index

POST /my_index/_clone/my_index-000001  

# Remove the write block first: the reindex below writes into my_index and would be rejected while it is blocked

PUT /my_index/_settings
{
  "settings": {
    "index.blocks.write": false
  }
}

# Copy all documents back into the original index to force them to be reindexed

POST /_reindex
{
  "source": {
    "index": "my_index-000001"
  },
  "dest": {
    "index": "my_index"
  }
}

# Finally, delete the temporary index

DELETE my_index-000001


wqsoz72f5#

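I faced the same problem, but I could not find any resource on updating the mapping and analyzer of an existing index. My suggestion is to use the scroll and scan API and reindex your data into a new index with the new mapping and new fields.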
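
The part that tends to trip people up with this approach is turning a page of scroll results into a bulk request. Here is a minimal Python sketch of that conversion; hits_to_bulk is a made-up helper name, and the action line assumes the type-less format of recent versions (on 2.x each action also needs a "_type"). The bulk API expects newline-delimited JSON: one action line, then one source line, with a trailing newline at the end.

```python
import json


def hits_to_bulk(hits, dest_index):
    """Turn one page of scroll hits into a newline-delimited _bulk body."""
    if not hits:
        return ""
    lines = []
    for hit in hits:
        # Action line: index the document under its original id.
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        # Source line: the document itself, unchanged.
        lines.append(json.dumps(hit["_source"]))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Each page returned by the scroll would be passed through this function and posted to /_bulk, repeating until a scroll request comes back with no hits.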
