How do I compute and add metadata to an existing Elasticsearch index?

Posted by bfhwhh0e on 2022-12-09 in Kibana

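I loaded more than 38 million documents (text strings) into an Elasticsearch index on my local machine. I would like to compute the length of each string and add that value to the index as metadata.
Should I compute the string lengths as metadata before loading the documents into Elasticsearch? Or can I update the metadata with the computed values after loading?
I am relatively new to Elasticsearch/Kibana, and these questions arose from the following Python experiments:
1. Data as a list of strings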

mylist = ['string_1', 'string_2', ..., 'string_N']
L = [len(s) for s in mylist]  # this computation takes about 1 minute on my machine

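The downside of option 1 is that I am not taking advantage of Elasticsearch, and 'mylist' occupies a large block of memory.
2. Data as an Elasticsearch index, where each string in 'mylist' is loaded into the field 'text'.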

from haystack.document_store.elasticsearch import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(host='localhost', username='', password='', index='myindex')
docs = document_store.get_all_documents_generator()  # yields documents lazily instead of holding them all in memory
L = [len(d.text) for d in docs]  # this computation takes about 6 minutes on my machine

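The downside of option 2 is the longer computation time; the upside is that the generator frees up memory. The long computation time is why I think storing the string length (and other analyses) as metadata in Elasticsearch would be a good solution.
Are there other options? What am I missing?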

kibana

Source: https://stackoverflow.com/questions/69933708/how-do-i-compute-and-add-meta-data-to-an-existing-elasticsearch-index

1 Answer

b1payxdu1#

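If you want to store the size of the whole document, I suggest installing the mapper-size plugin, which stores the size of the source document in the _size field.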
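As a rough illustration of the plugin route, here is a minimal sketch of the index body that enables the _size field. It assumes the plugin has already been installed on every node (bin/elasticsearch-plugin install mapper-size) and uses a hypothetical index name; note that _size holds the size of _source in bytes, which is not the same thing as a character count of one field.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "myindex-with-size"

# Once the mapper-size plugin is installed, the _size metadata field
# is switched on per index, inside the mappings.
create_index_body = {"mappings": {"_size": {"enabled": True}}}

# Print the request in the same console style used elsewhere in this answer.
print(f"PUT {INDEX_NAME}")
print(json.dumps(create_index_body, indent=2))
```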
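If you only want to store the size of a specific field of the source document, you need a different approach.
My suggestion is to create an ingest pipeline that processes each document before it is indexed. The same pipeline can be used either when documents are indexed for the first time or after they have already been loaded; I will show you how.
First, create the ingest pipeline with a script processor that stores the length of the string in the text field in another field called textLength.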

PUT _ingest/pipeline/string-length
{
  "description": "My optional pipeline description",
  "processors": [
    {
      "script": {
        "source": "ctx.textLength = ctx.text.length()"
      }
    }
  ]
}
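Since the question works from Python, the same pipeline could be registered from a script. The following is only a sketch using the standard library: it assumes a node at http://localhost:9200 with security disabled, and it adds an "if" guard (not part of the pipeline above) so that documents without a text field are skipped instead of failing. The helper names are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_pipeline_request(base_url=ES_URL, pipeline_id="string-length"):
    """Build (but do not send) the PUT request that registers the pipeline."""
    body = {
        "description": "Store the length of the text field in textLength",
        "processors": [
            {
                "script": {
                    # Assumption, not in the original pipeline:
                    # skip documents that have no text field.
                    "if": "ctx.text != null",
                    "source": "ctx.textLength = ctx.text.length()",
                }
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url}/_ingest/pipeline/{pipeline_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


pipeline_request = build_pipeline_request()
print(pipeline_request.get_method(), pipeline_request.full_url)
# With a cluster running, the request would be sent with: send(pipeline_request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching the cluster.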

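So if you have already loaded your documents into Elasticsearch and want to enrich each one with the length of one of its fields, you can do it after the fact with the Update by Query API, like this: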

POST myindex/_update_by_query?pipeline=string-length&wait_for_completion=false
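Because wait_for_completion=false makes the call return immediately with a task identifier rather than the final result, progress on 38 million documents has to be checked separately through the Tasks API (GET _tasks/<task_id>). Below is a minimal standard-library sketch of both requests; the base URL, the helper names, and the sample task id are assumptions for illustration.

```python
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_update_by_query_request(index, pipeline, base_url=ES_URL):
    """Build the POST request that starts the update as a background task."""
    query = urllib.parse.urlencode(
        {"pipeline": pipeline, "wait_for_completion": "false"}
    )
    return urllib.request.Request(
        url=f"{base_url}/{index}/_update_by_query?{query}", method="POST"
    )


def task_id_from_response(response):
    """Extract the task id from the JSON body returned by the call above."""
    return response["task"]


def build_task_status_request(task_id, base_url=ES_URL):
    """Build the GET request that reports the progress of a running task."""
    return urllib.request.Request(url=f"{base_url}/_tasks/{task_id}", method="GET")


update_request = build_update_by_query_request("myindex", "string-length")
print(update_request.get_method(), update_request.full_url)

# Hypothetical response body; a real task id will look different.
sample_response = {"task": "exampleNodeId:12345"}
status_request = build_task_status_request(task_id_from_response(sample_response))
print(status_request.get_method(), status_request.full_url)
```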

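You can also apply the ingest pipeline at indexing time, when a document is indexed for the first time, simply by referencing the pipeline in the indexing request, like this: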

PUT myindex/_doc/123?pipeline=string-length
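The request above also needs a document body. A sketch of the full call from Python might look like this, assuming the same local node and using a sample document taken from the question's placeholder strings; the helper name is hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_index_request(index, doc_id, document, pipeline, base_url=ES_URL):
    """Build the PUT request that indexes one document through a pipeline."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}?pipeline={pipeline}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Sample document: only the text field is supplied; the pipeline is
# expected to add textLength before the document is stored.
index_request = build_index_request(
    "myindex", "123", {"text": "string_1"}, "string-length"
)
print(index_request.get_method(), index_request.full_url)
print(index_request.data.decode("utf-8"))
```

For a load of this size, the _bulk endpoint accepts the same pipeline query parameter, so the documents do not have to be sent one at a time.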

Both options work; try them out and pick the one that best suits your needs.
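One caveat when comparing the stored values with the Python experiment in the question: Painless strings are Java strings, so length() counts UTF-16 code units, while Python's len() counts code points. The two agree for most text but differ for characters outside the Basic Multilingual Plane, such as emoji. A small sketch of the difference (the helper name is hypothetical):

```python
def java_string_length(s):
    """Return the number of UTF-16 code units in s, as Java's String.length() would."""
    # Each UTF-16 code unit occupies exactly two bytes in the utf-16-le encoding.
    return len(s.encode("utf-16-le")) // 2


samples = ["string_1", "\U0001F600"]  # plain ASCII, and one emoji
for s in samples:
    print(repr(s), "python len:", len(s), "java length:", java_string_length(s))
```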

Answered 2022-12-09
