Elasticsearch Histogram field type: usage and caveats

x33g5p2x, reposted on 2022-02-20 in ElasticSearch

Histogram

First, the documentation link: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/histogram.html

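Searching the web for "elasticsearch Histogram" turns up two different things:

  • the Histogram field type
  • the Histogram aggregation

There is plenty of material on the aggregation but very little on the field type, so this post records how to use the Histogram field type and what to watch out for. (P.S. Some points in this post are not yet understood and still need investigation, so the post will keep being updated.)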

Histogram field type

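A histogram field is defined by a pair of arrays, values and counts.
Keep the following in mind:

  • values is stored as double and must be in ascending order
  • counts must contain integers, each positive or zero
  • the two arrays must have the same length, because their elements correspond one to one
  • nested arrays are not supported, and neither is sorting on the field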
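
The rules above can be checked on the client before a document is sent. The following is a minimal Python sketch, not Elasticsearch's own parsing code: the function name validate_histogram is hypothetical, and it assumes that only a drop in values is an error (equal neighbours are let through, which the source does not confirm either way).

```python
from typing import Sequence


def validate_histogram(values: Sequence[float], counts: Sequence[int]) -> None:
    """Raise ValueError when the pair breaks one of the rules listed above."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")

    previous = None
    for value in values:
        # Nested arrays are rejected: every element must be a plain number.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"values must contain only numbers, got {value!r}")
        # Assumption: only a decrease is an error; equal neighbours pass.
        if previous is not None and value < previous:
            raise ValueError(
                f"values must be in increasing order, got [{value}] "
                f"but previous value was [{previous}]"
            )
        previous = value

    for count in counts:
        if isinstance(count, bool) or not isinstance(count, int):
            raise ValueError(f"counts must contain only integers, got {count!r}")
        if count < 0:
            raise ValueError(f"counts elements must be >= 0 but got {count}")


# A well-formed pair passes silently.
validate_histogram([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6])
```

A pair such as values [0.1, 0.2, 0.1] or counts containing -6 makes the sketch raise ValueError.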

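Histogram data is stored as binary doc values rather than being indexed, which makes aggregation faster. Its size in bytes is at most 13 * the length of the arrays.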

Quick start

Add the mapping

  PUT histogram_test
  {
    "mappings" : {
      "properties" : {
        "my_histogram" : {
          "type" : "histogram"
        },
        "my_text" : {
          "type" : "keyword"
        }
      }
    }
  }

Add data

  PUT histogram_test/_doc/1
  {
    "my_text" : "histogram_1",
    "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
      "counts" : [3, 7, 23, 12, 6]
    }
  }

  PUT histogram_test/_doc/2
  {
    "my_text" : "histogram_2",
    "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 1],
      "counts" : [3, 7, 23, 12, 6]
    }
  }

Error example

Wrong: indexing a document whose values are not in increasing order

  PUT histogram_test/_doc/1
  {
    "my_text" : "histogram_1",
    "my_histogram" : {
      "values" : [0.1, 0.2, 0.1, 0.4, 0.5],
      "counts" : [3, 7, 23, 12, 6]
    }
  }
  ***********result**************
  {
    "error" : {
      "root_cause" : [
        {
          "type" : "mapper_parsing_exception",
          "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
        }
      ],
      "type" : "mapper_parsing_exception",
      "reason" : "failed to parse field [my_histogram] of type [histogram]",
      "caused_by" : {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
      }
    },
    "status" : 400
  }

Wrong: a counts element less than 0

  PUT histogram_test/_doc/3
  {
    "my_text" : "histogram_3",
    "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 1],
      "counts" : [3, 7, 23, 12, -6]
    }
  }
  ***********result**************
  {
    "error" : {
      "root_cause" : [
        {
          "type" : "mapper_parsing_exception",
          "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
        }
      ],
      "type" : "mapper_parsing_exception",
      "reason" : "failed to parse field [my_histogram] of type [histogram]",
      "caused_by" : {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
      }
    },
    "status" : 400
  }

Aggregation

  • min aggregation
  • max aggregation
  • sum aggregation
  • value_count aggregation
  • avg aggregation
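  • percentiles aggregation (P.S. not yet understood, to be investigated)
  • percentile ranks aggregation (P.S. not yet understood, to be investigated)
  • boxplot aggregation (P.S. not yet understood, to be investigated)
  • histogram aggregation
  • range aggregation (P.S. not yet understood, to be investigated)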

min aggregation

Returns the smallest value found in values.

  GET /histogram_test/_search
  {
    "aggs": {
      "min_latency": {
        "min": {
          "field": "my_histogram"
        }
      }
    }
  }
  **********************value********************
  "aggregations" : {
    "min_latency" : {
      "value" : 0.1
    }
  }

max

Returns the largest value found in values.

  GET /histogram_test/_search
  {
    "aggs": {
      "max_histogram": {
        "max": {
          "field": "my_histogram"
        }
      }
    }
  }
  **********************value********************
  "aggregations" : {
    "max_histogram" : {
      "value" : 1.0
    }
  }
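
To see where 0.1 and 1.0 come from, the min and max results can be reproduced locally from the two Quick start documents. This is only a Python sketch of the arithmetic, not how Elasticsearch computes it internally; the helper names are hypothetical, and counts play no role here.

```python
# The two documents indexed in the Quick start section.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]


def histogram_min(histograms):
    """Smallest element over every values array."""
    return min(float(v) for h in histograms for v in h["values"])


def histogram_max(histograms):
    """Largest element over every values array."""
    return max(float(v) for h in histograms for v in h["values"])


print(histogram_min(docs))  # 0.1
print(histogram_max(docs))  # 1.0
```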

sum

Multiplies each element of values by the corresponding element of counts, then adds all the products together.

  GET /histogram_test/_search
  {
    "aggs": {
      "sum_histogram": {
        "sum": {
          "field": "my_histogram"
        }
      }
    }
  }
  **********************value********************
  "aggregations" : {
    "sum_histogram" : {
      "value" : 35.8
    }
  }
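
The value 35.8 can be checked by hand: document 1 contributes 16.4 and document 2 contributes 19.4. A small Python sketch of that arithmetic follows; the helper name histogram_sum is hypothetical, and the result is rounded for display because of floating-point error.

```python
import math

# The two documents indexed in the Quick start section.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]


def histogram_sum(histograms):
    """Sum of value * count over every pair in every document."""
    return math.fsum(
        value * count
        for h in histograms
        for value, count in zip(h["values"], h["counts"])
    )


print(round(histogram_sum(docs), 6))  # 35.8
```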

value_count

Adds up all the elements of counts.

  GET /histogram_test/_search
  {
    "aggs": {
      "count_histogram": {
        "value_count": {
          "field": "my_histogram"
        }
      }
    }
  }
  **********************value********************
  "aggregations" : {
    "count_histogram" : {
      "value" : 102
    }
  }

avg

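Multiplies each number in the values array by its associated count in the counts array, then computes the average of these values across all histograms. It can be understood as sum / count.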

  GET /histogram_test/_search
  {
    "aggs": {
      "avg_histogram": {
        "avg": {
          "field": "my_histogram"
        }
      }
    }
  }
  **********************value********************
  "aggregations" : {
    "avg_histogram" : {
      "value" : 0.3509803921568627
    }
  }
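
The "sum / count" reading can be verified against the sample data: the counts add up to 102 and the weighted sum is 35.8, so the average is roughly 0.351. The Python sketch below only mirrors that arithmetic; the helper names are hypothetical and it makes no claim about Elasticsearch internals.

```python
import math

# The two documents indexed in the Quick start section.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]


def histogram_value_count(histograms):
    """Total of every counts element across all documents."""
    return sum(count for h in histograms for count in h["counts"])


def histogram_avg(histograms):
    """Weighted sum of values divided by the total count."""
    weighted_sum = math.fsum(
        value * count
        for h in histograms
        for value, count in zip(h["values"], h["counts"])
    )
    return weighted_sum / histogram_value_count(histograms)


print(histogram_value_count(docs))  # 102
print(histogram_avg(docs))          # close to 0.35098
```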

histogram aggregation

Computes the count for each bucket based on values.
interval is the width of each bucket.

  GET /histogram_test/_search
  {
    "aggs": {
      "histogram_histogram": {
        "histogram": {
          "field": "my_histogram",
          "interval": 0.5
        }
      }
    }
  }
  **********************value********************
  "aggregations" : {
    "histogram_histogram" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 90
        },
        {
          "key" : 0.5,
          "doc_count" : 6
        },
        {
          "key" : 1.0,
          "doc_count" : 6
        }
      ]
    }
  }
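
The bucket counts above can be reproduced by rounding each value down to a multiple of interval and adding its count to that bucket. The Python sketch below assumes an offset of 0 and does not fill in empty buckets; the helper name histogram_buckets is hypothetical and this is not Elasticsearch's implementation.

```python
import math
from collections import defaultdict

# The two documents indexed in the Quick start section.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]


def histogram_buckets(histograms, interval):
    """Map each bucket key to the total count of the values that fall in it."""
    buckets = defaultdict(int)
    for h in histograms:
        for value, count in zip(h["values"], h["counts"]):
            # Round the value down to the nearest multiple of interval.
            key = math.floor(value / interval) * interval
            buckets[key] += count
    return dict(sorted(buckets.items()))


print(histogram_buckets(docs, 0.5))  # {0.0: 90, 0.5: 6, 1.0: 6}
```

The values 0.1 to 0.4 from both documents land in bucket 0.0 (2 * 45 = 90), while 0.5 and 1 each open their own bucket with a count of 6.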

Query

Only specific queries can be used on a histogram field.

exists query

  GET /histogram_test/_search
  {
    "query": {
      "exists": {
        "field": "my_histogram"
      }
    }
  }

END

The parts of this post marked as still to be investigated will be filled in later. Discussion is welcome.
