First, the link to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/histogram.html
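Searching the web for "elasticsearch Histogram" turns up two different things: the histogram aggregation and the histogram field type.
There is plenty of material on the aggregation but very little on the field type, so this post records how to use the histogram field type and what to watch out for. P.S. A few points are still not fully understood and need more investigation, so this post will keep being updated.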
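A histogram field is a type defined by a pair of arrays, values and counts, where each element of values is paired with the element of counts at the same position.
Things to note:
Histogram data is stored as binary doc values and is not indexed, which allows faster aggregation. Its size in bytes is at most 13 * the length of the arrays.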
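As a quick illustration of the size bound above, here is a minimal sketch. The function name max_histogram_bytes is made up for this post and is not part of any Elasticsearch API; it only restates the "13 * array length" rule.

```python
def max_histogram_bytes(num_values: int) -> int:
    """Upper bound from the note above: at most 13 bytes per array element."""
    return 13 * num_values


# The sample documents in this post use arrays of length 5.
print(max_histogram_bytes(5))  # 65
```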
Add the mapping
PUT histogram_test
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
Add data
PUT histogram_test/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts" : [3, 7, 23, 12, 6]
  }
}
PUT histogram_test/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 1],
    "counts" : [3, 7, 23, 12, 6]
  }
}
Incorrect example: values that are not in increasing order
PUT histogram_test/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.1, 0.4, 0.5],
    "counts" : [3, 7, 23, 12, 6]
  }
}
***********result**************
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [my_histogram] of type [histogram]",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
    }
  },
  "status" : 400
}
Incorrect example: an element of counts that is less than 0
PUT histogram_test/_doc/3
{
  "my_text" : "histogram_3",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 1],
    "counts" : [3, 7, 23, 12, -6]
  }
}
***********result**************
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [my_histogram] of type [histogram]",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
    }
  },
  "status" : 400
}
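The two rejections above can be caught before a document is sent. The following is only a client-side sketch that mirrors the rules visible in the error messages; it is not Elasticsearch's own validation code. The helper name check_histogram is made up for this post, and the equal-length check is an extra assumption based on the arrays being paired. Whether Elasticsearch accepts two equal adjacent values is not covered here, so the sketch only rejects a value that is smaller than the previous one.

```python
def check_histogram(values, counts):
    """Return an error message for a payload these checks reject, or None if it passes."""
    # Assumption: the paired arrays must have the same length.
    if len(values) != len(counts):
        return "values and counts must have the same length"

    # Rule seen in the first error: values may not decrease.
    previous = None
    for v in values:
        if previous is not None and v < previous:
            return (
                f"values must be in increasing order, "
                f"got [{v}] but previous value was [{previous}]"
            )
        previous = v

    # Rule seen in the second error: counts may not be negative.
    for c in counts:
        if c < 0:
            return f"counts elements must be >= 0 but got {c}"

    return None


print(check_histogram([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6]))
print(check_histogram([0.1, 0.2, 0.1, 0.4, 0.5], [3, 7, 23, 12, 6]))
print(check_histogram([0.1, 0.2, 0.3, 0.4, 1], [3, 7, 23, 12, -6]))
```

The first call passes, while the second and third reproduce the two failing documents shown above.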
min aggregation: returns the smallest element of values.
GET /histogram_test/_search
{
  "aggs": {
    "min_latency": {
      "min": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "min_latency" : {
    "value" : 0.1
  }
}
max aggregation: returns the largest element of values.
GET /histogram_test/_search
{
  "aggs": {
    "max_histogram": {
      "max": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "max_histogram" : {
    "value" : 1.0
  }
}
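The min and max results above can be checked by hand. This is a plain Python sketch over the two sample documents, not a call to Elasticsearch; the variable names are made up for this post. It looks only at the values arrays and ignores counts.

```python
# The two sample documents indexed earlier in this post.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1.0], "counts": [3, 7, 23, 12, 6]},
]

# Flatten every values array into one list, then take the extremes.
all_values = [v for doc in docs for v in doc["values"]]
min_value = min(all_values)
max_value = max(all_values)

print(min_value, max_value)  # 0.1 1.0
```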
sum aggregation: multiplies each element of values by the element of counts at the same position, then adds up all the products.
GET /histogram_test/_search
{
  "aggs": {
    "sum_histogram": {
      "sum": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "sum_histogram" : {
    "value" : 35.8
  }
}
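To see where 35.8 comes from, here is the same arithmetic as a small Python sketch over the two sample documents (the variable names are made up for this post). Document 1 contributes 16.4 and document 2 contributes 19.4.

```python
# The two sample documents indexed earlier in this post.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1.0], "counts": [3, 7, 23, 12, 6]},
]

# Multiply each value by its paired count, then add everything up.
weighted_sum = sum(
    v * c
    for doc in docs
    for v, c in zip(doc["values"], doc["counts"])
)

# Rounded only to hide floating-point noise in the printout.
print(round(weighted_sum, 6))  # 35.8
```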
value_count aggregation: adds up all the elements of counts.
GET /histogram_test/_search
{
  "aggs": {
    "count_histogram": {
      "value_count": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "count_histogram" : {
    "value" : 102
  }
}
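The 102 above is simply the total of both counts arrays, 51 per document. A minimal Python sketch over the sample documents (variable names are made up for this post):

```python
# The two sample documents indexed earlier in this post.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1.0], "counts": [3, 7, 23, 12, 6]},
]

# Add up every element of every counts array.
total_count = sum(c for doc in docs for c in doc["counts"])

print(total_count)  # 102
```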
avg aggregation: multiplies each number in the values array by its associated count in the counts array, then computes the average of these values across all histograms. It can be understood as sum / count.
GET /histogram_test/_search
{
  "aggs": {
    "avg_histogram": {
      "avg": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "avg_histogram" : {
    "value" : 0.3509803921568627
  }
}
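The "sum / count" reading can be verified with the sample documents: 35.8 / 102 is roughly 0.35098. This is a hand calculation in Python, with variable names made up for this post.

```python
# The two sample documents indexed earlier in this post.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1.0], "counts": [3, 7, 23, 12, 6]},
]

# Weighted sum of values (same as the sum aggregation).
weighted_sum = sum(
    v * c
    for doc in docs
    for v, c in zip(doc["values"], doc["counts"])
)
# Total of all counts (same as the value_count aggregation).
total_count = sum(c for doc in docs for c in doc["counts"])

average = weighted_sum / total_count
print(round(average, 6))
```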
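histogram aggregation: groups the elements of values into buckets and reports the total for each bucket. As the result shows, doc_count here is the sum of the matching counts, not the number of documents.
interval: the width of each bucket.
GET /histogram_test/_search
{
  "aggs": {
    "histogram_histogram": {
      "histogram": {
        "field": "my_histogram",
        "interval": 0.5
      }
    }
  }
}
**********************value********************
"aggregations" : {
  "histogram_histogram" : {
    "buckets" : [
      {
        "key" : 0.0,
        "doc_count" : 90
      },
      {
        "key" : 0.5,
        "doc_count" : 6
      },
      {
        "key" : 1.0,
        "doc_count" : 6
      }
    ]
  }
}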
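The bucket totals above can be reproduced by hand. The sketch below assumes each value lands in the bucket whose key is floor(value / interval) * interval, with no offset, and that the paired count is added to that bucket. It is a plain Python approximation of the result shown, not Elasticsearch code, and the variable names are made up for this post.

```python
import math
from collections import defaultdict

# The two sample documents indexed earlier in this post.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1.0], "counts": [3, 7, 23, 12, 6]},
]

interval = 0.5
buckets = defaultdict(int)

for doc in docs:
    for value, count in zip(doc["values"], doc["counts"]):
        # Assumed bucketing rule: round the value down to a multiple of interval.
        key = math.floor(value / interval) * interval
        # Each value contributes its paired count, not 1.
        buckets[key] += count

result = dict(sorted(buckets.items()))
print(result)  # {0.0: 90, 0.5: 6, 1.0: 6}
```

The values 0.1 through 0.4 from both documents fall into the 0.0 bucket (45 + 45 = 90), 0.5 from document 1 goes to the 0.5 bucket, and 1 from document 2 goes to the 1.0 bucket.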
Only certain queries can be used on a histogram field. The exists query is one of them.
GET /histogram_test/_search
{
  "query": {
    "exists": {
      "field": "my_histogram"
    }
  }
}
The parts of this post that still need investigation will be filled in later. Comments and discussion are welcome.
Copyright notice: this article is a repost, and copyright belongs to the original author.
Original link: https://blog.csdn.net/qq_29064815/article/details/123027315