Elasticsearch: transforming data into an array

643ylb08  posted on 2021-06-14  in ElasticSearch

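I want to use Elasticsearch to compute user retention:
1. Event logs go into a default index.
2. Transform them into an intermediate index: entity-centric data, grouped by acc.
3. Use a filters (or adjacency_matrix) aggregation to compute the per-day intersections.
The problem is step 2: how do I build a good transform?
Input event logs: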

POST _bulk
{"index": {"_index": "test.u1"}}
{"acc":1001, "event":"create", "timestamp":"2020-08-01 09:00"}
{"index": {"_index": "test.u1"}}
{"acc":1001, "event":"login", "timestamp":"2020-08-01 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1001, "event":"login", "timestamp":"2020-08-02 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1001, "event":"login", "timestamp":"2020-08-03 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1002, "event":"create", "timestamp":"2020-08-01 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1002, "event":"login", "timestamp":"2020-08-02 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1002, "event":"login", "timestamp":"2020-08-02 11:00"}
{"index": {"_index": "test.u1"}}
{"acc":1003, "event":"create", "timestamp":"2020-08-01 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1004, "event":"create", "timestamp":"2020-08-02 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1004, "event":"login", "timestamp":"2020-08-02 10:00"}
{"index": {"_index": "test.u1"}}
{"acc":1004, "event":"login", "timestamp":"2020-08-03 10:00"}

Expected intermediate index:

{"acc":1001, "create":"08-01", "login":["08-01", "08-02", "08-03"]}
{"acc":1002, "create":"08-01", "login":["08-02"]}
{"acc":1003, "create":"08-01", "login":[]}
{"acc":1004, "create":"08-02", "login":["08-02", "08-03"]}

How can I generate the "login" array? Any better design is also welcome.
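To make the target shape concrete, here is a minimal Python sketch of the grouping the transform is expected to perform, run locally on the sample events from the bulk request. It is only an illustration of the desired result, not Elasticsearch code; the function name `pivot_by_account` is made up for this example, and it assumes the day can be taken from characters 5 to 9 of the timestamp string.

```python
from collections import defaultdict

# Sample events copied from the bulk request in the question.
EVENTS = [
    {"acc": 1001, "event": "create", "timestamp": "2020-08-01 09:00"},
    {"acc": 1001, "event": "login", "timestamp": "2020-08-01 10:00"},
    {"acc": 1001, "event": "login", "timestamp": "2020-08-02 10:00"},
    {"acc": 1001, "event": "login", "timestamp": "2020-08-03 10:00"},
    {"acc": 1002, "event": "create", "timestamp": "2020-08-01 10:00"},
    {"acc": 1002, "event": "login", "timestamp": "2020-08-02 10:00"},
    {"acc": 1002, "event": "login", "timestamp": "2020-08-02 11:00"},
    {"acc": 1003, "event": "create", "timestamp": "2020-08-01 10:00"},
    {"acc": 1004, "event": "create", "timestamp": "2020-08-02 10:00"},
    {"acc": 1004, "event": "login", "timestamp": "2020-08-02 10:00"},
    {"acc": 1004, "event": "login", "timestamp": "2020-08-03 10:00"},
]


def pivot_by_account(events):
    """Hypothetical helper: one entity-centric document per account."""
    create_ts = {}                 # acc -> earliest "create" timestamp
    login_days = defaultdict(set)  # acc -> distinct login days ("MM-dd")
    for e in events:
        acc, ts = e["acc"], e["timestamp"]
        if e["event"] == "create":
            if acc not in create_ts or ts < create_ts[acc]:
                create_ts[acc] = ts
        elif e["event"] == "login":
            login_days[acc].add(ts[5:10])  # "2020-08-02 10:00" -> "08-02"
    accounts = sorted(set(create_ts) | set(login_days))
    return [
        {
            "acc": acc,
            "create": create_ts[acc][5:10] if acc in create_ts else None,
            "login": sorted(login_days[acc]),
        }
        for acc in accounts
    ]


if __name__ == "__main__":
    for doc in pivot_by_account(EVENTS):
        print(doc)
```

Two logins by the same account on the same day (acc 1002 on 08-02) collapse into a single array entry, which is what the per-day intersection in step 3 needs.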

bksxznpy  #1

Done with a scripted_metric aggregation (aggs.scripted_metric):

PUT _transform/tr-acc2-ar2
{
  "source": {
    "index": [
      "mhlog2-*"
    ]
  },
  "pivot": {
    "group_by": {
      "msg.#account_id": {
        "histogram": {
          "field": "msg.#account_id",
          "interval": "1"
        }
      }
    },
    "aggregations": {
      "create": {
        "filter": {
          "term": {
            "msg.#event_name.keyword": "createRole"
          }
        },
        "aggs": {
          "time": {
            "min": {
              "field": "@timestamp"
            }
          }
        }
      },
      "login": {
        "filter": {
          "term": {
            "msg.#event_name.keyword": "login"
          }
        },
        "aggs": {
          "days": {
            "scripted_metric": {
              "init_script": "state.days=[:];",
              "map_script": "state.days[doc['@timestamp'].value.toString('yyyy-MM-dd')]=1; ",
              "combine_script": "return state",
              "reduce_script": "def days = [:]; def array =[]; for (s in states) { for (d in s.days.keySet()) { days[d]=1; } }  for (d in days.keySet()) { array.add(d);} return array; "
            }
          }
        }
      }
    }
  },
  "dest": {
    "index": "idx.tr.acc2.ar2"
  },
  "sync": {
    "time": {
      "field": "@timestamp",
      "delay": "60s"
    }
  }
}
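The four scripts in the scripted_metric aggregation can be hard to follow inside JSON strings, so here is a rough Python simulation of the same phases. It is a sketch only: the split into two shards and the sample timestamps are invented for illustration, and the function names simply mirror the script names in the transform above.

```python
def init_script():
    # Mirrors "state.days=[:]": one empty map per shard.
    return {"days": {}}


def map_script(state, doc):
    # Mirrors map_script: use the calendar day as a map key so that
    # several logins on the same day are stored only once.
    state["days"][doc["@timestamp"][:10]] = 1


def combine_script(state):
    # Mirrors "return state": the per-shard state is passed on unchanged.
    return state


def reduce_script(states):
    # Mirrors reduce_script: merge the per-shard maps, then return the keys.
    days = {}
    for s in states:
        for d in s["days"]:
            days[d] = 1
    return list(days)


def run_scripted_metric(shards):
    """Run init/map/combine once per shard, then reduce over all shard states."""
    states = []
    for shard_docs in shards:
        state = init_script()
        for doc in shard_docs:
            map_script(state, doc)
        states.append(combine_script(state))
    return reduce_script(states)


if __name__ == "__main__":
    # Hypothetical login events of one account, spread over two shards.
    shards = [
        [{"@timestamp": "2020-08-18T10:00:00Z"},
         {"@timestamp": "2020-08-18T12:00:00Z"},
         {"@timestamp": "2020-08-19T09:00:00Z"}],
        [{"@timestamp": "2020-08-19T21:00:00Z"},
         {"@timestamp": "2020-08-20T08:00:00Z"}],
    ]
    print(sorted(run_scripted_metric(shards)))
```

In Painless, `[:]` creates a HashMap, so the order of the returned array is not guaranteed; that does not matter for term-style matching on login.days, but sort in the reduce step if a stable order is needed.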

A document in the intermediate index:

_id : AAAAAAAA
_index : acc.array  
_score : 0
_type : _doc    
create.time : Aug 18, 2020 @ 11:17:43.000   
login.days : 2020-08-18T00:00:00.000Z, 2020-08-19T00:00:00.000Z, 2020-08-20T00:00:00.000Z   
msg.#account_id : 12333212323

Finally, with a KQL filter, finding the users created on 2020-08-18 who were retained on 2020-08-19 is easy:

create.time: 2020-08-18 AND login.days: 2020-08-19
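The filter returns the retained users; the retention rate is that count divided by the size of the creation-day cohort. A small Python sketch of that arithmetic follows, using the entity-centric documents from the question's expected intermediate index as sample data. The `retention` function is a hypothetical helper for illustration, not part of any Elasticsearch API.

```python
# Entity-centric documents, as in the question's expected intermediate index.
DOCS = [
    {"acc": 1001, "create": "08-01", "login": ["08-01", "08-02", "08-03"]},
    {"acc": 1002, "create": "08-01", "login": ["08-02"]},
    {"acc": 1003, "create": "08-01", "login": []},
    {"acc": 1004, "create": "08-02", "login": ["08-02", "08-03"]},
]


def retention(docs, create_day, login_day):
    """Share of accounts created on create_day that logged in on login_day.

    Returns None when no account was created on create_day.
    """
    cohort = [d for d in docs if d["create"] == create_day]
    if not cohort:
        return None
    # Same condition as the KQL filter: create day AND login day both match.
    retained = [d for d in cohort if login_day in d["login"]]
    return len(retained) / len(cohort)


if __name__ == "__main__":
    # Accounts 1001, 1002, 1003 were created on 08-01; 1001 and 1002
    # logged in on 08-02, so next-day retention is 2 out of 3.
    print(retention(DOCS, "08-01", "08-02"))
```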
