Java: How do I extract counts with a group by using the JavaRDD class?

kfgdxczn  posted on 2021-05-29  in  Hadoop

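I want to extract the username and a count (the number of times each user performed each event) using the JavaRDD class. How do I create the JavaRDD object?
Here is a snapshot of my data: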

{
    "_id" : ObjectId("57b3e6d1cab823158a06cafe"),
    "app" : {
            "clientIp" : "111.0.0.1",
            "event" : {
                    "event_name" : "MAX_SEARCH",
                    "appId" : 1,
                    "userName" : "Alex"
                    }
                }
}

Expected result:

Alex    MAX_SEARCH    5

How can I do this?

u4dcyp6a  #1

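Suppose you have multiple records in a text file, as shown below, and you want the user name, the event name, and the number of times each event occurred.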

{
  "_id": ObjectId("57b3e6d1cab823158a06cafe"),
  "app": {
    "clientIp": "111.0.0.1",
    "event": {
      "event_name": "MAX_SEARCH",
      "appId": 1,
      "userName": "Alex"
    }
  }
},
{
  "_id": ObjectId("57b3e6d1cab823158a06cafe"),
  "app": {
    "clientIp": "111.0.0.1",
    "event": {
      "event_name": "MAX_SEARCH",
      "appId": 1,
      "userName": "Alex"
    }
  }
},
{
  "_id": ObjectId("57b3e6d1cab823158a01cafe"),
  "app": {
    "clientIp": "111.0.0.1",
    "event": {
      "event_name": "MAX_SEARCH",
      "appId": 1,
      "userName": "Hokam"
    }
  }
},
{
  "_id": ObjectId("57b3e6d1cab823158a02cafe"),
  "app": {
    "clientIp": "111.0.0.1",
    "event": {
      "event_name": "MIN_SEARCH",
      "appId": 1,
      "userName": "Hokam"
    }
  }
}

The following code snippet reads the data from the file above, creates an RDD from it, and produces the expected result.

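import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.List;

import net.minidev.json.JSONObject;
import net.minidev.json.JSONValue;
import org.apache.commons.io.FileUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

public class UserEventLogger {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("UserEventLogger").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the whole file on the driver and wrap the records in [ ] so they parse as one JSON array.
        String fileData = FileUtils.readFileToString(new File("/data/pocs/text-file.json"), StandardCharsets.UTF_8);
        List<JSONObject> jsonObject = (List<JSONObject>) JSONValue.parse("[" + fileData + "]");

        // Distribute the parsed records as an RDD.
        JavaRDD<JSONObject> jsonRdd = sc.parallelize(jsonObject);

        // Emit ("<userName> <event_name>", 1) for every record.
        jsonRdd.mapToPair(new PairFunction<JSONObject, String, Integer>() {

            @Override
            public Tuple2<String, Integer> call(JSONObject appObj) throws Exception {
                JSONObject app = (JSONObject) appObj.get("app");
                JSONObject event = (JSONObject) app.get("event");
                String username = event.getAsString("userName");
                String eventName = event.getAsString("event_name");

                return new Tuple2<String, Integer>(username + " " + eventName, 1);
            }
        }).reduceByKey(new Function2<Integer, Integer, Integer>() {

            // Sum the 1s for each key to get the count per user and event.
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        }).foreach(new VoidFunction<Tuple2<String, Integer>>() {

            // Print each "<userName> <event_name> <count>" line.
            @Override
            public void call(Tuple2<String, Integer> t) throws Exception {
                System.out.println(t._1 + " " + t._2);
            }
        });

        sc.stop();
    }
}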

After running the code above, you will see output like the following (the order of the rows may vary):

Hokam MAX_SEARCH 1
Alex MAX_SEARCH 2
Hokam MIN_SEARCH 1
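
To check the grouping logic without starting Spark, the same "build a composite key, then sum per key" step can be sketched in plain Java with streams. This is only an illustration, not part of the Spark job: the class name EventCountSketch and the helper countByUserAndEvent are hypothetical, and each record is assumed to be already reduced to a {userName, eventName} pair.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class EventCountSketch {

    // Hypothetical helper: each record is {userName, eventName},
    // assumed to be already extracted from the JSON documents.
    static Map<String, Long> countByUserAndEvent(List<String[]> records) {
        return records.stream().collect(Collectors.groupingBy(
                r -> r[0] + " " + r[1],   // same composite key as in the Spark job
                TreeMap::new,             // sorted keys, so printing is deterministic
                Collectors.counting()));  // plays the role of reduceByKey(v1 + v2)
    }

    public static void main(String[] args) {
        // The four sample records from the text file above.
        List<String[]> records = Arrays.asList(
                new String[] {"Alex", "MAX_SEARCH"},
                new String[] {"Alex", "MAX_SEARCH"},
                new String[] {"Hokam", "MAX_SEARCH"},
                new String[] {"Hokam", "MIN_SEARCH"});

        countByUserAndEvent(records)
                .forEach((key, count) -> System.out.println(key + " " + count));
    }
}
```

Because the sketch collects into a TreeMap, the rows come out sorted by key, unlike the Spark foreach, which makes no ordering promise. Using the concatenated string as the key mirrors the answer's approach; it relies on user names and event names not containing the separator in a way that makes two different pairs collide.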
