Elasticsearch: best way to publish large data from Spring Boot

kqqjbcuj  posted on 2022-11-02  in  ElasticSearch

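We have an Elasticsearch database that holds employees' task details, and every day a Spring Boot application publishes those tasks to Kafka, grouped by employee.
Elasticsearch index: employee_task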

{
    "employeeId":"E001",
    "taskName":"task1",
    "taskDesc":"task desc",
    "startDate":"2022-10-10 11:00:00",
    "endDate":"2022-10-10 16:00:00"
}
{
    "employeeId":"E001",
    "taskName":"task2",
    "taskDesc":"task desc",
    "startDate":"2022-10-10 16:00:00",
    "endDate":"2022-10-10 18:02:00"
}
{
    "employeeId":"E002",
    "taskName":"task3",
    "taskDesc":"task desc",
    "startDate":"2022-10-10 09:00:00",
    "endDate":"2022-10-10 18:00:00"
}

Spring Boot code:

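@Scheduled(cron = "${cron.task.expression}")
public void scheduleTasks() {
    // Get the distinct employee IDs from the employee_task index
    List<String> employees = taskService.getAllEmployeeIds();
    // Fetch each employee's tasks from employee_task and publish them to Kafka
    // (one Elasticsearch query per employee)
    employees.parallelStream().forEach(employeeId -> {
        Map<String, Object> tasksList = taskService.getAllTasksByEmployeeId(employeeId);
        try {
            kafkaTemplate.send(topicName, mapper.writeValueAsString(tasksList));
        } catch (JsonProcessingException e) {
            // Assuming mapper is a Jackson ObjectMapper: its checked exception
            // cannot propagate out of the lambda, so rethrow it unchecked
            throw new UncheckedIOException(e);
        }
    });
}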

Every day it publishes the task details to Kafka in the following format:

Message.1
{
"employeeId":"E001",
"taskList":[
    {
    "employeeId":"E001",
    "taskName":"task1",
    "taskDesc":"task desc",
    "startDate":"2022-10-10 11:00:00",
    "endDate":"2022-10-10 16:00:00"
    },
    {
    "employeeId":"E001",
    "taskName":"task2",
    "taskDesc":"task desc",
    "startDate":"2022-10-10 16:00:00",
    "endDate":"2022-10-10 18:02:00"
    }
]
}
Message.2
{
"employeeId":"E002",
"taskList":[
    {
    "employeeId":"E002",
    "taskName":"task3",
    "taskDesc":"task desc",
    "startDate":"2022-10-10 09:00:00",
    "endDate":"2022-10-10 18:00:00"
    }
]
}

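Until now everything worked fine because the data volume was low, but now we have:

Current number of employees: 10,000
Average tasks per employee: 100

So when the cron job runs, it queries Elasticsearch 10K times. Can anyone suggest the best way to handle this scenario?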

mxg2im7a  #1

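There is an old saying in the database world: "row by row equals slow by slow." From what the question shows, I would guess the service is calling the database in some kind of loop. You need to write a method that fetches the data from the database in one go, or at least in a few large batches, rather than with one query per employee.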
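
To make that concrete, here is a minimal sketch of the batched approach, not a definitive implementation. It assumes the task documents can be read in large pages sorted by employeeId (in Elasticsearch that would typically be a sorted search paged with search_after or the scroll API), so each employee's tasks arrive next to each other and can be grouped on the fly. The names BatchedTaskPublisher, fetchPage and publish are hypothetical: fetchPage stands in for one paged search request and publish stands in for kafkaTemplate.send. With about 1,000,000 task documents and a page size of a few thousand, this would need a few hundred requests instead of 10,000.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.IntFunction;

class BatchedTaskPublisher {

    /**
     * Reads task documents page by page and emits one message per employee.
     * Assumes pages arrive sorted by employeeId, so each employee's tasks are contiguous.
     *
     * @param fetchPage hypothetical stand-in for one paged Elasticsearch search
     *                  (page number -> documents; an empty list ends the loop)
     * @param publish   hypothetical stand-in for kafkaTemplate.send (employeeId, message)
     * @return number of page requests issued, including the final empty one
     */
    static int publishGrouped(IntFunction<List<Map<String, Object>>> fetchPage,
                              BiConsumer<String, Map<String, Object>> publish) {
        String currentId = null;
        List<Map<String, Object>> currentTasks = new ArrayList<>();
        int requests = 0;
        for (int page = 0; ; page++) {
            List<Map<String, Object>> docs = fetchPage.apply(page);
            requests++;
            if (docs.isEmpty()) {
                break;
            }
            for (Map<String, Object> doc : docs) {
                String id = (String) doc.get("employeeId");
                // The employee changed: the previous employee's task list is complete
                if (currentId != null && !currentId.equals(id)) {
                    publish.accept(currentId, toMessage(currentId, currentTasks));
                    currentTasks = new ArrayList<>();
                }
                currentId = id;
                currentTasks.add(doc);
            }
        }
        // Flush the last employee, if any documents were read at all
        if (currentId != null) {
            publish.accept(currentId, toMessage(currentId, currentTasks));
        }
        return requests;
    }

    /** Builds the message shape shown in the question: employeeId plus taskList. */
    static Map<String, Object> toMessage(String employeeId, List<Map<String, Object>> tasks) {
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("employeeId", employeeId);
        message.put("taskList", tasks);
        return message;
    }
}
```

Because only one employee's tasks are held in memory at a time, the job does not need to load the whole index before publishing. The grouping relies on the sort order, so if the pages are not sorted by employeeId the tasks would have to be collected into a map keyed by employeeId instead.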
