I am using Spark to process multiple CSV files. The file validation rules change frequently because of downstream systems, so we decided to explore Apache Drools. Before introducing Drools, even large files finished processing in a few seconds. After adding a handful of rules in Drools, performance dropped significantly: a 5 MB file now takes 3-4 minutes to process. I am new to both Spark and Drools and want to understand how to use the two together in Java. I have come across multiple answers, but I do not understand how to apply them.
Here is the code:
@Configuration
@Slf4j
public class DroolConfig {

    private final KieServices kieServices = KieServices.Factory.get();

    private KieFileSystem getKieFileSystem() throws IOException {
        KieFileSystem kieFileSystem = kieServices.newKieFileSystem();
        kieFileSystem.write(ResourceFactory.newClassPathResource("drlrules/rules.drl"));
        return kieFileSystem;
    }

    @Bean
    public KieContainer getKieContainer() throws IOException {
        log.info("Container created...");
        getKieRepository();
        KieBuilder kb = kieServices.newKieBuilder(getKieFileSystem());
        kb.buildAll();
        KieModule kieModule = kb.getKieModule();
        return kieServices.newKieContainer(kieModule.getReleaseId());
    }

    private void getKieRepository() {
        final KieRepository kieRepository = kieServices.getRepository();
        kieRepository.addKieModule(new KieModule() {
            public ReleaseId getReleaseId() {
                return kieRepository.getDefaultReleaseId();
            }
        });
    }

    @Bean
    public KieSession getKieSession() throws IOException {
        log.info("Session created...");
        return getKieContainer().newKieSession();
    }
}
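For context, the configuration above loads the rules from the classpath resource `drlrules/rules.drl`. The actual rule file is not shown in the question; a hypothetical rule against the `ClientModel` fact (the field names here are assumptions) might look like this:

```
package drlrules;

import com.example.ClientModel;   // assumed package for the fact class

// Hypothetical validation rule; the real rules.drl is not shown in the question.
rule "Reject empty CSV line"
    when
        $m : ClientModel( csvLine == null || csvLine.isEmpty() )
    then
        System.out.println("Invalid line in file: " + $m.getFileName());
end
```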
The validation code:
@Autowired
private KieSession session;
.....
javaRDD1.collect().forEach(col -> {
    if (idx[0] > 0) {
        clientModel.setCsvLine(Arrays.asList(col));
        clientModel.setFileName(file.getName());
        FactHandle handle = session.insert(clientModel);
        session.fireAllRules();
        session.delete(handle);
    }
    idx[0]++;
});
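Since the question asks how Spark and Drools are typically combined: the loop above calls `collect()` (which pulls every row to the driver) and then `session.fireAllRules()` once per row, which is a commonly cited source of slowdowns. One frequently suggested alternative is to evaluate rules over a batch of facts at once (per partition, when staying inside Spark). The sketch below illustrates only the batching idea with a plain-Java stand-in for the rule engine, since the Spark and Drools dependencies are omitted here; `Row`, `validateBatch`, and the rule logic are illustrative assumptions, not the question's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the pattern: gather a batch of facts,
// then run the validation logic once per batch instead of once per row.
// "Row" and "validateBatch" are illustrative, not Spark/Drools API.
public class BatchValidationSketch {

    record Row(String fileName, List<String> fields) {}

    // Stand-in for a batched rule evaluation: returns error messages.
    static List<String> validateBatch(List<Row> batch) {
        List<String> errors = new ArrayList<>();
        for (Row row : batch) {                 // one pass over the whole batch
            if (row.fields().isEmpty()) {
                errors.add("empty line in " + row.fileName());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Row> batch = List.of(
            new Row("a.csv", Arrays.asList("1", "2")),
            new Row("a.csv", List.of())         // invalid: no fields
        );
        List<String> errors = validateBatch(batch);
        System.out.println(errors.size());
        System.out.println(errors.get(0));
    }
}
```

In a real Spark job the batch would come from `mapPartitions`, so each executor processes its partition locally instead of shipping all rows to the driver with `collect()`.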