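I'm completely new to the Spring Integration DSL. What I want to do is fetch all RSS feeds from Mongo, register each of them in the flow context, and process every article the feeds return. At the moment I have a for-each loop that calls a function containing the following: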
IntegrationFlow newFlow = IntegrationFlows
        .from(Feed.inboundAdapter(new URL(url), source),
                e -> e.id(org + "-" + source + "-feed")
                        .poller(Pollers.fixedDelay(poll)
                                .maxMessagesPerPoll(maxMessages)
                                .errorChannel("feedErrors")))
        .enrichHeaders(h -> h.header("sourceId", sourceId))
        .enrichHeaders(h -> h.header("sourceName", sourceName))
        .enrichHeaders(h -> h.header("source", source))
        .enrichHeaders(h -> h.header("categories", categories))
        .enrichHeaders(h -> h.header("org", org))
        .channel(MessageChannels.executor(taskExecutor))
        .handle("enricher", "enhance")
        .channel(news())
        .get();
this.flowContext.registration(newFlow).id(org + "-" + source + "-flow").register();
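This works great, except, interestingly, for the exceptions. When a feed is currently unavailable, has been renamed, or contains malformed articles, an exception is thrown and sent to the "feedErrors" channel named in the flow definition. That part looks like this: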
@Component
public class RssFeedErrorHandler {

    private final Logger logger = LoggerFactory.getLogger(RssFeedErrorHandler.class);

    @Bean
    public IntegrationFlow errorSender() {
        return IntegrationFlows.from("feedErrors")
                .handle("rssFeedErrorHandler", "handleError")
                .get();
    }

    public void handleError(Message<MessagingException> message) {
        logger.error(message.toString());
    }
}
Example exceptions:
AdviceMessage [payload=org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=https://news.google.com/rss/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNRFZ4ZERBU0FtVnVLQUFQAQ?hl=en-GB&gl=GB&ceid=GB%3Aen, feedResource=null, metadataKey='google-news-politics.https://news.google.com/rss/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNRFZ4ZERBU0FtVnVLQUFQAQ?hl=en-GB&gl=GB&ceid=GB%3Aen', lastTime=1613665200000}'; nested exception is java.io.FileNotFoundException: https://news.google.com/rss/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNRFZ4ZERBU0FtVnVLQUFQAQ?hl=en-GB&gl=GB&ceid=GB%3Aen was successful, headers={id=101f3da9-42de-eb03-0da2-015952fc0a23, timestamp=1613667428995}, inputMessage=ErrorMessage [payload=org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=https://news.google.com/rss/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNRFZ4ZERBU0FtVnVLQUFQAQ?hl=en-GB&gl=GB&ceid=GB%3Aen, feedResource=null, metadataKey='google-news-politics.https://news.google.com/rss/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNRFZ4ZERBU0FtVnVLQUFQAQ?hl=en-GB&gl=GB&ceid=GB%3Aen', lastTime=1613665200000}'; nested exception is java.io.FileNotFoundException: https://news.google.com/rss/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNRFZ4ZERBU0FtVnVLQUFQAQ?hl=en-GB&gl=GB&ceid=GB%3Aen, headers={id=8a470a5f-4027-371c-44fd-1250703077b0, timestamp=1613667428995}]]
AdviceMessage [payload=org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=https://www.startupbootcamp.org/feed/, feedResource=null, metadataKey='startupbootcamp-startups.https://www.startupbootcamp.org/feed/', lastTime=1613015630000}'; nested exception is java.net.ConnectException: Operation timed out (Connection timed out) was successful, headers={id=6b5a1a00-905a-68b6-128b-4cdd8eb06737, timestamp=1613664837150}, inputMessage=ErrorMessage [payload=org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=https://www.startupbootcamp.org/feed/, feedResource=null, metadataKey='startupbootcamp-startups.https://www.startupbootcamp.org/feed/', lastTime=1613015630000}'; nested exception is java.net.ConnectException: Operation timed out (Connection timed out), headers={id=70ef8ae7-1b75-9e38-e631-c88777741609, timestamp=1613664837150}]]
AdviceMessage [payload=org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=https://timesofindia.indiatimes.com/rssfeeds/1081479906.cms, feedResource=null, metadataKey='india-times-entertainment.https://timesofindia.indiatimes.com/rssfeeds/1081479906.cms', lastTime=1613715922000}'; nested exception is com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 1: Character reference "�" is an invalid XML character. was successful, headers={id=fb936878-6450-d79a-851a-ca1ceeef8182, timestamp=1613705658735}, inputMessage=ErrorMessage [payload=org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=https://timesofindia.indiatimes.com/rssfeeds/1081479906.cms, feedResource=null, metadataKey='india-times-entertainment.https://timesofindia.indiatimes.com/rssfeeds/1081479906.cms', lastTime=1613715922000}'; nested exception is com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 1: Character reference "�" is an invalid XML character., headers={id=24f2a11a-80a6-6c13-dbe1-9548747174aa, timestamp=1613705658734}]]
AdviceMessage [payload=org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=https://www.jeffbullas.com/feed/, feedResource=null, metadataKey='jeff-bullas-marketing.https://www.jeffbullas.com/feed/', lastTime=1613574000000}'; nested exception is java.io.IOException: Server returned HTTP response code: 500 for URL: https://www.jeffbullas.com/feed/ was successful, headers={id=e6998e35-afdc-b476-c2d9-2f4b0cc46c6b, timestamp=1613710313323}, inputMessage=ErrorMessage [payload=org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=https://www.jeffbullas.com/feed/, feedResource=null, metadataKey='jeff-bullas-marketing.https://www.jeffbullas.com/feed/', lastTime=1613574000000}'; nested exception is java.io.IOException: Server returned HTTP response code: 500 for URL: https://www.jeffbullas.com/feed/, headers={id=420f9b5d-1c58-cd0a-c4e2-5d640578f1c9, timestamp=1613710313322}]]
Here are two full stack traces:
org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=https://timesofindia.indiatimes.com/rssfeeds/1081479906.cms, feedResource=null, metadataKey='india-times-entertainment.https://timesofindia.indiatimes.com/rssfeeds/1081479906.cms', lastTime=-1}'; nested exception is com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 1: Character reference "�" is an invalid XML character.
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.getFeed(FeedEntryMessageSource.java:239)
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.populateEntryList(FeedEntryMessageSource.java:202)
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.doReceive(FeedEntryMessageSource.java:177)
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.doReceive(FeedEntryMessageSource.java:58)
at org.springframework.integration.endpoint.AbstractMessageSource.receive(AbstractMessageSource.java:167)
at org.springframework.integration.endpoint.SourcePollingChannelAdapter.receiveMessage(SourcePollingChannelAdapter.java:250)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.doPoll(AbstractPollingEndpoint.java:359)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.pollForMessage(AbstractPollingEndpoint.java:328)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.lambda$null$1(AbstractPollingEndpoint.java:275)
at org.springframework.integration.util.ErrorHandlingTaskExecutor.lambda$execute$0(ErrorHandlingTaskExecutor.java:57)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.integration.util.ErrorHandlingTaskExecutor.execute(ErrorHandlingTaskExecutor.java:55)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.lambda$createPoller$2(AbstractPollingEndpoint.java:272)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 1: Character reference "�" is an invalid XML character.
at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:236)
at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150)
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.getFeed(FeedEntryMessageSource.java:226)
... 20 more
Caused by: org.jdom2.input.JDOMParseException: Error on line 1: Character reference "�" is an invalid XML character.
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232)
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303)
at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196)
at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:233)
... 22 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 15681; Character reference "�" is an invalid XML character.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.scanCharReferenceValue(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanCharReference(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:217)
... 25 more
org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=http://www.independent.co.uk/news/uk/politics/rss, feedResource=null, metadataKey='independent-politics.http://www.independent.co.uk/news/uk/politics/rss', lastTime=1613726122000}'; nested exception is com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 1: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.getFeed(FeedEntryMessageSource.java:239)
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.populateEntryList(FeedEntryMessageSource.java:202)
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.doReceive(FeedEntryMessageSource.java:177)
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.doReceive(FeedEntryMessageSource.java:58)
at org.springframework.integration.endpoint.AbstractMessageSource.receive(AbstractMessageSource.java:167)
at org.springframework.integration.endpoint.SourcePollingChannelAdapter.receiveMessage(SourcePollingChannelAdapter.java:250)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.doPoll(AbstractPollingEndpoint.java:359)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.pollForMessage(AbstractPollingEndpoint.java:328)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.lambda$null$1(AbstractPollingEndpoint.java:275)
at org.springframework.integration.util.ErrorHandlingTaskExecutor.lambda$execute$0(ErrorHandlingTaskExecutor.java:57)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.integration.util.ErrorHandlingTaskExecutor.execute(ErrorHandlingTaskExecutor.java:55)
at org.springframework.integration.endpoint.AbstractPollingEndpoint.lambda$createPoller$2(AbstractPollingEndpoint.java:272)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 1: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:236)
at com.rometools.rome.io.SyndFeedInput.build(SyndFeedInput.java:150)
at org.springframework.integration.feed.inbound.FeedEntryMessageSource.getFeed(FeedEntryMessageSource.java:226)
... 20 more
Caused by: org.jdom2.input.JDOMParseException: Error on line 1: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232)
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303)
at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196)
at com.rometools.rome.io.WireFeedInput.build(WireFeedInput.java:233)
... 22 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:217)
... 25 more
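So the error is thrown, it gets logged, and life goes on, except that for some reason the failed poll seems to be kept for a retry. That would not be a problem if it retried once, but it keeps retrying, and once enough errors have piled up it only retries those and never fetches anything new.
For example, 500 feeds are polled and 10 of them throw an exception because the feed has been moved and no longer exists, has bad XML, or something else went wrong. From then on the poller only polls those 10 feeds over and over until they stop failing (which never happens). I remember reading somewhere that if the handleError function throws an exception the poll keeps being retried, but if no exception is thrown it should move on. That does not seem to be the case here. I have spent 3 days on this problem trying different solutions, but it always chokes after roughly 4-10 minutes, depending on how long it takes to collect enough failing feeds.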
1 answer
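I would say you are talking about several different kinds of exception here: feed unavailable, malformed content, processing errors, and so on. I think the best practice is to distinguish between them and make an appropriate business decision for each one. We definitely cannot treat "unavailable" as a processing error; that one has to be retried until the feed becomes available again.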
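One way to make that distinction concrete is to classify each failure by its cause chain before deciding what to do with the feed. The following is only a minimal sketch using plain JDK types: the FeedFailure categories, the FeedFailureClassifier class, and the mapping itself (for instance treating FileNotFoundException as "gone") are assumptions for illustration, not something Spring Integration provides. ROME's ParsingFeedException is matched by class name so the sketch compiles without the library.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

import org.xml.sax.SAXParseException;

/** Hypothetical failure categories; the names are illustrative only. */
enum FeedFailure { TEMPORARILY_UNAVAILABLE, GONE, MALFORMED, UNKNOWN }

final class FeedFailureClassifier {

    private FeedFailureClassifier() {
    }

    /** Walks the cause chain from the outermost exception and returns the first recognised category. */
    static FeedFailure classify(Throwable error) {
        Throwable current = error;
        // The depth limit guards against a cyclic cause chain.
        for (int depth = 0; current != null && depth < 32; depth++) {
            FeedFailure category = classifyOne(current);
            if (category != FeedFailure.UNKNOWN) {
                return category;
            }
            current = current.getCause();
        }
        return FeedFailure.UNKNOWN;
    }

    private static FeedFailure classifyOne(Throwable t) {
        // XML that cannot be parsed: retrying will not help until the publisher fixes the feed.
        if (t instanceof SAXParseException
                || t.getClass().getName().endsWith(".ParsingFeedException")) {
            return FeedFailure.MALFORMED;
        }
        // Assumption: a FileNotFoundException on an HTTP URL means the feed was moved or removed.
        if (t instanceof FileNotFoundException) {
            return FeedFailure.GONE;
        }
        // Network-level problems are assumed to be transient.
        if (t instanceof ConnectException
                || t instanceof SocketTimeoutException
                || t instanceof UnknownHostException) {
            return FeedFailure.TEMPORARILY_UNAVAILABLE;
        }
        // Any other I/O problem (for example an HTTP 500) is also assumed to be transient.
        if (t instanceof IOException) {
            return FeedFailure.TEMPORARILY_UNAVAILABLE;
        }
        return FeedFailure.UNKNOWN;
    }
}
```

A handler on the feedErrors channel could call FeedFailureClassifier.classify(message.getPayload()) and then, depending on the category, keep retrying, stop the endpoint, or remove the flow.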
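For the processing errors, I would suggest looking into an ExpressionEvaluatingRequestHandlerAdvice on the service your flow feeds into: https://docs.spring.io/spring-integration/docs/current/reference/html/messaging-endpoints.html#message-handler-advice-chain
For a malformed source, I would suggest looking into removing the flow from the context, so that you never try to poll a bad source again. You can definitely do that from the error flow on the feedErrors channel. The same probably applies to the other errors you are facing in this case.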
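Both suggestions could look roughly like the sketch below. It is an untested outline that assumes Spring Integration 5.x, as the question's IntegrationFlows usage implies. The class name, the "processingErrors" channel, the cleanupExecutor field, and the choice to pass the flow id through a per-flow poller error handler (instead of the shared feedErrors channel) are all assumptions made for illustration. Treating every SAXParseException as fatal is likewise just one possible policy.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

import org.aopalliance.aop.Advice;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.NestedExceptionUtils;
import org.springframework.integration.dsl.PollerSpec;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.dsl.context.IntegrationFlowContext;
import org.springframework.integration.handler.advice.ExpressionEvaluatingRequestHandlerAdvice;
import org.xml.sax.SAXParseException;

@Configuration
public class FeedFailureHandlingSketch {

    private final IntegrationFlowContext flowContext;

    // Hypothetical executor, so a flow is not torn down from inside its own poller task.
    private final Executor cleanupExecutor = Executors.newSingleThreadExecutor();

    public FeedFailureHandlingSketch(IntegrationFlowContext flowContext) {
        this.flowContext = flowContext;
    }

    /** Intended for the service step: .handle("enricher", "enhance", e -> e.advice(advice)). */
    @Bean
    public Advice processingFailureAdvice() {
        ExpressionEvaluatingRequestHandlerAdvice advice = new ExpressionEvaluatingRequestHandlerAdvice();
        advice.setFailureChannelName("processingErrors"); // hypothetical channel name
        advice.setTrapException(true); // the failure goes to the channel instead of being rethrown
        return advice;
    }

    /** Builds a poller whose error handler knows which dynamic flow it belongs to. */
    public PollerSpec pollerFor(String flowId, long fixedDelayMillis, int maxMessages) {
        return Pollers.fixedDelay(fixedDelayMillis)
                .maxMessagesPerPoll(maxMessages)
                .errorHandler(error -> onPollFailure(flowId, error));
    }

    void onPollFailure(String flowId, Throwable error) {
        Throwable root = NestedExceptionUtils.getMostSpecificCause(error);
        if (root instanceof SAXParseException) {
            // Unparseable feed: remove the whole dynamic flow so it is never polled again.
            cleanupExecutor.execute(() -> {
                if (flowContext.getRegistrationById(flowId) != null) {
                    flowContext.remove(flowId);
                }
            });
        }
        // Any other failure leaves the flow registered, so the next poll simply retries.
    }
}
```

With this in place, the registration loop from the question would pass pollerFor(org + "-" + source + "-flow", poll, maxMessages) to .poller(...) in place of the inline Pollers.fixedDelay(...) chain. If the shared feedErrors channel is kept, the handler needs some other way to map an error back to a flow id, for example a lookup keyed on the feed URL that appears in the exception message.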
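We could probably look at your remark "10 are of those throw an exception where the feeds have been removed for whatever reason" in more detail. The FeedEntryMessageSource works off the updatedDate of the entries. So if an entry cannot be converted and emitted properly, it really will be pulled again, because the lastTime state has not been updated accordingly. But let's investigate that with more details in a separate SO thread!
UPDATE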
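Some thoughts about the stack traces.
An exception like
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 15681; Character refer
is bad and has to be treated as fatal. You definitely cannot parse that RSS source any more, at least until it has been fixed on the publisher's side. This kind of exception can be handled with a stop() for the Feed.inboundAdapter() in question: catch it in the feedErrors channel flow and call stop() on the endpoint that caused it. Or, better, remove the whole dynamic flow for that RSS source.
The exception
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 10; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.
is also bad and could be treated as fatal in the same way, followed by a stop() for the endpoint. Alternatively, you could investigate how the Xerces library can be made to allow that DTD declaration. The syndFeedInput(SyndFeedInput) option may also provide some hooks for the XML parsing. For example, I see it has this option: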