Regex SerDe to read log files in Hive

ctrmrzij  posted on 2021-05-29  in Hadoop

I am trying to create a regex SerDe in Hive to read some log files, but I am having trouble getting it to work.
The log files look like this:

14.196.202.16:9123  11329   2016-01-27 17:50:26.965 -5                  Thread-14960    CCS 6104    1   Audit.rds.CCS       reportDataService       Failure <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message>   <trace>ClientAbortException:  java.net.SocketException: Broken pipe     at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339)  at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392)     at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381)  at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)   at java.io.BufferedOutputStream.write(Unknown Source)   at java.io.BufferedOutputStream.write(Unknown Source)   at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)  at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)   at sun.nio.cs.StreamEncoder.write(Unknown Source)   at java.io.OutputStreamWriter.write(Unknown Source)     at java.io.BufferedWriter.flushBuffer(Unknown Source)   at java.io.BufferedWriter.write(Unknown Source)     at java.io.Writer.write(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source)  at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source)   at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source)  at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source)   at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source)     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)    at 
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)  at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)  at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe   at java.net.SocketOutputStream.socketWrite0(Native Method)  at java.net.SocketOutputStream.socketWrite(Unknown Source)  at java.net.SocketOutputStream.write(Unknown Source)    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761)  at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)  at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785)    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124)   at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598)     at org.apache.coyote.Response.doWrite(Response.java:533)    at 
org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)     ... 35 more </trace>

This is how far I have got:

([^ ]*)\t(-|[0-9]*)\t

which returns:

Match 1
1.  14.196.202.16:9123
2.  11329

That gives me the first two fields correctly, but when I add the date:

([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t

I get this back:

Match 1
1.  17:50:26.965    -5                    Thread-14960    CCS    6104    1    Audit.rds.CCS        reportDataService
2.   
3.  Failure
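
A likely explanation for the result above: `[^ ]*` means "anything except a space", so it cannot span the space inside `2016-01-27 17:50:26.965`, but it can swallow tabs. With no anchor, the engine gives up at the start of the line and finds a match further along. Below is a minimal Python sketch of that behaviour. The `line` value is a hypothetical, shortened, tab-delimited stand-in for the real log row, and `[^\t]*` is one possible alternative, not necessarily what the final expression should use.

```python
import re

# Hypothetical tab-delimited prefix modelled on the sample log line (assumption).
line = "14.196.202.16:9123\t11329\t2016-01-27 17:50:26.965\t-5\tThread-14960\t"

# Original attempt: [^ ]* stops at the space inside the timestamp.
broken = re.compile(r"([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t")
# Alternative: [^\t]* stops only at a tab, so the timestamp stays in one group.
fixed = re.compile(r"([^\t]*)\t([^\t]*)\t([^\t]*)\t")

# match() anchors at the start of the string, like reading the row from field 1.
broken_match = broken.match(line)
fixed_groups = fixed.match(line).groups()

print(broken_match)   # no match at the start of the line
print(fixed_groups)   # three fields, timestamp kept whole
```

Anchoring the pattern at the start of the line (for example with `^`) in Rubular should make this kind of failure visible instead of producing a misleading match later in the row.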

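I am very new to regex and am trying to work this out, but I am having trouble. I am also testing things on this site:
http://rubular.com/
Basically, I want the result to look like this: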

1. 14.196.202.16:9123   
2. 11329    
3. 2016-01-27 17:50:26.965 -5
4. 
5. 
6. 
7. 
8. Thread-14960 
9. CCS  
10. 6104    
11. 1   
12. Audit.rds.CCS   
13. 
14. reportDataService   
15. 
16. Failure 
17. <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message>   
19. <trace>ClientAbortException:  java.net.SocketException: Broken pipe     at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339)  at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392)     at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381)  at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)   at java.io.BufferedOutputStream.write(Unknown Source)   at java.io.BufferedOutputStream.write(Unknown Source)   at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)  at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)   at sun.nio.cs.StreamEncoder.write(Unknown Source)   at java.io.OutputStreamWriter.write(Unknown Source)     at java.io.BufferedWriter.flushBuffer(Unknown Source)   at java.io.BufferedWriter.write(Unknown Source)     at java.io.Writer.write(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source)  at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source)   at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source)     at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source)  at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source)     at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source)   at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source)     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)    at 
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)  at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)  at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source)  at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe   at java.net.SocketOutputStream.socketWrite0(Native Method)  at java.net.SocketOutputStream.socketWrite(Unknown Source)  at java.net.SocketOutputStream.write(Unknown Source)    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761)  at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)     at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)  at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785)    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124)   at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598)     at org.apache.coyote.Response.doWrite(Response.java:533)    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)     ... 35 more </trace>

Edit:
I think I am heading in the right direction.
I now have this:

([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t(\w+|\S+|\s+)\t(\w+|.)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t

But I still cannot capture the <message> and <trace> groups.
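
One way to pick up the two XML-like fields is to match on their literal tags rather than on character classes. The Python sketch below assumes the fields are separated by a tab, that the message text itself contains no tab, and that `<trace>...</trace>` is the last thing on the row. The `tail` string is a hypothetical, heavily shortened stand-in for the end of a real log line.

```python
import re

# Hypothetical shortened tail of a log row (assumption): status, messages, trace.
tail = (
    "Failure\t"
    "<messages><message><messageString>RDS-ERR-1047 Unable to process "
    "the XML output stream.</messageString></message>\t"
    "<trace>ClientAbortException: java.net.SocketException: Broken pipe\t"
    "at org.apache.catalina.connector.OutputBuffer.realWriteBytes </trace>"
)

# (<messages>.*?) is lazy, so it ends at the tab that precedes <trace>.
# (<trace>.*</trace>) is greedy and may contain tabs, since stack frames
# inside the trace could themselves be tab-separated.
pattern = re.compile(r"([^\t]*)\t(<messages>.*?)\t(<trace>.*</trace>)$", re.DOTALL)

status, messages, trace = pattern.search(tail).groups()
print(status)
print(messages)
print(trace)
```

The same three groups could be appended to the end of a longer per-field expression. Whether the flags and lazy quantifier carry over unchanged to the regex engine Hive uses should be checked separately.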

2wnc66cl1#

I got the regex working. This is what I ended up with:

([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z_\S]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)
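
A note on the character classes in this expression: inside square brackets `+` is a literal plus sign, so `[\d+]` matches exactly one character (a digit or `+`), and `[a-zA-Z0-9_\S]` matches the same characters as `\S` alone. The Python sketch below compares the first four groups of the expression with a looser, shorter variant. The `prefix` string is a hypothetical tab-delimited sample, and `simple_head` is an illustrative alternative, not the expression the answer uses.

```python
import re

# Hypothetical tab-delimited start of a log row (assumption).
prefix = "14.196.202.16:9123\t11329\t2016-01-27 17:50:26.965\t-5\tThread-14960"

# First four groups copied from the expression above.
answer_head = re.compile(
    r"([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t"
)
# Looser illustrative variant: \S+ for non-blank runs, -\d+ for one or more digits.
simple_head = re.compile(r"(\S+)\t(\d+)\t(\S+ \S+)\t(-\d+)\t")

answer_groups = answer_head.match(prefix).groups()
simple_groups = simple_head.match(prefix).groups()

# [\d+] is a one-character class, so it accepts a bare "+".
plus_is_literal = re.fullmatch(r"[\d+]", "+") is not None

# Because (-[\d+]) allows only one character after "-", a value such as -10 fails.
two_digit_row = prefix.replace("-5", "-10")
answer_two_digit = answer_head.match(two_digit_row)
simple_two_digit = simple_head.match(two_digit_row)

print(answer_groups == simple_groups, plus_is_literal)
print(answer_two_digit, simple_two_digit is not None)
```

On the sample row both patterns produce the same four fields. They differ only on rows where the fourth field has more than one digit.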
