Regex to capture all characters in a list of complex strings [closed]

Posted by wribegjk on 2023-03-13 in Other

**Closed.** This question is not reproducible or was caused by typos. It is not currently accepting answers.

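This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way that is unlikely to help future readers.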
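I have a long list of complex strings from which I need to extract several values. I have been able to create regexes for most of them, but one type is giving me particular trouble. I have looked at several similar questions, but none of them has worked so far.
Here is the list: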

malware = ['Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File quarantined|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=313 rt=2022-12-21 08:44:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=66e9f4d4-df39-488d-8cf8-bdcf5d890598.tmp filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File passed|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=314 rt=2022-12-21 08:45:17 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smith act=File quarantined cn1Label=Pattern cn1=1814300 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File quarantined cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=rev_shell.exe filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ msg=NONAMEFL dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File cleaned|TROJ_GEN.R002C0DKG22|3|deviceExternalId=315 rt=2022-12-21 10:20:31 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smit act=File cleaned cn1Label=Pattern cn1=1814500 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Real-time Scan cs2Label=Engine cs2=22.580.1004 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=1 fname=aowect.dll filePath=C:\\\\Users\\\\emil\\\\AppData\\\\Local\\\\Temp\\\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 fileHash=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:Unable to upload file|TSC_GENCLEAN|3|deviceExternalId=316 rt=2022-12-21 13:37:42 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=Non confermato 184296.crdownload filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ', 
'Mar 07 2023 17:15:00 abcd.manage.trendmicro.com CEF:0|Trend Micro|Apex Central|2019|AV:File quarantinedd|Troj.Win32.TRX.XXPE50FFF063|3|deviceExternalId=317 rt=2022-12-21 13:37:49 cnt=1 dhost=NB-SUPPORT TMCMLogDetectedHost=NB-SUPPORT duser=ACME\\\\john.smit act=File cleaned cn1Label=Pattern cn1=1632 cn2Label=Second_Action cn2=1 cs1Label=VLF_FunctionCode cs1=Damage Cleanup Services cs2Label=Engine cs2=7.5.1184 cs3Label=Product_Version cs3=14.0 cs4Label=CLF_ReasonCode cs4=virus log cs5Label=First_Action_Result cs5=File cleaned cs6Label=Second_Action_Result cs6=N/A cat=1703 dvchost=cpnlug.manage.trendmicro.com cn3Label=Overall_Risk_Rating cn3=0 fname=pumpkin-2.7.3.exe filePath=C:\\\\Users\\\\emil\\\\Downloads\\\\ dst=10.18.13.90 TMCMLogDetectedIP=10.18.13.90 deviceFacility=Apex One ApexCentralHost=Apex Central as a Service devicePayloadId=xxx-xxxxx-xxx-xxx TMCMdevicePlatform=Windows 10 10.0 (Build 19044) deviceNtDomain=N/A dntdom=Client\\\\Notebook\\\\ ']

I need to create two captures. The first is the antivirus action, which is the value after AV:. For example:

AV:Unable to upload file
AV:Unable to delete file
AV:File quarantined
AV:File passed

I need:

Unable to upload file
Unable to delete file
File quarantined
File passed

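as well as any other action that may occur.
The other part is the name of the malware, which comes right after the action and is surrounded by pipes (|):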

Trojan.Win64.SHELMA.SMB1
TROJ_GEN.R002C0DKG22
Troj.Win32.TRX.XXPE50FFF063
TSC_GENCLEAN

as well as any other malware name that may appear.
Here is one of my failed attempts:

pattern = '(AV:)(?s)(.*?)\|'
a = [re.search(pattern,m) for m in malware]
print(a)

Thank you for taking the time to help!

enxuqcxy (answer #1)

It looks like this is the pattern you want, though I'm not sure it will work for all of your data:

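import re

# (?s) moved to the start: Python 3.11+ rejects global inline flags elsewhere in a pattern.
# Captures everything between "AV:" and the "|3|" field, then splits it on the pipe.
pattern = r'(?s)AV:(.*)\|3\|'
var = [re.findall(pattern, m)[0] for m in malware]
var = [v.split('|') for v in var]
print(var)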
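
The pattern above depends on the field after the malware name always being 3. As a minimal sketch of an alternative that does not make that assumption, the two values can be captured as the two pipe-delimited fields that follow AV:. The `sample` list, the `PATTERN` name, and the `extract` helper are illustrative and not part of the original answer; the sample lines are shortened stand-ins for the question's data.

```python
import re

# Shortened, hypothetical sample lines in the same layout as the question's data.
sample = [
    "Mar 07 2023 17:15:00 host CEF:0|Trend Micro|Apex Central|2019|AV:File quarantined|Trojan.Win64.SHELMA.SMB1|3|deviceExternalId=313 cnt=1",
    "Mar 07 2023 17:15:00 host CEF:0|Trend Micro|Apex Central|2019|AV:Unable to upload file|TSC_GENCLEAN|3|deviceExternalId=316 cnt=1",
]

# "AV:" followed by two pipe-delimited fields; [^|]* cannot run past a pipe,
# so no DOTALL flag or lazy quantifier is needed.
PATTERN = re.compile(r"AV:(?P<action>[^|]*)\|(?P<malware>[^|]*)\|")


def extract(line):
    """Return (action, malware) for one log line, or None if it has no AV: field."""
    m = PATTERN.search(line)
    return (m.group("action"), m.group("malware")) if m else None


results = [extract(line) for line in sample]
print(results)
```

Note that `re.search` returns a match object rather than a string, so the captured text has to be read with `group()`; printing the match objects directly shows their repr instead of the extracted values.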

af7jpaap (answer #2)

I suggest breaking each line down into a usable structure (e.g. a dictionary) and then using that for further processing.
For example:

import re

parsed = [ line.split("|") for line in malware ]
for i, (ts,*parts,attributes) in enumerate(parsed):
    *ts,url,cef = ts.split(" ")
    parsed[i] = { "ts":" ".join(ts), "url":url, "CEF":cef.split(":")[-1] }
    parsed[i].update( zip(["brand","vendor","year","message","virus"],parts) )
    iAttr = iter(re.split(r'(\b\w+\=)',attributes)[1:])
    parsed[i].update( (k[:-1],v.strip()) for k,v in zip(iAttr,iAttr) )
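Note: it looks like these lines are composed of components from different sources that use different separators (the pipe being used at the last level). You can break each part down further as needed.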
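
The key=value step in the code above is the least obvious part, so here is a small isolated sketch of it. The `attributes` string is a shortened, hypothetical stand-in for the last pipe-delimited field of a line; the variable names `tokens`, `it`, and `pairs` are illustrative.

```python
import re

# Hypothetical, shortened attribute string; note that some values contain spaces.
attributes = "deviceExternalId=313 rt=2022-12-21 08:44:17 act=File quarantined cs1=Real-time Scan "

# A capturing group in re.split keeps the separators in the result, so the list
# alternates between keys ("rt=") and values ("2022-12-21 08:44:17 ").
tokens = re.split(r"(\b\w+=)", attributes)

# tokens[0] is the (empty) text before the first key, so skip it.
it = iter(tokens[1:])

# zip(it, it) pulls two items at a time from the same iterator: (key, value).
pairs = {key[:-1]: value.strip() for key, value in zip(it, it)}
print(pairs)
```

One limitation worth keeping in mind: a value that itself contains a word immediately followed by `=` would be split as if it were a new key, so this approach assumes values never look like keys.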

Output:

import json
for i,d in enumerate(parsed):
    print("at index",i)
    print(json.dumps(d,indent=4))

at index 0

{
    "ts": "Mar 07 2023 17:15:00",
    "url": "abcd.manage.trendmicro.com",
    "CEF": "0",
    "brand": "Trend Micro",
    "vendor": "Apex Central",
    "year": "2019",
    "message": "AV:File quarantined",
    "virus": "Trojan.Win64.SHELMA.SMB1",
    "deviceExternalId": "313",
    "rt": "2022-12-21 08:44:17",
    "cnt": "1",
    "dhost": "NB-SUPPORT",
    "TMCMLogDetectedHost": "NB-SUPPORT",
    "duser": "ACME\\\\john.smith",
    "act": "File quarantined",
    "cn1Label": "Pattern",
    "cn1": "1814300",
    "cn2Label": "Second_Action",
    "cn2": "1",
    "cs1Label": "VLF_FunctionCode",
    "cs1": "Real-time Scan",
    "cs2Label": "Engine",
    "cs2": "22.580.1004",
    "cs3Label": "Product_Version",
    "cs3": "14.0",
    "cs4Label": "CLF_ReasonCode",
    "cs4": "virus log",
    "cs5Label": "First_Action_Result",
    "cs5": "File quarantined",
    "cs6Label": "Second_Action_Result",
    "cs6": "N/A",
    "cat": "1703",
    "dvchost": "cpnlug.manage.trendmicro.com",
    "cn3Label": "Overall_Risk_Rating",
    "cn3": "0",
    "fname": "66e9f4d4-df39-488d-8cf8-bdcf5d890598.tmp",
    "filePath": "C:\\\\Users\\\\emil\\\\Downloads\\\\",
    "msg": "NONAMEFL",
    "dst": "10.18.13.90",
    "TMCMLogDetectedIP": "10.18.13.90",
    "fileHash": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "deviceFacility": "Apex One",
    "ApexCentralHost": "Apex Central as a Service",
    "devicePayloadId": "xxx-xxxxx-xxx-xxx",
    "TMCMdevicePlatform": "Windows 10 10.0 (Build 19044)",
    "deviceNtDomain": "N/A",
    "dntdom": "Client\\\\Notebook\\\\"
}

at index 1

{
    "ts": "Mar 07 2023 17:15:00",
    "url": "abcd.manage.trendmicro.com",
    "CEF": "0",
    "brand": "Trend Micro",
    "vendor": "Apex Central",
    "year": "2019",
    "message": "AV:File passed",
    "virus": "Trojan.Win64.SHELMA.SMB1",
    "deviceExternalId": "314",
    "rt": "2022-12-21 08:45:17",
    "cnt": "1",
    "dhost": "NB-SUPPORT",
    "TMCMLogDetectedHost": "NB-SUPPORT",
    "duser": "ACME\\\\john.smith",
    "act": "File quarantined",
    "cn1Label": "Pattern",
    "cn1": "1814300",
    "cn2Label": "Second_Action",
    "cn2": "1",
    "cs1Label": "VLF_FunctionCode",
    "cs1": "Real-time Scan",
    "cs2Label": "Engine",
    "cs2": "22.580.1004",
    "cs3Label": "Product_Version",
    "cs3": "14.0",
    "cs4Label": "CLF_ReasonCode",
    "cs4": "virus log",
    "cs5Label": "First_Action_Result",
    "cs5": "File quarantined",
    "cs6Label": "Second_Action_Result",
    "cs6": "N/A",
    "cat": "1703",
    "dvchost": "cpnlug.manage.trendmicro.com",
    "cn3Label": "Overall_Risk_Rating",
    "cn3": "0",
    "fname": "rev_shell.exe",
    "filePath": "C:\\\\Users\\\\emil\\\\Downloads\\\\",
    "msg": "NONAMEFL",
    "dst": "10.18.13.90",
    "TMCMLogDetectedIP": "10.18.13.90",
    "fileHash": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "deviceFacility": "Apex One",
    "ApexCentralHost": "Apex Central as a Service",
    "devicePayloadId": "xxx-xxxxx-xxx-xxx",
    "TMCMdevicePlatform": "Windows 10 10.0 (Build 19044)",
    "deviceNtDomain": "N/A",
    "dntdom": "Client\\\\Notebook\\\\"
}

...
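
With the lines in dictionary form, the two values the question asks for can be read straight from the "message" and "virus" keys. The sketch below assumes dictionaries shaped like the output above; the two-entry `parsed` list is a trimmed, hypothetical stand-in so that the example runs on its own.

```python
# Trimmed stand-in for the parsed dictionaries shown above (only the two relevant keys).
parsed = [
    {"message": "AV:File quarantined", "virus": "Trojan.Win64.SHELMA.SMB1"},
    {"message": "AV:File passed", "virus": "Trojan.Win64.SHELMA.SMB1"},
]

# partition(":") splits at the first colon only; index 2 is the text after it.
actions = [d["message"].partition(":")[2] for d in parsed]
names = [d["virus"] for d in parsed]

for action, name in zip(actions, names):
    print(action, "|", name)
```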
