Converting nginx.log to CSV with Python

Posted by rur96b6h on 2022-09-20 in Python

Here is part of my nginx.log file:

192.168.226.64 - - [26/Apr/2021:21:20:37 +0000] "GET /api/datasources/proxy/1/api/v1/query_range?query=probe_ssl_earliest_cert_expiry%7Btarget%3D~%22()%22%7D-time()&start=1619471730&end=1619472030&step=30 HTTP/2.0" 200 212 "https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36" 134 0.006 [monitoring-monitoring-prometheus-grafana-80] [] 192.168.226.102:3000 212 0.008 200 6bc328f046dcd1df823aa920397fb346
192.168.226.64 - - [26/Apr/2021:21:20:37 +0000] "GET /api/datasources/proxy/1/api/v1/query_range?query=probe_success%7Btarget%3D~%22()%22%7D&start=1619471730&end=1619472030&step=30 HTTP/2.0" 200 201 "https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36" 116 0.007 [monitoring-monitoring-prometheus-grafana-80] [] 192.168.226.102:3000 201 0.008 200 c10141117983e888db68f2e1ff223575
192.168.226.64 - - [26/Apr/2021:21:20:37 +0000] "GET /api/datasources/proxy/1/api/v1/query_range?query=probe_http_ssl%7Btarget%3D~%22()%22%7D&start=1619471730&end=1619472030&step=30 HTTP/2.0" 200 204 "https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36" 117 0.007 [monitoring-monitoring-prometheus-grafana-80] [] 192.168.226.102:3000 204 0.008 200 60724ca6531bc640649bac50bbc04a7e

I need to convert this nginx.log file to a CSV file with Python. How should I do it, or what regex should I use for the conversion?

Answer #1, by vlf7wbxs

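You can use the code below as a starting point. Basically, you need some custom line splitting to extract the elements you want. Note that the user agent is the reason for splitting on the quote character first: as far as I know, it is the only element that can contain an unpredictable number of spaces.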
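To see why, compare the two splits on one line taken from the question's sample. The `curl/7.68.0` user agent below is a hypothetical stand-in for a client whose user agent contains no spaces; it is not from the original log.

```python
# Line 2 of the question's nginx.log sample, split across string literals for readability
LINE = (
    '192.168.226.64 - - [26/Apr/2021:21:20:37 +0000] '
    '"GET /api/datasources/proxy/1/api/v1/query_range?query=probe_success'
    '%7Btarget%3D~%22()%22%7D&start=1619471730&end=1619472030&step=30 HTTP/2.0" '
    '200 201 '
    '"https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview'
    '?editview=templating&orgId=1&refresh=5s" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36" '
    '116 0.007 [monitoring-monitoring-prometheus-grafana-80] [] '
    '192.168.226.102:3000 201 0.008 200 c10141117983e888db68f2e1ff223575'
)

quoted = LINE.split('"')
user_agent = quoted[5]  # the user agent is the third quoted field

# Hypothetical second client whose user agent contains no spaces
other = LINE.replace(user_agent, "curl/7.68.0")

for line in (LINE, other):
    # The quote split gives 7 chunks for both lines; the whitespace split does not
    print(len(line.split('"')), len(line.split()))
```

With the quote split, the quoted fields keep fixed positions (request at index 1, referer at 3, user agent at 5) whatever the client sends, which is what `splitline` below relies on.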

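I added a simple helper function that prints the index of each element, and I show a few different ways of splitting. You may need to rename the variables, since I'm not 100% sure exactly what you are logging in nginx...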

def splitline(line: str) -> list:
    # this split is used several times, so do it only once here;
    # the user agent may contain spaces, so split on quote chars first
    split_quote = line.split('"')
    ip = split_quote[0].split()[0]
    date_time = split_quote[0].split('[')[1].split(']')[0]
    method = split_quote[1]  # the full request line: method, path and protocol
    http1, http2 = split_quote[2].split()
    useragent = split_quote[5]  # note: split_quote[3] (the referer) is not kept
    bytesize, resp_time1, prom, empty, ip_port, http3, resp_time2, http4, hex_string = split_quote[6].split()
    return [
        ip, date_time, method, http1, http2, useragent, bytesize, resp_time1, prom, empty, ip_port, http3, resp_time2, http4, hex_string
    ]

def print_elements(line):
    # print each quote-separated chunk, then its whitespace-separated pieces, with their indices
    split_quote = line.split('"')
    for x, squote in enumerate(split_quote):
        print(f"{x:>2}    {squote}")
        for y, sspace in enumerate(squote.split()):
            print(f"{x:>2} {y:>2} {sspace}")

with open("logfile.log") as infile:
    data = infile.read().splitlines()

print_elements(data[0])

for line in data:
    print(splitline(line))

Output

 0    192.168.226.64 - - [26/Apr/2021:21:20:37 +0000] 
 0  0 192.168.226.64
 0  1 -
 0  2 -
 0  3 [26/Apr/2021:21:20:37
 0  4 +0000]
 1    GET /api/datasources/proxy/1/api/v1/query_range?query=probe_ssl_earliest_cert_expiry%7Btarget%3D~%22()%22%7D-time()&start=1619471730&end=1619472030&step=30 HTTP/2.0
 1  0 GET
 1  1 /api/datasources/proxy/1/api/v1/query_range?query=probe_ssl_earliest_cert_expiry%7Btarget%3D~%22()%22%7D-time()&start=1619471730&end=1619472030&step=30
 1  2 HTTP/2.0
 2     200 212
 2  0 200
 2  1 212
 3    https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s
 3  0 https://grafana.itoutposts.com/d/xtkCtBkiz/blackbox-exporter-overview?editview=templating&orgId=1&refresh=5s
 4
 5    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36
 5  0 Mozilla/5.0
 5  1 (Macintosh;
 5  2 Intel
 5  3 Mac
 5  4 OS
 5  5 X
 5  6 10_15_7)
 5  7 AppleWebKit/537.36
 5  8 (KHTML,
 5  9 like
 5 10 Gecko)
 5 11 Chrome/90.0.4430.85
 5 12 Safari/537.36
 6     134 0.006 [monitoring-monitoring-prometheus-grafana-80] [] 192.168.226.102:3000 212 0.008 200 6bc328f046dcd1df823aa920397fb346
 6  0 134
 6  1 0.006
 6  2 [monitoring-monitoring-prometheus-grafana-80]
 6  3 []
 6  4 192.168.226.102:3000
 6  5 212
 6  6 0.008
 6  7 200
 6  8 6bc328f046dcd1df823aa920397fb346
['192.168.226.64', '26/Apr/2021:21:20:37 +0000', 'GET /api/datasources/proxy/1/api/v1/query_range?query=probe_ssl_earliest_cert_expiry%7Btarget%3D~%22()%22%7D-time()&start=1619471730&end=1619472030&step=30 HTTP/2.0', '200', '212', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36', '134', '0.006', '[monitoring-monitoring-prometheus-grafana-80]', '[]', '192.168.226.102:3000', '212', '0.008', '200', '6bc328f046dcd1df823aa920397fb346']
['192.168.226.64', '26/Apr/2021:21:20:37 +0000', 'GET /api/datasources/proxy/1/api/v1/query_range?query=probe_success%7Btarget%3D~%22()%22%7D&start=1619471730&end=1619472030&step=30 HTTP/2.0', '200', '201', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36', '116', '0.007', '[monitoring-monitoring-prometheus-grafana-80]', '[]', '192.168.226.102:3000', '201', '0.008', '200', 'c10141117983e888db68f2e1ff223575']
['192.168.226.64', '26/Apr/2021:21:20:37 +0000', 'GET /api/datasources/proxy/1/api/v1/query_range?query=probe_http_ssl%7Btarget%3D~%22()%22%7D&start=1619471730&end=1619472030&step=30 HTTP/2.0', '200', '204', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36', '117', '0.007', '[monitoring-monitoring-prometheus-grafana-80]', '[]', '192.168.226.102:3000', '204', '0.008', '200', '60724ca6531bc640649bac50bbc04a7e']
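The script above only prints the parsed lists, and `splitline` drops the referer (index 3 of the quote split). A minimal sketch of the missing CSV step, keeping the same split-on-quotes idea but writing through Python's standard `csv` module, could look like this. The column names are an assumption: the field order appears to match the ingress-nginx (Kubernetes NGINX Ingress Controller) default log format, so the names are borrowed from there; rename them if your `log_format` differs. The function names `split_line`, `write_csv`, `to_csv_string` and `convert` are hypothetical.

```python
import csv
import io
from typing import Iterable, List, TextIO

# ASSUMPTION: column names follow the ingress-nginx default log format,
# which the sample lines appear to match. Rename them to fit your log_format.
HEADER = [
    "remote_addr", "remote_user", "time_local", "request", "status",
    "body_bytes_sent", "http_referer", "http_user_agent", "request_length",
    "request_time", "proxy_upstream_name", "proxy_alternative_upstream_name",
    "upstream_addr", "upstream_response_length", "upstream_response_time",
    "upstream_status", "req_id",
]


def split_line(line: str) -> List[str]:
    """Split one log line into 17 fields, splitting on '"' first as in the answer."""
    chunks = line.split('"')
    if len(chunks) != 7:
        raise ValueError(f"expected 3 quoted fields: {line!r}")
    prefix = chunks[0].split()  # ip, '-', remote_user, '[date', 'offset]'
    time_local = chunks[0].split("[", 1)[1].split("]", 1)[0]
    status, body_bytes_sent = chunks[2].split()
    tail = chunks[6].split()  # the nine space-free fields after the user agent
    if len(tail) != 9:
        raise ValueError(f"expected 9 trailing fields, got {len(tail)}: {line!r}")
    # drop the square brackets around the two upstream-name fields ('[]' becomes '')
    tail[2] = tail[2].strip("[]")
    tail[3] = tail[3].strip("[]")
    return [prefix[0], prefix[2], time_local, chunks[1], status, body_bytes_sent,
            chunks[3], chunks[5], *tail]


def write_csv(lines: Iterable[str], out: TextIO) -> int:
    """Write a header plus one CSV row per non-blank log line; return the row count."""
    writer = csv.writer(out)  # quotes fields containing commas, e.g. "(KHTML, like Gecko)"
    writer.writerow(HEADER)
    rows = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines such as a trailing empty line
        writer.writerow(split_line(line))
        rows += 1
    return rows


def to_csv_string(lines: Iterable[str]) -> str:
    """Convenience wrapper that returns the CSV text instead of writing a file."""
    buf = io.StringIO()
    write_csv(lines, buf)
    return buf.getvalue()


def convert(log_path: str, csv_path: str) -> int:
    """Convert a log file to a CSV file; newline='' is what the csv docs recommend."""
    with open(log_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        return write_csv(src, dst)
```

To convert a file, call `convert("logfile.log", "nginx.csv")`. Letting `csv.writer` do the quoting matters here, because the user agent contains commas (`(KHTML, like Gecko)`) that a naive `",".join(...)` would turn into extra columns.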
