In Unix, is there a way to build a CSV file from command-line key-value output that uses mixed delimiters?

Posted by ujv3wf0j on 2022-11-04 in Unix

A custom command dumps raw log lines like these to the console:

bash$ custom-command
current-capacity: 3%, buffer: 1024, not-used/total: 10/10, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022
current-capacity: 0%, buffer: 1024, not-used/total: 25/25, IsEnabled: 0. Up since Thu Jun 23 11:54:14 2022
current-capacity: 0%, buffer: 1024, not-used/total: 15/15, IsEnabled: 1. Up since Thu Jun 23 11:54:14 2022

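I need the output in CSV format, as shown below, so that I can capture the live status based on certain conditions and then redirect it to a CSV file at regular intervals before loading it into a SQL database.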

current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022

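I tried awk but ran into trouble: most of the line is comma-separated, except that `IsEnabled: 0.` ends with a period and is then followed by the uptime. Is there a way to do this? I'm quite new to awk.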


zzzyeukh2#

It is just a matter of writing a regular expression that matches the output and transforms it.

sed -E 's/current-capacity: (.*)%, buffer: (.*), not-used\/total: (.*), IsEnabled: (.*)\. Up since (.*)/\1%,\2,\3,\4,\5/'
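
The substitution above emits only data rows. A minimal sketch of how it might be wrapped so that a header row comes first and lines that do not match (such as a prompt line) are dropped; the function name `kv_to_csv` is hypothetical, and the sketch assumes that no value contains a comma, so no CSV quoting is attempted:

```shell
# Hypothetical helper: reads the raw key-value lines on stdin, writes CSV to stdout.
kv_to_csv() {
    # Header row, written once before any data.
    printf '%s\n' 'current-capacity,buffer,not-used/total,IsEnabled,Up since'
    # -n plus the trailing "p" prints only lines on which the substitution succeeded.
    # The "/" in not-used/total and the "." before "Up since" are escaped.
    sed -nE 's/^current-capacity: (.*), buffer: (.*), not-used\/total: (.*), IsEnabled: (.*)\. Up since (.*)$/\1,\2,\3,\4,\5/p'
}
```

It could then be used as `custom-command | kv_to_csv > status.csv`, where `status.csv` is a placeholder file name.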

vsaztqbk3#

Using any awk in any shell on every Unix box:

$ cat tst.awk
BEGIN { FS="[:,] "; OFS=", " }
match($0,/\. [^ ]+ [^ ]+/) {
    $0 = substr($0,1,RSTART-1) "," substr($0,RSTART+1,RLENGTH-1) ":" substr($0,RSTART+RLENGTH)
}
NR == 1 {
    for ( i=1; i<NF; i+=2 ) {
        printf "%s%s", $i, (i<(NF-1) ? OFS : ORS)
    }
}
{
    for ( i=2; i<=NF; i+=2 ) {
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}
$ awk -f tst.awk file
current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022

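The first step above, the match() { ... } block, rewrites the `. Up since Thu` part at the end of each input line as `, Up since: Thu`, so that it uses the same `, ` and `: ` separators as the rest of the line. Parsing the now-consistent input in the rest of the code is then easy.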
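
To see that normalization step in isolation, here is a small sketch that applies only the match() rewrite and prints the resulting line. The function name `normalize` is made up for illustration; the awk code itself is taken from tst.awk:

```shell
# Hypothetical helper: rewrites ". Up since X" as ", Up since: X" on each line.
normalize() {
    awk '
    match($0, /\. [^ ]+ [^ ]+/) {
        # RSTART points at the "."; RLENGTH covers ". Up since".
        $0 = substr($0, 1, RSTART-1) "," substr($0, RSTART+1, RLENGTH-1) ":" substr($0, RSTART+RLENGTH)
    }
    { print }'
}
```

After this step every key is followed by `: ` and every pair by `, `, which is what lets `FS="[:,] "` split the line into alternating keys and values. The colons inside the time `11:54:14` are not followed by a space, so they do not split.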


vfh0ocws4#

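Welcome to StackOverflow. Thank you for providing sample data and the desired output. I suggest you learn the markdown formatting syntax, since your code was entered as quoted HTML. It is better to use code tags: they produce fixed-width text that is easier to read.
For your problem, since every input line is formatted the same way, you can use the match() function in gawk with a regular expression to capture all the fields.
Something like this would do what is needed: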

BEGIN{
   # set output separator to comma space
   OFS=", "

   # define the regular expression to capture needed
   # See https://regex101.com/r/K4wYoB/1
   #
   #   ([^,]*)  captures all until next comma, not including comma
   #   (.)      captures single character
   #   (.*)     at the end, captures remaining
   #
   #   did not use full words, since it was not needed.
   #   
   myregexp="y: ([^,]*).*r: ([^,]*).*al: ([^,]*).*led: (.).*ce (.*)"

   # print header for output
   print "current-capacity, buffer, not-used/total, IsEnabled, Up since"
}

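# loop lines. match() with an array as third argument requires gawk.
# Lines that do not match the regexp (e.g. a prompt line) are skipped,
# so the first data line is not lost when the input has no header line.

match($0, myregexp, a) {

   # print the captured fields from the "a" array
   print a[1], a[2], a[3], a[4], a[5]
}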

lfapxunr5#

The input format is one of the input formats Miller can read.
If you simply run

mlr --ocsv --ips : clean-whitespace input.txt

you will get

current-capacity,buffer,not-used/total,IsEnabled
3%,1024,10/10,0. Up since Thu Jun 23 11:54:14 2022
0%,1024,25/25,0. Up since Thu Jun 23 11:54:14 2022
0%,1024,15/15,1. Up since Thu Jun 23 11:54:14 2022
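
In that output the uptime is still fused into the IsEnabled column. One possible follow-up, sketched here with plain sed rather than any Miller feature, is to post-process the CSV: append an `Up since` header and turn `. Up since ` into a comma on the data rows. The function name `split_uptime` is hypothetical, and the sketch assumes the header is the first line and that no other field contains the text `. Up since `:

```shell
# Hypothetical helper: reads Miller's CSV on stdin, splits the uptime into its own column.
split_uptime() {
    # Line 1 is the header: append the new column name.
    # Lines 2..end are data: replace the ". Up since " separator with a comma.
    sed -e '1s/$/,Up since/' -e '2,$s/\. Up since /,/'
}
```

It would sit at the end of the pipeline, for example `mlr --ocsv --ips : clean-whitespace input.txt | split_uptime`.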

sshcrbum6#

{m,g}awk '
NR == (NF=NF)^_ { printf("%s,%s,%s,%s%.00s, Up since\n", 
                              $(__=_^=_<_),  $(_+=++__), 
                         $(_+=__), $(_+__),  FS = FS"|(, )+") 
} {
    for(_^=!__;_<NF;_+=__) {
        $_=___ 
    }   $(+___)=$(_-=_)
} 
sub("^ *(, )*",___,$!(NF = NF))' FS='[.] Up since |[,:][ \t]+' OFS=', '

|

current-capacity, buffer, not-used/total, IsEnabled, Up since
3%, 1024, 10/10, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 25/25, 0, Thu Jun 23 11:54:14 2022
0%, 1024, 15/15, 1, Thu Jun 23 11:54:14 2022
