How to convert data from vertical to horizontal in Linux, using one field as the unique key

Posted by pinkon5k on 2023-04-29 in Linux

I have a file made of data blocks. Each block starts with <SUBBEGIN and ends with <SUBEND. The content looks like this:

<SUBBEGIN
SUBSCRIBERIDENTIFIER=803838478;
PAIDTYPE=0;
SUBSCRIPTION=TOOMUCH&73337E0380B4B30F&1&AAA&BBB&CCC&1&1&FFFFFFFFFFFFFFFF&255&1&255&256&FFFFFFFFFFFFFFFF&0&0&128&1&255&255&FFFFFFFFFFFFFF&FFFFFFFFFFFFFF&0&0&0&1&0&0&1;
SUBSCRIPTION=TASKS&E7CC601262AB3535&1&DDD&EEE&FFF&2&1&FFFFFFFFFFFFFFFF&255&0&255&256&FFFFFFFFFFFFFFFF&0&0&128&1&255&255&FFFFFFFFFFFFFF&FFFFFFFFFFFFFF&0&21&0&1&0&0&1;
<SUBEND
<SUBBEGIN
SUBSCRIBERIDENTIFIER=705959905;
PAIDTYPE=254;
SUBSCRIPTION=REALLY&73337E0380B4B30F&1&GGG&HHH&LLL&1&1&FFFFFFFFFFFFFFFF&255&1&255&256&FFFFFFFFFFFFFFFF&0&0&128&1&255&255&FFFFFFFFFFFFFF&FFFFFFFFFFFFFF&0&0&0&1&0&0&1;
SUBSCRIPTION=TIRED&E7CC601262AB3535&1&MMM&NNN&PPP&2&1&FFFFFFFFFFFFFFFF&255&0&255&256&FFFFFFFFFFFFFFFF&0&0&128&1&255&255&FFFFFFFFFFFFFF&FFFFFFFFFFFFFF&0&21&0&1&0&0&1;
<SUBEND

I want to produce a horizontal version that uses only some of the fields. The planned header looks like this:

SUBSCRIBERIDENTIFIER,,,PAIDTYPE,,1,255,SERVICENAME,SUBSCRIBEDATETIME,VALIDFROMDATETIME,EXPIREDDATETIME,,,,,

The values come from the data as follows:

SUBSCRIBERIDENTIFIER sample is 803838478 (we can see it in SUBSCRIBERIDENTIFIER)
PAIDTYPE sample is 0 (we can see it in PAIDTYPE)
SERVICENAME sample is TOOMUCH (we can see it in SUBSCRIPTION)
SUBSCRIBEDATETIME sample is AAA (we can see it in SUBSCRIPTION)
VALIDFROMDATETIME sample is BBB (we can see it in SUBSCRIPTION)
EXPIREDDATETIME sample is CCC (we can see it in SUBSCRIPTION)
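
To make the mapping above concrete, here is a small sketch that numbers the first six `&`-separated pieces of a SUBSCRIPTION value. The input line is a shortened copy of the sample, and `show_fields` is a hypothetical helper name used only for this illustration. With this split, SERVICENAME, SUBSCRIBEDATETIME, VALIDFROMDATETIME and EXPIREDDATETIME appear to sit at positions 1, 4, 5 and 6.

```shell
# show_fields is a hypothetical helper: it numbers the "&"-separated pieces
# of a SUBSCRIPTION line read from standard input.
show_fields() {
    awk -F'&' '{
        sub(/^SUBSCRIPTION=/, "", $1)   # drop the key so $1 is the service name
        for (i = 1; i <= 6; i++) print i ": " $i
    }'
}

# Shortened version of the first SUBSCRIPTION line from the sample.
echo 'SUBSCRIPTION=TOOMUCH&73337E0380B4B30F&1&AAA&BBB&CCC&1&1;' | show_fields
# 1: TOOMUCH
# 2: 73337E0380B4B30F
# 3: 1
# 4: AAA
# 5: BBB
# 6: CCC
```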

So the expected result is:

803838478,,,0,,1,255,TOOMUCH,AAA,BBB,CCC,,,,,
803838478,,,0,,1,255,TASKS,DDD,EEE,FFF,,,,,
705959905,,,254,,1,255,REALLY,GGG,HHH,LLL,,,,,
705959905,,,254,,1,255,TIRED,MMM,NNN,PPP,,,,,

I tried this script:

awk -F"&" '/^<SUBBEGIN$/{a=1} a && /^[[:blank:]]+(SUBSCRIBERIDENTIFIER|PAIDTYPE|SUBSCRIPTION)/{l=l OFS $1} a && /^<SUBEND$/ {print l; a=l=""}' sample.txt

But the result is not what I expected:

SUBSCRIBERIDENTIFIER=803838478;         PAIDTYPE=0;         SUBSCRIPTION=TOOMUCH         SUBSCRIPTION=TASKS
SUBSCRIBERIDENTIFIER=705959905;         PAIDTYPE=254;         SUBSCRIPTION=REALLY         SUBSCRIPTION=TIRED

kse8i1jr #1

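awk -F'[=;&]' -v OFS=',' '
    # New block: clear values left over from the previous block.
    /<SUBBEGIN/ { split("", a); n = 0 }
    /<SUBBEGIN/,/<SUBEND/ {
        if ($1 == "SUBSCRIPTION") {
            # One SUBSCRIPTION line becomes one output row; store it under index n.
            n++
            a["SERVICENAME" n]       = $2
            a["SUBSCRIBEDATETIME" n] = $5
            a["VALIDFROMDATETIME" n] = $6
            a["EXPIREDDATETIME" n]   = $7
        } else {
            # KEY=VALUE; lines such as SUBSCRIBERIDENTIFIER and PAIDTYPE.
            a[$1] = $2
        }
    }
    # End of block: print one CSV row per stored subscription.
    /<SUBEND/ {
        for (i = 1; i <= n; i++) {
            print a["SUBSCRIBERIDENTIFIER"], "", "",
                  a["PAIDTYPE"], "", 1, 255,
                  a["SERVICENAME" i],
                  a["SUBSCRIBEDATETIME" i],
                  a["VALIDFROMDATETIME" i],
                  a["EXPIREDDATETIME" i],
                  "", "", "", "", ""
        }
    }
' file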

803838478,,,0,,1,255,TOOMUCH,AAA,BBB,CCC,,,,,
803838478,,,0,,1,255,TASKS,DDD,EEE,FFF,,,,,
705959905,,,254,,1,255,REALLY,GGG,HHH,LLL,,,,,
705959905,,,254,,1,255,TIRED,MMM,NNN,PPP,,,,,
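
The attempt in the question matches lines that begin with blanks, which suggests the real file may be indented. In that case a comparison such as `$1 == "SUBSCRIPTION"` would not match. The following is a minimal sketch under that assumption, not a definitive implementation: it strips leading spaces and tabs first, and prints each row as soon as a SUBSCRIPTION line is read. It assumes SUBSCRIBERIDENTIFIER and PAIDTYPE come before the SUBSCRIPTION lines inside each block, as they do in the sample. `subs_to_csv` is a hypothetical wrapper name.

```shell
# subs_to_csv is a hypothetical wrapper; it reads the block file on stdin.
subs_to_csv() {
    awk -F'[=;&]' -v OFS=',' '
        { sub(/^[ \t]+/, "") }                # strip indentation; awk re-splits the line
        $1 == "<SUBBEGIN" { id = paid = "" }  # new block: forget the previous subscriber
        $1 == "SUBSCRIBERIDENTIFIER" { id = $2 }
        $1 == "PAIDTYPE"             { paid = $2 }
        $1 == "SUBSCRIPTION" {
            # after the split: $2 = service name, $5..$7 = the three date fields
            print id, "", "", paid, "", 1, 255, $2, $5, $6, $7, "", "", "", "", ""
        }
    '
}
```

Usage would be `subs_to_csv < sample.txt`. Because nothing is buffered until <SUBEND, a block that is missing its closing marker still produces its rows.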
