Shell: treating text as blocks

niwlg2el  posted on 2023-06-24  in  Shell

I have the following block of text:

<item>
    <link>test1</link>
    <title>ABC1</title>
    <Date>Tue, 07 Jun 2023</Date>
</item>
<item>
    <link>test3</link>
    <title>ABC3</title>
    <Date>Fri, 27 Jun 2022</Date>
</item>
<item>
    <link>test2</link>
    <title>ABC2</title>
    <Date>Mon, 05 Jun 2021</Date>
</item>

I am trying to get the following format:

Tue, 07 Jun 2023 <item> 
Tue, 07 Jun 2023     <link>test1</link>
Tue, 07 Jun 2023     <title>ABC1</title>
Tue, 07 Jun 2023     <Date>Tue, 07 Jun 2023</Date>
Tue, 07 Jun 2023 </item>
Fri, 27 Jun 2022 <item>
Fri, 27 Jun 2022     <link>test3</link>
Fri, 27 Jun 2022     <title>ABC3</title>
Fri, 27 Jun 2022     <Date>Fri, 27 Jun 2022</Date>
Fri, 27 Jun 2022 </item>
Mon, 05 Jun 2021 <item>
Mon, 05 Jun 2021     <link>test2</link>
Mon, 05 Jun 2021     <title>ABC2</title>
Mon, 05 Jun 2021     <Date>Mon, 05 Jun 2021</Date>
Mon, 05 Jun 2021 </item>

I know how to pick the required dates out of the text; this can be done with:

grep -oE "[A-Z][a-z][a-z],.*2[0-9][0-9][0-9]" ./file

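But I don't know how to insert the extracted date into each block of text (the lines from <item> to </item>).
It looks like I need to treat the text as blocks. Can anyone give me a hint?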

pnwntuvh1#

This feels "fragile", and you should definitely consider using a scripting language or a parser for this type of data, but you can use awk, e.g.

awk 'BEGIN{FS="[><]"} {a[++b] = $0} /Date/ {c = $3} /\/item/ {for (i=1; i<=length(a); i++) {print c, a[i]}; delete a; b = c = 0}' test.txt
Tue, 07 Jun 2023 <item>
Tue, 07 Jun 2023     <link>test1</link>
Tue, 07 Jun 2023     <title>ABC1</title>
Tue, 07 Jun 2023     <Date>Tue, 07 Jun 2023</Date>
Tue, 07 Jun 2023 </item>
Fri, 27 Jun 2022 <item>
Fri, 27 Jun 2022     <link>test3</link>
Fri, 27 Jun 2022     <title>ABC3</title>
Fri, 27 Jun 2022     <Date>Fri, 27 Jun 2022</Date>
Fri, 27 Jun 2022 </item>
Mon, 05 Jun 2021 <item>
Mon, 05 Jun 2021     <link>test2</link>
Mon, 05 Jun 2021     <title>ABC2</title>
Mon, 05 Jun 2021     <Date>Mon, 05 Jun 2021</Date>
Mon, 05 Jun 2021 </item>

With better formatting and some comments:

awk 'BEGIN {
    FS = "[><]"    # set the field separator to either < or >
}

{
    a[++b] = $0    # load every line from the block into a numbered array
}

/Date/ {           # if the line contains Date, grab column 3 (the date)
    c = $3
}

/\/item/ {         # for /item line, print Date then all of the array
    for (i = 1; i <= length(a); i++) {
        print c, a[i]
    }
    delete a       # delete the array for the next <item> ... </item> block
    b = c = 0      # reset variables for the next <item> ... </item> block
}' test.txt
Tue, 07 Jun 2023 <item>
Tue, 07 Jun 2023     <link>test1</link>
Tue, 07 Jun 2023     <title>ABC1</title>
Tue, 07 Jun 2023     <Date>Tue, 07 Jun 2023</Date>
Tue, 07 Jun 2023 </item>
Fri, 27 Jun 2022 <item>
Fri, 27 Jun 2022     <link>test3</link>
Fri, 27 Jun 2022     <title>ABC3</title>
Fri, 27 Jun 2022     <Date>Fri, 27 Jun 2022</Date>
Fri, 27 Jun 2022 </item>
Mon, 05 Jun 2021 <item>
Mon, 05 Jun 2021     <link>test2</link>
Mon, 05 Jun 2021     <title>ABC2</title>
Mon, 05 Jun 2021     <Date>Mon, 05 Jun 2021</Date>
Mon, 05 Jun 2021 </item>
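
A possible variation on the awk above, for awks that are stricter than gawk: calling length() on an array is an extension (the gawk manual itself flags it as not portable), so this sketch keeps its own line counter instead. It assumes one <Date> line per block and no nested <item> elements. The names prefix_blocks, buf, n and date are mine, not from the answer.

```shell
#!/bin/sh
# Sample input copied from the question (first two blocks), in a temp file.
input=$(mktemp)
cat > "$input" <<'EOF'
<item>
    <link>test1</link>
    <title>ABC1</title>
    <Date>Tue, 07 Jun 2023</Date>
</item>
<item>
    <link>test3</link>
    <title>ABC3</title>
    <Date>Fri, 27 Jun 2022</Date>
</item>
EOF

# prefix_blocks is a hypothetical helper name, not part of any tool.
prefix_blocks() {
    awk -F'[<>]' '
        { buf[++n] = $0 }         # buffer every line of the current block
        /<Date>/ { date = $3 }    # field 3 is the text between <Date> and </Date>
        /<\/item>/ {              # end of block: print it with the date prefix
            for (i = 1; i <= n; i++) print date, buf[i]
            n = 0; date = ""      # old buf entries get overwritten by the next block
        }
    ' "$1"
}

result=$(prefix_blocks "$input")
printf '%s\n' "$result"
rm -f "$input"
```

Matching on <Date> rather than the bare word Date also avoids picking up a title or link that happens to contain "Date".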

4nkexdtk2#

A little Perl trick:

perl -0777 -pe '
    s{(<item>.+?<Date>(.+?)</Date>.+?</item>)}   # capture a block and its date
     {
        $date = $2;                              # cache the date
        ($block = $1) =~ s/^/$date /mg;          # add the date at start of each line
        $block                                   # return the modified block
     }sge'

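The perl option -0777 means the entire input is slurped into the default variable ($_) in one go.
The s/// flags:

  • g means global search and replace
  • m means ^ matches at the start of the string or after a newline
  • s means . can match a newline
  • e means the replacement string is evaluated as code.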

t1rydlwq3#

A TXR solution. First, a simple approach that matches the data fairly exactly:

@(repeat)
<item>
@line1
@line2
    <Date>@date</Date>
</item>
@  (output)
@date <item>
@date @line1
@date @line2
@date     <Date>@date</Date>
@date </item>
@  (end)
@(end)
$ txr blocks.txr blocks.xml
Tue, 07 Jun 2023 <item>
Tue, 07 Jun 2023     <link>test1</link>
Tue, 07 Jun 2023     <title>ABC1</title>
Tue, 07 Jun 2023     <Date>Tue, 07 Jun 2023</Date>
Tue, 07 Jun 2023 </item>
Fri, 27 Jun 2022 <item>
Fri, 27 Jun 2022     <link>test3</link>
Fri, 27 Jun 2022     <title>ABC3</title>
Fri, 27 Jun 2022     <Date>Fri, 27 Jun 2022</Date>
Fri, 27 Jun 2022 </item>
Mon, 05 Jun 2021 <item>
Mon, 05 Jun 2021     <link>test2</link>
Mon, 05 Jun 2021     <title>ABC2</title>
Mon, 05 Jun 2021     <Date>Mon, 05 Jun 2021</Date>
Mon, 05 Jun 2021 </item>

More general: the blocks can have any number of lines and use any tag (not only item):

@(repeat)
<@tag>
@  (collect)
@line
@  (last)
    <Date>@date</Date>
</@tag>
@  (end)
@  (output)
@date <@tag>
@  (repeat)
@date @line
@  (end)
@date     <Date>@date</Date>
@date </@tag>
@  (end)
@(end)
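
For comparison, the same generalization can be roughed out in awk. This is only a sketch under my own assumptions: a block ends at an unindented closing tag (a line starting with </), inner lines are indented, and each block has one <Date> line. Unlike the TXR pattern, it does not require <Date> to be the last line before the closing tag. The <entry> and <note> tags in the sample and the name prefix_any_block are made up for illustration.

```shell
#!/bin/sh
# Sample with two different block tags and different block lengths.
input=$(mktemp)
cat > "$input" <<'EOF'
<item>
    <link>test1</link>
    <Date>Tue, 07 Jun 2023</Date>
</item>
<entry>
    <link>test2</link>
    <title>ABC2</title>
    <note>extra line</note>
    <Date>Mon, 05 Jun 2021</Date>
</entry>
EOF

# prefix_any_block is a hypothetical helper name.
prefix_any_block() {
    awk -F'[<>]' '
        { buf[++n] = $0 }         # buffer the block, whatever its tag
        /<Date>/ { date = $3 }    # remember the date of this block
        /^<\// {                  # unindented closing tag: block is complete
            for (i = 1; i <= n; i++) print date, buf[i]
            n = 0; date = ""
        }
    ' "$1"
}

result=$(prefix_any_block "$input")
printf '%s\n' "$result"
rm -f "$input"
```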

With proper syntax highlighting in Vim, you can easily tell the literal text apart from the directives and variables.
