unix: Use awk to delete all rows that share the same values in specific columns if more than one value in another column equals a given value

hrirmatl  posted on 2022-11-04  in  Unix

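I have a 6-column ASCII file whose number of lines is a multiple of 24 (the fourth column is a timestamp in %Y%m%d%H%M format: 24 rows --> 1 day). Each group of 24 rows represents one unique measurement station (columns 1, 2, 5 and 6 have the same values within those 24 rows).
Here is a trimmed example of 2x24 rows, i.e. 2 different stations: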

1_200061208 0 0.000000 202202150000 36.680573 15.094369
1_200061208 0 0.000000 202202150100 36.680573 15.094369
1_200061208 0 -99999 202202150200 36.680573 15.094369
1_200061208 0 0.000000 202202150300 36.680573 15.094369
1_200061208 0 0.000000 202202150400 36.680573 15.094369
1_200061208 0 0.000000 202202150500 36.680573 15.094369
1_200061208 0 0.000000 202202150600 36.680573 15.094369
1_200061208 0 0.000000 202202150700 36.680573 15.094369
1_200061208 0 -99999 202202150800 36.680573 15.094369
1_200061208 0 0.000000 202202150900 36.680573 15.094369
1_200061208 0 0.000000 202202151000 36.680573 15.094369
1_200061208 0 0.000000 202202151100 36.680573 15.094369
1_200061208 0 0.000000 202202151200 36.680573 15.094369
1_200061208 0 0.000000 202202151300 36.680573 15.094369
1_200061208 0 0.000000 202202151400 36.680573 15.094369
1_200061208 0 0.000000 202202151500 36.680573 15.094369
1_200061208 0 0.000000 202202151600 36.680573 15.094369
1_200061208 0 0.000000 202202151700 36.680573 15.094369
1_200061208 0 0.000000 202202151800 36.680573 15.094369
1_200061208 0 0.000000 202202151900 36.680573 15.094369
1_200061208 0 0.000000 202202152000 36.680573 15.094369
1_200061208 0 0.000000 202202152100 36.680573 15.094369
1_200061208 0 0.000000 202202152200 36.680573 15.094369
1_200061208 0 0.000000 202202152300 36.680573 15.094369
1_200061190 0 0.000000 202202150000 36.728195 14.993018
1_200061190 0 0.000000 202202150100 36.728195 14.993018
1_200061190 0 0.000000 202202150200 36.728195 14.993018
1_200061190 0 0.000000 202202150300 36.728195 14.993018
1_200061190 0 0.000000 202202150400 36.728195 14.993018
1_200061190 0 0.000000 202202150500 36.728195 14.993018
1_200061190 0 0.000000 202202150600 36.728195 14.993018
1_200061190 0 0.000000 202202150700 36.728195 14.993018
1_200061190 0 0.000000 202202150800 36.728195 14.993018
1_200061190 0 0.000000 202202150900 36.728195 14.993018
1_200061190 0 0.000000 202202151000 36.728195 14.993018
1_200061190 0 0.000000 202202151100 36.728195 14.993018
1_200061190 0 0.000000 202202151200 36.728195 14.993018
1_200061190 0 0.000000 202202151300 36.728195 14.993018
1_200061190 0 0.000000 202202151400 36.728195 14.993018
1_200061190 0 -99999 202202151500 36.728195 14.993018
1_200061190 0 0.000000 202202151600 36.728195 14.993018
1_200061190 0 0.000000 202202151700 36.728195 14.993018
1_200061190 0 0.000000 202202151800 36.728195 14.993018
1_200061190 0 0.000000 202202151900 36.728195 14.993018
1_200061190 0 0.000000 202202152000 36.728195 14.993018
1_200061190 0 0.000000 202202152100 36.728195 14.993018
1_200061190 0 0.000000 202202152200 36.728195 14.993018
1_200061190 0 0.000000 202202152300 36.728195 14.993018

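My goal is to check whether, for the same station (columns 1, 2, 5, 6), the value -99999 occurs more than once per day (24 rows) in the third column. If it does, I want to delete all 24 rows (in other words, delete that station's entire day of measurements).
The expected output is the same file, but without the n x 24 rows that match my check.
In this example, the expected output is: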

1_200061190 0 0.000000 202202150000 36.728195 14.993018
1_200061190 0 0.000000 202202150100 36.728195 14.993018
1_200061190 0 0.000000 202202150200 36.728195 14.993018
1_200061190 0 0.000000 202202150300 36.728195 14.993018
1_200061190 0 0.000000 202202150400 36.728195 14.993018
1_200061190 0 0.000000 202202150500 36.728195 14.993018
1_200061190 0 0.000000 202202150600 36.728195 14.993018
1_200061190 0 0.000000 202202150700 36.728195 14.993018
1_200061190 0 0.000000 202202150800 36.728195 14.993018
1_200061190 0 0.000000 202202150900 36.728195 14.993018
1_200061190 0 0.000000 202202151000 36.728195 14.993018
1_200061190 0 0.000000 202202151100 36.728195 14.993018
1_200061190 0 0.000000 202202151200 36.728195 14.993018
1_200061190 0 0.000000 202202151300 36.728195 14.993018
1_200061190 0 0.000000 202202151400 36.728195 14.993018
1_200061190 0 -99999 202202151500 36.728195 14.993018
1_200061190 0 0.000000 202202151600 36.728195 14.993018
1_200061190 0 0.000000 202202151700 36.728195 14.993018
1_200061190 0 0.000000 202202151800 36.728195 14.993018
1_200061190 0 0.000000 202202151900 36.728195 14.993018
1_200061190 0 0.000000 202202152000 36.728195 14.993018
1_200061190 0 0.000000 202202152100 36.728195 14.993018
1_200061190 0 0.000000 202202152200 36.728195 14.993018
1_200061190 0 0.000000 202202152300 36.728195 14.993018

Could you please give me the code?


myzjeezk1#

One awk idea, using two passes over the input file:

awk '
FNR==NR { if ($3 == "-99999")             # 1st pass: collect count of "-99999" instances
             a[$1 FS $2 FS $5 FS $6]++
          next
        }

 a[$1 FS $2 FS $5 FS $6]+0 <= 1           # 2nd pass: print current line if "-99999" count <= 1; 
                                          # "+0" ==> force non-existent array entry to be processed as a numeric having value of "0"
' filename.txt filename.txt

This generates:

1_200061190 0 0.000000 202202150000 36.728195 14.993018
1_200061190 0 0.000000 202202150100 36.728195 14.993018
1_200061190 0 0.000000 202202150200 36.728195 14.993018
1_200061190 0 0.000000 202202150300 36.728195 14.993018
1_200061190 0 0.000000 202202150400 36.728195 14.993018
1_200061190 0 0.000000 202202150500 36.728195 14.993018
1_200061190 0 0.000000 202202150600 36.728195 14.993018
1_200061190 0 0.000000 202202150700 36.728195 14.993018
1_200061190 0 0.000000 202202150800 36.728195 14.993018
1_200061190 0 0.000000 202202150900 36.728195 14.993018
1_200061190 0 0.000000 202202151000 36.728195 14.993018
1_200061190 0 0.000000 202202151100 36.728195 14.993018
1_200061190 0 0.000000 202202151200 36.728195 14.993018
1_200061190 0 0.000000 202202151300 36.728195 14.993018
1_200061190 0 0.000000 202202151400 36.728195 14.993018
1_200061190 0 -99999 202202151500 36.728195 14.993018
1_200061190 0 0.000000 202202151600 36.728195 14.993018
1_200061190 0 0.000000 202202151700 36.728195 14.993018
1_200061190 0 0.000000 202202151800 36.728195 14.993018
1_200061190 0 0.000000 202202151900 36.728195 14.993018
1_200061190 0 0.000000 202202152000 36.728195 14.993018
1_200061190 0 0.000000 202202152100 36.728195 14.993018
1_200061190 0 0.000000 202202152200 36.728195 14.993018
1_200061190 0 0.000000 202202152300 36.728195 14.993018
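
The two-pass command above keys only on the station (columns 1, 2, 5, 6), which is enough when the file holds a single day per station, as in the sample. If the file held several days for the same station, the counts would be summed across days. A possible variant, sketched here under the assumption that the first 8 characters of column 4 identify the day, adds the day to the key. The function name `drop_bad_days` is hypothetical and not part of the original answer:

```shell
# Hypothetical helper: drop every station/day block with more than one -99999.
# Assumes column 4 is a %Y%m%d%H%M timestamp, so its first 8 characters are the day.
drop_bad_days() {
    awk '
    { key = $1 FS $2 FS $5 FS $6 FS substr($4, 1, 8) }   # station + day
    FNR == NR { if ($3 == "-99999") bad[key]++; next }   # 1st pass: count per block
    bad[key] + 0 <= 1                                     # 2nd pass: keep good blocks
    ' "$1" "$1"
}
```

It could be called as `drop_bad_days filename.txt > filtered.txt`. Like the original, it reads the file twice, so it needs a regular file rather than a pipe.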

qkf9rpyu2#

Another awk idea that requires a single pass over the input file:

awk '

function print_block() {                 # dump lines from array to stdout
    if (count+0 <= 1)                    # if count <= 1 ...
       for (i=1;i<=lineno;i++)           # loop through array ...
           print lines[i]                # printing array entries to stdout
    delete lines                         # delete array entries
    count=lineno=0                       # reset counters
}
    { key=$1 FS $2 FS $5 FS $6

      if (key != prevkey) {              # if looking at a new key then ...
         print_block()                   # dump previous block of lines to stdout
         prevkey=key
      }

      if ($3 == "-99999")                # keep count of times we see "-99999"
         count++

      if (count <= 1)                    # if count <= 1 then ...
         lines[++lineno]=$0              # save current line in array
    }

END { print_block() }                    # flush last block of lines to stdout
' filename.txt

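Notes:

  • save lines (in an array) for a given key (aka station) until we have read all 24 lines (or until the -99999 count exceeds 1), then ...
  • if the -99999 count is <= 1, dump the lines (from the array) to stdout
  • but if the -99999 count is > 1, "discard" the lines (in the array)
  • memory usage is limited to what is needed to hold at most 24 lines in the array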

This generates:

1_200061190 0 0.000000 202202150000 36.728195 14.993018
1_200061190 0 0.000000 202202150100 36.728195 14.993018
1_200061190 0 0.000000 202202150200 36.728195 14.993018
1_200061190 0 0.000000 202202150300 36.728195 14.993018
1_200061190 0 0.000000 202202150400 36.728195 14.993018
1_200061190 0 0.000000 202202150500 36.728195 14.993018
1_200061190 0 0.000000 202202150600 36.728195 14.993018
1_200061190 0 0.000000 202202150700 36.728195 14.993018
1_200061190 0 0.000000 202202150800 36.728195 14.993018
1_200061190 0 0.000000 202202150900 36.728195 14.993018
1_200061190 0 0.000000 202202151000 36.728195 14.993018
1_200061190 0 0.000000 202202151100 36.728195 14.993018
1_200061190 0 0.000000 202202151200 36.728195 14.993018
1_200061190 0 0.000000 202202151300 36.728195 14.993018
1_200061190 0 0.000000 202202151400 36.728195 14.993018
1_200061190 0 -99999 202202151500 36.728195 14.993018
1_200061190 0 0.000000 202202151600 36.728195 14.993018
1_200061190 0 0.000000 202202151700 36.728195 14.993018
1_200061190 0 0.000000 202202151800 36.728195 14.993018
1_200061190 0 0.000000 202202151900 36.728195 14.993018
1_200061190 0 0.000000 202202152000 36.728195 14.993018
1_200061190 0 0.000000 202202152100 36.728195 14.993018
1_200061190 0 0.000000 202202152200 36.728195 14.993018
1_200061190 0 0.000000 202202152300 36.728195 14.993018
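
The one-pass script above detects the end of a block by a change of key, so two consecutive days of the same station would be treated as one block. Since the question states that every block is exactly 24 lines, a different approach is to cut the input into fixed-size blocks instead. The sketch below assumes no hour is ever missing. The function name `drop_bad_blocks` and its optional block-size argument are hypothetical:

```shell
# Hypothetical helper: treat every 24 consecutive lines as one station/day block.
# Assumes each block is complete; an incomplete trailing block is not printed.
drop_bad_blocks() {
    awk -v size="${2:-24}" '
    $3 == "-99999" { count++ }               # count sentinel values in this block
    { lines[++n] = $0 }                      # buffer the current line
    n == size {                              # block complete ...
        if (count <= 1)                      # ... print it only if it passes the check
            for (i = 1; i <= n; i++) print lines[i]
        n = count = 0                        # start the next block
    }
    ' "$1"
}
```

It could be called as `drop_bad_blocks filename.txt`; the second argument defaults to 24 and exists mainly so the logic can be tried on small test files.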

wb1gzix03#

$ cat tst.awk
{ key = $1 FS $2 FS $5 FS $6 }
key != prev {
    prt()
    prev = key
}
$3 == -99999 { cnt++ }
{ rec = rec $0 ORS }
END { prt() }

function prt() {
    if ( cnt < 2 ) {
        printf "%s", rec
    }
    rec = cnt = ""
}
$ awk -f tst.awk file
1_200061190 0 0.000000 202202150000 36.728195 14.993018
1_200061190 0 0.000000 202202150100 36.728195 14.993018
1_200061190 0 0.000000 202202150200 36.728195 14.993018
1_200061190 0 0.000000 202202150300 36.728195 14.993018
1_200061190 0 0.000000 202202150400 36.728195 14.993018
1_200061190 0 0.000000 202202150500 36.728195 14.993018
1_200061190 0 0.000000 202202150600 36.728195 14.993018
1_200061190 0 0.000000 202202150700 36.728195 14.993018
1_200061190 0 0.000000 202202150800 36.728195 14.993018
1_200061190 0 0.000000 202202150900 36.728195 14.993018
1_200061190 0 0.000000 202202151000 36.728195 14.993018
1_200061190 0 0.000000 202202151100 36.728195 14.993018
1_200061190 0 0.000000 202202151200 36.728195 14.993018
1_200061190 0 0.000000 202202151300 36.728195 14.993018
1_200061190 0 0.000000 202202151400 36.728195 14.993018
1_200061190 0 -99999 202202151500 36.728195 14.993018
1_200061190 0 0.000000 202202151600 36.728195 14.993018
1_200061190 0 0.000000 202202151700 36.728195 14.993018
1_200061190 0 0.000000 202202151800 36.728195 14.993018
1_200061190 0 0.000000 202202151900 36.728195 14.993018
1_200061190 0 0.000000 202202152000 36.728195 14.993018
1_200061190 0 0.000000 202202152100 36.728195 14.993018
1_200061190 0 0.000000 202202152200 36.728195 14.993018
1_200061190 0 0.000000 202202152300 36.728195 14.993018
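
Before filtering, it can help to see which blocks would be dropped. The sketch below is not part of any answer above: it only reports how often -99999 occurs per station and day, assuming the first 8 characters of column 4 identify the day. The function name `count_missing` is hypothetical:

```shell
# Hypothetical helper: print "count station-key day" for every block that
# contains at least one -99999, so the blocks with a count above 1 stand out.
count_missing() {
    awk '
    $3 == -99999 { bad[$1 FS $2 FS $5 FS $6 FS substr($4, 1, 8)]++ }
    END { for (key in bad) print bad[key], key }
    ' "$1" | sort
}
```

The `for (key in bad)` loop visits keys in an unspecified order, which is why the result is piped through `sort`. For the sample in the question, `count_missing file` should report a count of 1 for station 1_200061190 and 2 for station 1_200061208, matching the block that the answers remove.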
