How do I format the output of uniq -c in Linux?

Posted by 6ljaweal on 2023-08-03 in Linux

Suppose I have the following data in a text file named temp_data.txt:

"John","001-01-0001","engineering"
"Smith","192-11-0292","human resources"
"Brian","192-11-0292","operations"
"Lucius","992-11-0292","human resources"
"Smith","192-11-0292","human resources"

Running uniq -c on it produces the following:

$ sort temp_data.txt | uniq -c > temp_data_count.txt
$ cat temp_data_count.txt


How can I convert the data in temp_data_count.txt so that it looks like this:

"1","John","001-01-0001","engineering"
"2","Smith","192-11-0292","human resources"
"1","Brian","192-11-0292","operations"
"1","Lucius","992-11-0292","human resources"


I tried awk, but the data seems to get truncated at the end.

Copied from the comments:

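I tried awk '{print "\x22" $2 "\x22",$1}', but this just truncated the original data, and the final file looked funny. Also, yes, I do want quotes and a comma on the first column, and the ordering of the data does not matter here.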

46scxncf #1

You can use sed:

sed 's/^\s*//g;s/^[0-9]*/"&",/g;s/, "/,"/g' temp_data_count.txt

This will give you the result:

"1","Brian","192-11-0292","operations"
"1","John","001-01-0001","engineering"
"1","Lucius","992-11-0292","human resources"
"2","Smith","192-11-0292","human resources"


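It uses three substitutions in sequence:

  1. s/^\s*//g removes the leading whitespace (from the uniq -c output)
  2. s/^[0-9]*/"&",/g grabs the leading number and wraps it in quotes, followed by a comma
  3. s/, "/,"/g removes the space after the number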
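
To see what each of the three substitutions contributes, here is a small sketch that applies them one at a time to a single line shaped like uniq -c output. The sample line and the variable names (line, step1, step2, step3) are only for illustration, and the \s shorthand assumes GNU sed.

```shell
# A line as uniq -c would emit it: padding, the count, one space, then the record.
line='      2 "Smith","192-11-0292","human resources"'

# Step 1: strip the leading whitespace (\s assumes GNU sed).
step1=$(printf '%s\n' "$line" | sed 's/^\s*//g')
# Step 2: wrap the leading number in quotes and append a comma.
step2=$(printf '%s\n' "$step1" | sed 's/^[0-9]*/"&",/g')
# Step 3: drop the space left between the comma and the record.
step3=$(printf '%s\n' "$step2" | sed 's/, "/,"/g')

printf '%s\n' "$step1" "$step2" "$step3"
# 2 "Smith","192-11-0292","human resources"
# "2", "Smith","192-11-0292","human resources"
# "2","Smith","192-11-0292","human resources"
```

Note that the third substitution is global, so it would also rewrite any comma-space-quote sequence that happens to occur inside the record itself; the sample data contains none.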

64jmpszr #2

An awk solution:

awk '{ a[$0]++ } END { for (i in a) print "\"" a[i] "\"," i; }' temp_data.txt

prints

"2","Smith","192-11-0292","human resources"
"1","Lucius","992-11-0292","human resources"
"1","Brian","192-11-0292","operations"
"1","John","001-01-0001","engineering"


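provided that I fix the differing trailing whitespace in the data copied from the question. (The order is unspecified.)
This solution uses the whole input line as the key of an associative array and increments the value to count the number of identical lines. At the end, the array keys (= input lines) and values (= counters) are printed in a loop in the specified format.
With GNU awk, you can get sorted output by setting PROCINFO["sorted_in"].
For example: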

awk '{ a[$0]++ } END { PROCINFO["sorted_in"]="@ind_str_asc"; for (i in a) print "\"" a[i] "\"," i; }' temp_data.txt


The result is sorted in ascending order by the string value of the array index (the input line).

"1","Brian","192-11-0292","operations"
"1","John","001-01-0001","engineering"
"1","Lucius","992-11-0292","human resources"
"2","Smith","192-11-0292","human resources"


See https://www.gnu.org/software/gawk/manual/html_node/Controlling-Scanning.html
Or, as another option, post-process with sed:

sort temp_data.txt|uniq -c|sed 's/^\s*\([0-9]*\)\s*/"\1",/'


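This replaces any amount of whitespace at the start of the line, followed by any number of digits (captured as group 1), followed by any amount of whitespace, with the captured digits enclosed in quotes and followed by a comma.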
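
The \s shorthand is a GNU extension to sed. If portability matters, the same substitution can probably be written with the POSIX character class [[:space:]] instead. The following is only a sketch under that assumption; the temporary file and the variable names data and result are made up for the example.

```shell
# Recreate the sample input from the question in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
"John","001-01-0001","engineering"
"Smith","192-11-0292","human resources"
"Brian","192-11-0292","operations"
"Lucius","992-11-0292","human resources"
"Smith","192-11-0292","human resources"
EOF

# Same idea as above, with [[:space:]] in place of the GNU-only \s.
result=$(sort "$data" | uniq -c |
  sed 's/^[[:space:]]*\([0-9]*\)[[:space:]]*/"\1",/')

printf '%s\n' "$result"
rm -f "$data"
```

The substitution is anchored at the start of the line and is not global, so only the count that uniq -c prepends is touched, never the record that follows it.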

olqngx59 #3

You're very close.
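Sorry, that was completely wrong.
@Bodo shows how to do this correctly, with both unordered and sorted output. To preserve the original order, we need to record the line number at which each record is first seen. With GNU awk specifically: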

gawk '
  !lineno[$0] {lineno[$0] = NR}
  {count[$0]++}
  END {
    PROCINFO["sorted_in"] = "@val_num_asc"
    for (line in lineno)
      printf "\"%d\",%s\n", count[line], line
  }
' temp_data.txt

With any awk, this produces the same output by reading the file twice:

awk '
  NR == FNR {count[$0]++; next}
  !seen[$0]++ {printf "\"%d\",%s\n", count[$0], $0}
' temp_data.txt temp_data.txt
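
If reading the file twice is not an option (for example when the data arrives on a pipe), a single-pass variant should also work in any POSIX awk. This is a different technique from the two shown above, offered only as a sketch: it keeps an extra array, here called order, that remembers each distinct line in first-seen order. The names order, n, data and result are illustrative.

```shell
# Recreate the sample input from the question in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
"John","001-01-0001","engineering"
"Smith","192-11-0292","human resources"
"Brian","192-11-0292","operations"
"Lucius","992-11-0292","human resources"
"Smith","192-11-0292","human resources"
EOF

result=$(awk '
  !($0 in count) { order[++n] = $0 }   # remember each distinct line when first seen
  { count[$0]++ }                      # tally every occurrence
  END {
    # walk the lines in first-seen order and print count, then record
    for (i = 1; i <= n; i++)
      printf "\"%d\",%s\n", count[order[i]], order[i]
  }
' "$data")

printf '%s\n' "$result"
rm -f "$data"
# "1","John","001-01-0001","engineering"
# "2","Smith","192-11-0292","human resources"
# "1","Brian","192-11-0292","operations"
# "1","Lucius","992-11-0292","human resources"
```

The trade-off is memory: every distinct line is stored twice (once as a key of count, once as a value in order), which is fine for small files like this one.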
