Split data using shell

pbpqsu0x posted on 2021-05-29 in Hadoop

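I'm new to shell scripting. I need a shell script that extracts the data between the "run" line and the "Automatic match counts" line, so that it can be processed as semi-structured data. Any suggestions?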

hadoop shell unix

Source: https://stackoverflow.com/questions/38623006/split-data-using-shell

2 answers

axr492tv1#

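Use sed -n '/run/,/Automatic/p' filename.txt | sed '1d;$d' | sed '$d;s/        //g' to clean the data: the first sed prints everything from the "run" line through the "Automatic" line, and the other two remove the first line and the last two lines of that range, plus the leading spaces.
Shell script, split.sh: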


#!/bin/bash

sed -n '/run/,/Automatic/p' "$1" | sed '1d;$d' | sed '$d;s/        //g'

Run it on any file as follows to get the output both on the console and in a file:

shell> ./split.sh test.txt |tee splitted.dat
United Kingdom:       21/09/2012
Started:      08/02/2013 16:04:44
Finished:     08/02/2013 16:21:23
Time to process:      0 days 0 hours 16 mins 39 secs
Records processed:    37497
Throughput:   135124 records/hour
Time per record:      0.0266 secs

The output is also stored in the splitted.dat file:

shell> cat splitted.dat 
United Kingdom:       21/09/2012
Started:      08/02/2013 16:04:44
Finished:     08/02/2013 16:21:23
Time to process:      0 days 0 hours 16 mins 39 secs
Records processed:    37497
Throughput:   135124 records/hour
Time per record:      0.0266 secs
shell>
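To see what each stage of split.sh does, here is a minimal self-contained sketch that runs the same pipeline on a small sample file. The sample is hypothetical: the "Address Matching Summary" header, the "Last run details:" line, the blank line and the 8-space indentation are assumptions about the input layout; only the field values are taken from the output above.

```shell
# Hypothetical sample input (layout assumed, values from the output above).
input=$(mktemp)
cat > "$input" <<'EOF'
Address Matching Summary
Last run details:
        United Kingdom: 21/09/2012
        Started: 08/02/2013 16:04:44
        Finished: 08/02/2013 16:21:23

Automatic Match Counts
        Verified Correct: 32426 (86.5%)
        Unmatched: 183 ( 0.5%)
EOF

# sed -n '/run/,/Automatic/p' keeps "Last run details:" through "Automatic Match Counts".
# sed '1d;$d' deletes the first and last line of that range (the two boundary lines).
# sed '$d;s/        //g' deletes the now-last blank line and every run of 8 spaces.
result=$(sed -n '/run/,/Automatic/p' "$input" | sed '1d;$d' | sed '$d;s/        //g')

printf '%s\n' "$result"
rm -f "$input"
```

Because the last sed deletes whatever line ends up last, the pipeline assumes exactly one blank line right before the "Automatic" line; without it, the last data line would be dropped instead. Likewise, s/        //g only removes runs of exactly eight spaces, so input indented differently (or with tabs) would need another pattern, for example s/^[[:space:]]*//.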

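Update: to get the match counts after the "Automatic" line instead, print the lines outside the range and drop the first remaining line: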


#!/bin/bash

# p                      - print the lines that match the given address
# !p                     - print the lines that do NOT match it (the opposite of p)
# | (pipe)               - pass the output of one command to the next
# $d                     - delete the last line
# 1d                     - delete the first line (nd deletes the nth line)
# '/run/,/Automatic/!p'  - print every line except those from 'run' through 'Automatic'
# sed '1d;s/        //g' - take the first sed's output, delete its first line and remove runs of 8 spaces

sed -n '/run/,/Automatic/!p' "$1" | sed '1d;s/        //g'

Output:

Verified Correct:     32426 (86.5%)
Good Match:    2102 ( 5.6%)
Good Premise Partial:   862 ( 2.3%)
Tentative Match:       1039 ( 2.8%)
Poor Match:       4 ( 0.0%)
Multiple Matches: 7 ( 0.0%)
Partial Match:  872 ( 2.3%)
Foreign Address:  2 ( 0.0%)
Unmatched:      183 ( 0.5%)
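A minimal self-contained sketch of the update on a hypothetical sample file (the header line, the "Last run details:" line and the 8-space indentation are assumptions; the values come from the output above). It shows why sed '1d' is needed: the header line also lies outside the run ... Automatic range, so !p prints it together with the match counts.

```shell
# Hypothetical sample input: one header line, the "run" line, the details,
# then the "Automatic" line followed by the match counts (layout assumed).
input=$(mktemp)
cat > "$input" <<'EOF'
Address Matching Summary
Last run details:
        United Kingdom: 21/09/2012
        Records processed: 37497

Automatic Match Counts
        Verified Correct: 32426 (86.5%)
        Good Match: 2102 ( 5.6%)
        Unmatched: 183 ( 0.5%)
EOF

# Lines outside the range: the header line plus the match counts.
outside=$(sed -n '/run/,/Automatic/!p' "$input")

# Deleting the first of those lines (the header) leaves only the counts;
# the s command removes the 8-space indentation.
result=$(printf '%s\n' "$outside" | sed '1d;s/        //g')

printf '%s\n' "$result"
rm -f "$input"
```

This assumes "run" appears only once in the file: if a line after the "Automatic" line also contained "run", sed would open a new range there and hide the lines that follow it.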

2021-05-29

t5zmwmid2#

sed -n '/run/,/Automatic/ {//!p }' test.txt

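This prints every line in the range from "run" to "Automatic" (the , range operator). //!p then leaves out the boundary lines themselves, the "run" line and the "Automatic match counts" line: the empty regex // reuses the last pattern sed tried, so it matches exactly those two lines.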
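A small self-contained check of this idiom on a hypothetical sample file (the header line, the "Last run details:" line and the indentation are assumptions). On the line that opens the range, the last pattern sed evaluated is /run/; on every later line it is /Automatic/, so // matches only the two boundary lines. The sketch writes the block as {//!p;} because some non-GNU seds, such as the BSD sed on macOS, want a ; before the closing brace; GNU sed accepts either form.

```shell
# Hypothetical sample input (layout assumed).
input=$(mktemp)
cat > "$input" <<'EOF'
Address Matching Summary
Last run details:
        United Kingdom: 21/09/2012
        Time per record: 0.0266 secs
Automatic Match Counts
        Verified Correct: 32426 (86.5%)
EOF

# Inside the range, // reuses the last regex sed evaluated, so the
# "run" line and the "Automatic" line match it and are not printed.
result=$(sed -n '/run/,/Automatic/{//!p;}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Unlike the first answer, this keeps the indentation and any blank lines inside the range; add a substitution such as s/^ *// to the block if they should be removed.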

2021-05-29
