linux - How can I detect the differences between 2 files, and repeat the comparison each time a special character is found?

mklgxw1f posted on 2023-06-21 in Linux

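I need to detect all the differences between 2 files, repeat the comparison each time a special character is found, and print the result to a third file.
If file1 is: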

a
b
c
d

and file2 is:

1:
b
d
--
2:
a
--
3:
c
a

then the expected output is:

1:
a
c
--
2:
b
c
d
--
3:
b
d
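In other words, each output group is every line of file1 that does not occur in the corresponding group of file2. A minimal sketch of that per-group step, assuming the group's lines have already been saved to a file of their own (`lines_not_in_group` is a hypothetical helper name, not part of any tool):

```bash
#!/bin/bash
# Per-group comparison: print every line of file1 that does not occur,
# as a whole line, in one group of file2.
# Assumption: the group's lines are already in a separate file;
# lines_not_in_group is a hypothetical helper name.

# lines_not_in_group FILE1 GROUP_FILE
#   -v  print the lines that do NOT match
#   -x  match whole lines only (so "a" does not match "ab")
#   -F  treat the patterns as fixed strings, not regexes
#   -f  read the patterns from GROUP_FILE, one per line
lines_not_in_group() {
    grep -vxF -f "$2" "$1"
}

# Demo with the question's file1 and group 1 of file2 (b, d)
demo_file1=$(mktemp)
demo_group=$(mktemp)
printf '%s\n' a b c d > "$demo_file1"
printf '%s\n' b d > "$demo_group"
lines_not_in_group "$demo_file1" "$demo_group"   # prints a and c
rm -f "$demo_file1" "$demo_group"
```

Because `-F` disables regex interpretation, entries containing spaces or regex metacharacters are compared literally.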

Any suggestions? Everything I have tried only outputs the differences for 1 file, not for both.
My code:

#!/bin/bash

file1=file1
file2=file2
output_file=Filee

# Compare the files and store the differences in a temporary file
diff_file=$(mktemp)
diff --changed-group-format='%<' --unchanged-group-format='' "$file1" "$file2" > "$diff_file"

# Process the differences and write them to the output file
group_number=1
current_group=""
while IFS= read -r line; do
    if [[ $line == -- ]]; then
        if [[ -n $current_group ]]; then
            echo "--" >> "$output_file"
            ((group_number++))
        fi
    else
        if [[ -z $current_group ]]; then
            echo "$group_number:" >> "$output_file"
        fi
        echo "$line" >> "$output_file"
        current_group=$group_number
    fi
done < "$diff_file"

# Remove the temporary file
rm "$diff_file"

echo "Comparison completed. Results saved to $output_file"

Answer 1# by oewdyzsn

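I'm looking forward to your solution. I'd like to know how you solve it :-)
I've written something using regular expressions that also works for strings (without spaces). It uses a regex to pick out each group in file 2 and skips (deletes) the matching lines when printing file1. The script is commented to explain how it works.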

#! /bin/bash

# Disable history expansion so the `!` in the perl regex below is taken literally
set +H

readonly ID=(1 2 3)
readonly FILE_IN=file1
readonly FILE_TST=file2
readonly FILE_OUT=file3

get() {
        local id=${1?No ID given to ${FUNCNAME[0]}}
        local d=${2?No delimiter given to ${FUNCNAME[0]}}
        local f=${3?No file given to ${FUNCNAME[0]}}
        # ^                     - start of line (of the whole file, due to the -0 option)
        # (.|\n)*$id:\n         - skip everything until "$id:" is found
        # (?:(?!$d)(.|\n))*     - match everything up to the delimiter
        # ($d|$)                - the delimiter or end of line (end of file in this case)
        # (.|\n)*               - the rest (if any)
        perl -0pe "s/^(.|\n)*$id:\n((?:(?!$d)(.|\n))*)($d|$)(.|\n)*/\2\n/" "$f"
}

# Truncate the output file
: > "${FILE_OUT}"
for i in "${ID[@]}"; do

        # Fetch the entries of group $i with "get"
        data=($(get "${i}" '--' "${FILE_TST}"))

        # Build one sed delete command per entry (e.g. /^ab$/d;) instead of
        # one long regexp; the ^...$ anchors keep lines that merely contain an entry.
        data="$(echo "${data[@]}" | tr ' ' '\n'; echo)"
        regxp="$(sed -r 's/^/\/^/; s/$/$\/d;/' <<< "${data}")"

        # Append the group header and the lines of file1 not in group $i
        cat >> "${FILE_OUT}" <<-EOF
${i}:
$(sed -r "${regxp}" "${FILE_IN}")
EOF
done

# Show output
echo "File ${FILE_OUT} finished"

exit 0

file1

$ cat file1 
ab
bc
cd
de

file2

1:
bc
de
--
2:
ab
--
3:
cd
ab

file3

1:
ab
cd
2:
bc
cd
de
3:
bc
de
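For comparison, a minimal sketch of a different approach (awk instead of perl/sed; this is not the answer's method). It assumes group headers are lines of digits followed by `:`, the delimiter is a line that is exactly `--`, and lines are compared as whole strings, so no regex escaping is needed and spaces are fine. Unlike the output above, it also prints the `--` separators from the question's expected output. `compare_groups` is a hypothetical helper name.

```bash
#!/bin/bash
# compare_groups FILE1 FILE2 (hypothetical helper name)
# For every group "N:" ... "--" in FILE2, print "N:" followed by the lines
# of FILE1 that are not in that group, with "--" between groups.
compare_groups() {
    awk '
        # Print the header and the file1 lines missing from the current group,
        # then reset the group state
        function flush(   i) {
            if (header != "") {
                if (printed) print "--"
                print header
                for (i = 1; i <= n; i++)
                    if (!(lines[i] in group)) print lines[i]
                printed = 1
            }
            header = ""
            split("", group)    # empty the group (portable array clear)
        }

        # First file: remember every line of file1, in order
        FILENAME == ARGV[1] { lines[++n] = $0; next }

        # Second file: "--" closes a group, "N:" opens one
        $0 == "--"   { flush(); next }
        /^[0-9]+:$/  { flush(); header = $0; next }

        # Any other line is a member of the current group
        { group[$0] = 1 }

        END { flush() }
    ' "$1" "$2"
}
```

Usage: `compare_groups file1 file2 > file3`. With the question's file1 and file2 this should print exactly the expected output, separators included.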

PS: An upvote would be appreciated. My silver badge is almost there \0/
