Perl: reading and filtering input from a file

Posted by np8igboo on 2023-06-06 in Perl

I have an input data file in the following format, for example:

<name> <attr1> <attr2> <attr3> <working_area> <date>
alan x x x /path/to/alan_work/a Wed_May_17_04:17:40_2023
alan x x x /path/to/alan_work/b Sun_May_28_21:22:52_2023
alan x a x /path/to/alan_work/c Sun_May_28_22:25:47_2023
ben x x x /path/to/ben_work/a Wed_May_17_04:18:44_2023
ben a b x /path/to/ben_work/b Wed_May_17_08:19:47_2023
charles a a a /path/to/charles_work/a Wed_May_17_04:17:40_2023
charles a a a /path/to/charles_work/b Thurs_May_18_04:17:40_2023
ben x x x /path/to/ben_work/c Fri_May_19_04:18:44_2023

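I am writing a Perl script and want it to meet the following criterion:
1. For the same user, if attributes 1, 2 and 3 are identical across two or more different working areas, take the path of the working area with the latest date.
Expected output: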

/path/to/alan_work/b
/path/to/alan_work/c
/path/to/ben_work/c
/path/to/ben_work/b
/path/to/charles_work/b

A short snippet (I don't know how to continue from here):

open(FF, '<', $temp_file) or die "cannot open $temp_file";
while (my $line = <FF>) {
    chomp $line;
    my @split_type = split(' ', $line);
    #no idea here
}
Answer #1

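Store the data in a hash keyed by name, attributes and area, with the date as the value. Then sort each group's areas by their dates and return the last one. For that you need to implement date comparison for your format, or parse the dates while filling the hash and store a directly comparable value instead.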

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# This needs to properly parse the dates, but for the example it
# works, as the dates to compare are always in the same month and never
# on the same day.
sub by_date {
    my ($dates_by_area, $A, $B) = @_;
    $dates_by_area->{$A} =~ /May_?([0-9]+)/;
    my $day_a = $1;
    $dates_by_area->{$B} =~ /May_?([0-9]+)/;
    my $day_b = $1;
    $day_a <=> $day_b
}

my $temp_file = shift;

open my $in, '<', $temp_file or die "cannot open $temp_file";
my %dates;
while (my $line = <$in>) {
    next if $line =~ /^</;

    my ($name, $attr1, $attr2, $attr3, $area, $date) = split ' ', $line;
    $dates{$name}{$attr1}{$attr2}{$attr3}{$area} = $date;
}

for my $name (keys %dates) {
    for my $attr1 (keys %{ $dates{$name} }) {
        for my $attr2 (keys %{ $dates{$name}{$attr1} }) {
            for my $attr3 (keys %{ $dates{$name}{$attr1}{$attr2} }) {
                my %dates_by_area = %{ $dates{$name}{$attr1}{$attr2}{$attr3} };
                my @sorted = sort { by_date(\%dates_by_area, $a, $b) }
                             keys %dates_by_area;
                say $sorted[-1];
            }
        }
    }
}
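The by_date comparator above only looks at the day number after "May", as its own comment admits; for any other month the regex does not match and $1 silently keeps a stale value. One way to compare complete dates is to parse them with the core Time::Piece module. This is a sketch that assumes every date has the layout Weekday_Month_Day_HH:MM:SS_Year; to_epoch is a hypothetical helper name, and the /path/to/x entry with an April date is made up to show a cross-month comparison.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use Time::Piece;

# Hypothetical helper: "Wed_May_17_04:17:40_2023" -> seconds since the epoch.
# Assumes the layout Weekday_Month_Day_HH:MM:SS_Year; the weekday is ignored,
# so non-standard abbreviations such as "Thurs" do no harm.
sub to_epoch {
    my ($date) = @_;
    my (undef, $month, $day, $time, $year) = split /_/, $date;
    return Time::Piece->strptime("$month $day $time $year",
                                 '%b %d %H:%M:%S %Y')->epoch;
}

# Same interface as by_date above, but compares complete timestamps,
# so different months, years and times on the same day sort correctly.
sub by_date {
    my ($dates_by_area, $A, $B) = @_;
    to_epoch($dates_by_area->{$A}) <=> to_epoch($dates_by_area->{$B})
}

# Usage with sample data; the April entry is hypothetical.
my %dates_by_area = (
    '/path/to/ben_work/a' => 'Wed_May_17_04:18:44_2023',
    '/path/to/ben_work/c' => 'Fri_May_19_04:18:44_2023',
    '/path/to/x'          => 'Sun_Apr_30_23:59:59_2023',
);
my @sorted = sort { by_date(\%dates_by_area, $a, $b) } keys %dates_by_area;
say $sorted[-1];    # prints /path/to/ben_work/c
```

This re-parses each date on every comparison, which is fine for small groups; for large inputs it would be cheaper to call to_epoch once while filling the hash, as suggested above, and store the number instead of the string.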

The structure collected in %dates can be inspected with

use Data::Dumper;
warn Dumper \%dates;

which gives the following output for the example:

$VAR1 = {
          'alan' => {
                      'x' => {
                               'x' => {
                                        'x' => {
                                                 '/path/to/alan_work/a' => 'Wed_May17_04:17:40_2023',
                                                 '/path/to/alan_work/b' => 'Sun_May_28_21:22:52_2023'
                                               }
                                      },
                               'a' => {
                                        'x' => {
                                                 '/path/to/alan_work/c' => 'Sun_May_28_22:25:47_2023'
                                               }
                                      }
                             }
                    },
          'ben' => {
                     'x' => {
                              'x' => {
                                       'x' => {
                                                '/path/to/ben_work/a' => 'Wed_May17_04:18:44_2023',
                                                '/path/to/ben_work/c' => 'Fri_May19_04:18:44_2023'
                                              }
                                     }
                            },
                     'a' => {
                              'b' => {
                                       'x' => {
                                                '/path/to/ben_work/b' => 'Wed_May17_08:19:47_2023'
                                              }
                                     }
                            }
                   },
          'charles' => {
                         'a' => {
                                  'a' => {
                                           'a' => {
                                                    '/path/to/charles_work/a' => 'Wed_May17_04:17:40_2023',
                                                    '/path/to/charles_work/b' => 'Thurs_May18_04:17:40_2023'
                                                  }
                                         }
                                }
                       }
        };

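You did not say what should happen if the same name, attributes and area appear with two different dates. The current implementation just uses the last matching line in the input.
Also note that I switched to a lexical filehandle to avoid the problems that come with bareword filehandles. With split ' ' you don't need chomp, because this special form of split ignores leading whitespace and drops trailing whitespace, including the newline.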
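To see the difference between the special split ' ' form and an ordinary split on a single space, the short script below splits one made-up line both ways. The two leading spaces are added only for illustration; the question's data has none.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input line: two leading spaces and the usual trailing newline.
my $line = "  alan x x x /path/to/alan_work/a Wed_May_17_04:17:40_2023\n";

# split ' ' (awk-style): ignores leading whitespace and splits on runs of
# whitespace, so the newline never ends up inside a field.
my @awk = split ' ', $line;

# split / / (one literal space): each leading space yields an empty field
# and the newline stays attached to the last field.
my @literal = split / /, $line;

printf "split ' ': %d fields, last ends in newline: %s\n",
    scalar @awk, ($awk[-1] =~ /\n\z/ ? 'yes' : 'no');
printf "split / /: %d fields, last ends in newline: %s\n",
    scalar @literal, ($literal[-1] =~ /\n\z/ ? 'yes' : 'no');
```

The first line reports 6 fields with no newline, the second 8 fields (two empty ones up front) with the newline still on the date, which is why the literal-space form would need chomp and leading-whitespace handling.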

Answer #2

perl -MTime::Piece -nE '
    # extract fields
    ($u,$a1,$a2,$a3,$p,$_) = split;
    $id = "$u $a1 $a2 $a3";

    # massage date format into standard form
    y/A-Za-z0-9//cd;    # keep only letters and digits
    s/.*([A-Z][a-z]{2})[^\d]*/$1/;
    eval {
        $t = Time::Piece->strptime($_,"%b%d%H%M%S%Y")->datetime;
    } or do {
        # add error handling
        # (this also catches any header)
        next;
    };

    # save path if "better"
    if ($t ge $ts{$id}) {
        $ts{$id} = $t;
        $ps{$id} = $p;
    }

    # print results
    END { say for sort values %ps }
' datafile
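The one-liner packs several steps into a few lines. The sketch below applies the same date-massaging steps to single strings so each intermediate value is visible; to_iso is a hypothetical name. Because Time::Piece's datetime method returns fixed-width ISO 8601 strings (YYYY-MM-DDThh:mm:ss), comparing them as strings with ge, as the one-liner does, gives chronological order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;

# The one-liner's date handling, pulled out into a hypothetical helper.
sub to_iso {
    my ($date) = @_;
    # Keep only letters and digits: Wed_May_17_04:17:40_2023 -> WedMay170417402023
    $date =~ y/A-Za-z0-9//cd;
    # Drop everything before the last capitalised three-letter word (the month):
    # WedMay170417402023 -> May170417402023
    $date =~ s/.*([A-Z][a-z]{2})\D*/$1/;
    # Parse month, day, hour, minute, second, year; return YYYY-MM-DDThh:mm:ss
    return Time::Piece->strptime($date, '%b%d%H%M%S%Y')->datetime;
}

for my $d ('Wed_May_17_04:17:40_2023', 'Thurs_May_18_04:17:40_2023') {
    print "$d -> ", to_iso($d), "\n";
}
```

The greedy .* is what makes "Thurs" harmless: the substitution keeps only the last capitalised three-letter word, which is always the month.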
Answer #3

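Since the values are separated by spaces and the date components by underscores, this is fairly straightforward to handle.
We use the user name and the attributes as the hash key, and keep as the value the work path with the latest date seen so far.
To do that, the dates have to be converted into a standard form that can be compared: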

use strict;
use warnings;
use v5.10;

my $file = 'input.txt';
open my $fh, '<', $file or die "Could not open $file: $!\n";

my %paths;
while(<$fh>){
    /^</ and next;     # skip the header
    my ($name, $attr1, $attr2, $attr3, $workpath, $date) = split;
    my $key = "$name|$attr1|$attr2|$attr3";   # separators keep "ab","c" apart from "a","bc"
    $date = transformDate($date);

    $paths{$key} = [$date, $workpath]
        if !defined $paths{$key} || $date gt $paths{$key}[0];
}

say $paths{$_}[1] for sort keys %paths;

# change date from: Wed_May_17_04:17:40_2023
#          to this: 2023051704:17:40
sub transformDate {
    my $date = shift;
    state $monthindex = {
        Jan => 1,  Feb => 2,  Mar => 3,
        Apr => 4,  May => 5,  Jun => 6,
        Jul => 7,  Aug => 8,  Sep => 9,
        Oct => 10, Nov => 11, Dec => 12,
    };
    my (undef, $month, $day, $time, $year) = split/_/, $date;
    sprintf('%d%02d%02d%s', $year, $monthindex->{$month}, $day, $time);
}
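The reason for transformDate is that the raw date strings do not sort chronologically, because they start with the weekday name. The script below (a copy of transformDate plus three dates taken from the question) compares a plain string sort with a sort on the transformed values. Like the answer, it assumes the time part is always zero-padded HH:MM:SS, as in the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;

# transformDate copied from the answer above.
sub transformDate {
    my $date = shift;
    state $monthindex = {
        Jan => 1,  Feb => 2,  Mar => 3,
        Apr => 4,  May => 5,  Jun => 6,
        Jul => 7,  Aug => 8,  Sep => 9,
        Oct => 10, Nov => 11, Dec => 12,
    };
    my (undef, $month, $day, $time, $year) = split/_/, $date;
    sprintf('%d%02d%02d%s', $year, $monthindex->{$month}, $day, $time);
}

my @dates = (
    'Fri_May_19_04:18:44_2023',
    'Wed_May_17_08:19:47_2023',
    'Wed_May_17_04:18:44_2023',
);

# Sorting the raw strings orders them by weekday name, not by time.
my @raw = sort @dates;
# Sorting the transformed strings is chronological, because year, month,
# day and time are fixed-width and ordered from most to least significant.
my @chrono = sort { transformDate($a) cmp transformDate($b) } @dates;

say "raw:    $raw[-1]";
say "chrono: $chrono[-1]";
```

The raw sort picks Wed_May_17_08:19:47_2023 as the "latest" date, while the transformed sort correctly picks Fri_May_19_04:18:44_2023; the answer's gt comparison relies on exactly this property.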

Edit: removed the alternative date parsing, since it is not needed now that the date format has been clarified.
