Which situations benefit from Perl's study?

uhry853o posted on 2022-12-19 in Perl


I'm investigating study, a Perl feature that examines a string so that subsequent regular expressions might run faster:

while( <> ) {
    study;
    $count++ if /PATTERN/;
    $count++ if /OTHER/;
    $count++ if /PATTERN2/;
}

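There isn't much written about which situations benefit from it, but you can tease the following out of the docs:

Patterns with constant strings
Multiple patterns
Shorter target strings are better (less time spent studying)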

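I'm looking for concrete cases that not only demonstrate a large advantage, but that can also be tweaked slightly so that the advantage disappears. One caveat in the docs is that you should benchmark individual cases. I want to find edge cases where small differences in the string (or the pattern) cause large differences in performance.
If you haven't actually used study, please don't answer. I'd rather get well-formed answers than quick guesses. There's no urgency here, and no work is being held up.
As a bonus, I've been working on a benchmarking tool that compares two NYTProf runs, and I'd rather use that than the usual benchmarking tools. If I come up with a way to automate it, I'll share that too.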
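A per-case benchmark of the kind the docs recommend might look like the following minimal sketch. It uses the core Benchmark module's cmpthese with code references, which (unlike string code) can see file-scoped lexicals. The sample text, the three patterns and the count_hits helper are hypothetical stand-ins; swap in your own strings and patterns to probe edge cases. On Perl 5.16 or later both rows should time about the same, since study is a no-op there.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 50 longish lines, each the same sentence
# repeated 20 times. Swap in your own corpus to look for edge cases.
my @lines = ('the quick brown fox jumps over the lazy dog ' x 20) x 50;

# Count matches of several constant-string patterns over every line,
# optionally calling study on each line first (mirrors the question's loop).
sub count_hits {
    my ($use_study) = @_;
    my $count = 0;
    for (@lines) {
        study($_) if $use_study;
        $count++ if /lazy cat/;
        $count++ if /brown fox/;
        $count++ if /quick red/;
    }
    return $count;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    plain   => sub { count_hits(0) },
    studied => sub { count_hits(1) },
});
```

Because study never changes what a match returns, both variants must produce the same count; only the timing row should differ, if at all.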

perl

Source: https://stackoverflow.com/questions/8383527/which-situations-benefit-from-perls-study

4 answers


txu3uszq1#

Google turned up this lovely test scenario:

#!/usr/bin/perl
# 
#  Exercise 7.8 
# 
# This is a more difficult exercise. The study function in Perl may speed up searches 
# for motifs in DNA or protein. Read the Perl documentation on this function. Its use 
# is simple: given some sequence data in a variable $sequence, type:
# 
# study $sequence;
# 
# before doing the searches. Do you think study will speed up searches in DNA or 
# protein, based on what you've read about it in the documentation?
# 
# For lots of extra credit! Now read the Perl documentation on the standard module 
# Benchmark. (Type perldoc Benchmark, or visit the Perl home page at
# http://www.perl.com.) See if your guess is right by writing a program that benchmarks motif
# searches of DNA and of protein, with and without study.
#
# Answer to Exercise 7.8

use strict;
use warnings;

use Benchmark;

# Benchmark evals string code from inside Benchmark.pm, where this file's
# lexicals are out of scope, so the data must be package variables (our)
# rather than lexicals (my); otherwise the timed code matches against undef.
our $dna = join ('', qw(
agatggcggcgctgaggggtcttgggggctctaggccggccacctactgg
tttgcagcggagacgacgcatggggcctgcgcaataggagtacgctgcct
gggaggcgtgactagaagcggaagtagttgtgggcgcctttgcaaccgcc
tgggacgccgccgagtggtctgtgcaggttcgcgggtcgctggcgggggt
cgtgagggagtgcgccgggagcggagatatggagggagatggttcagacc
cagagcctccagatgccggggaggacagcaagtccgagaatggggagaat
gcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgat
cgggtgtgacaactgcaatgagtggttccatggggactgcatccggatca
ctgagaagatggccaaggccatccgggagtggtactgtcgggagtgcaga
gagaaagaccccaagctagagattcgctatcggcacaagaagtcacggga
gcgggatggcaatgagcgggacagcagtgagccccgggatgagggtggag
ggcgcaagaggcctgtccctgatccagacctgcagcgccgggcagggtca
gggacaggggttggggccatgcttgctcggggctctgcttcgccccacaa
atcctctccgcagcccttggtggccacacccagccagcatcaccagcagc
agcagcagcagatcaaacggtcagcccgcatgtgtggtgagtgtgaggca
tgtcggcgcactgaggactgtggtcactgtgatttctgtcgggacatgaa
gaagttcgggggccccaacaagatccggcagaagtgccggctgcgccagt
gccagctgcgggcccgggaatcgtacaagtacttcccttcctcgctctca
ccagtgacgccctcagagtccctgccaaggccccgccggccactgcccac
ccaacagcagccacagccatcacagaagttagggcgcatccgtgaagatg
agggggcagtggcgtcatcaacagtcaaggagcctcctgaggctacagcc
acacctgagccactctcagatgaggaccta
));

our $protein = join('', qw(
MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD
SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ
GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR
));

my $count = 1000;

print "DNA pattern matches without 'study' function:\n";
timethis($count,
    ' for(my $i=1 ; $i < 10000; ++$i) {
        $dna =~ /aggtc/;
        $dna =~ /aatggccgt/;
        $dna =~ /gatcgatcagctagcat/;
        $dna =~ /gtatgaac/;
        $dna =~ /[ac][cg][gt][ta]/;
        $dna =~ /ccccccccc/;
    } '
);

print "\nDNA pattern matches with 'study' function:\n";
timethis($count,
    ' study $dna;
    for(my $i=1 ; $i < 10000; ++$i) {
        $dna =~ /aggtc/;
        $dna =~ /aatggccgt/;
        $dna =~ /gatcgatcagctagcat/;
        $dna =~ /gtatgaac/;
        $dna =~ /[ac][cg][gt][ta]/;
        $dna =~ /ccccccccc/;
    } '
);

print "\nProtein pattern matches without 'study' function:\n";
timethis($count,
    ' for(my $i=1 ; $i < 10000; ++$i) {
        $protein =~ /PH.EI/;
        $protein =~ /KFTEQGESMRLY/;
        $protein =~ /[YAL][NVP][ISV][KQE]/;
        $protein =~ /DKKQIR/;
        $protein =~ /[MD][VT][HQ][ER]/;
        $protein =~ /NVPISVKQEITFTDVSEQL/;
    } '
);

print "\nProtein pattern matches with 'study' function:\n";
timethis($count,
    ' study $protein;
    for(my $i=1 ; $i < 10000; ++$i) {
        $protein =~ /PH.EI/;
        $protein =~ /KFTEQGESMRLY/;
        $protein =~ /[YAL][NVP][ISV][KQE]/;
        $protein =~ /DKKQIR/;
        $protein =~ /[MD][VT][HQ][ER]/;
        $protein =~ /NVPISVKQEITFTDVSEQL/;
    } '
);

Note that even in the most favorable case (protein matching), the reported gain is only about 2%:

#  $ perl exer07.08
# On my computer, this is the output I get: your results probably vary.

#  DNA pattern matches without 'study' function:
#  timethis 1000: 29 wallclock secs (29.25 usr +  0.00 sys = 29.25 CPU) @ 34.19/s (n=1000)
#  
#  DNA pattern matches with 'study' function:
#  timethis 1000: 30 wallclock secs (29.21 usr +  0.15 sys = 29.36 CPU) @ 34.06/s (n=1000)
#  
#  Protein pattern matches without 'study' function:
#  timethis 1000: 32 wallclock secs (29.47 usr +  0.04 sys = 29.51 CPU) @ 33.89/s (n=1000)
#  
#  Protein pattern matches with 'study' function:
#  timethis 1000: 30 wallclock secs (28.97 usr +  0.02 sys = 28.99 CPU) @ 34.49/s (n=1000)
#

2022-12-19

ecbunoof2#

I'll leave my notes here as an answer for now and develop them into a proper answer later:
In PP(pp_study) in pp.c, there are these odd lines of code (with no comment):

if (len == 0 || len > I32_MAX || !SvPOK(sv) || SvUTF8(sv) || SvVALID(sv)) {
    RETPUSHNO;
}

It looks like scalars with the UTF-8 flag set are never studied at all.


6tr1vspr3#

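Not really. If you search for it, most of the hits are in the Perl test suite, which suggests nobody actually uses it. Also, because of a bug, you could only notice speed benefits on global variables. It did give some speedup when processing English text (sometimes even 2x faster), but you had to make the variable global.
It also sometimes caused infinite loops or false positives (study could introduce bugs into your program, even though its only purpose was to make the program faster), so in Perl 5.16 it was removed, or more precisely turned into a no-op: nobody wanted to maintain a part that nobody cared about.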


hgqdbh6s4#

None. Since 2012, study does nothing.
The current code reads:

if (len == 0 || len > I32_MAX || !SvPOK(sv) || SvUTF8(sv) || SvVALID(sv)) {
    /* Historically, study was skipped in these cases. */
    SETs(&PL_sv_no);
    return NORMAL;
}

/* Make study a no-op. It's no longer useful and its existence
   complicates matters elsewhere. */
SETs(&PL_sv_yes);
return NORMAL;

This means study returns true in the cases where it used to do something and false otherwise, but it never actually does anything.
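That return value can be observed directly from Perl. The following is a minimal sketch using only core Perl; would_study and the sample strings are hypothetical names made up for this illustration. Per the guard quoted above, a plain byte string should yield true, while an empty string or a UTF-8-flagged string should yield false, both in the old implementation (RETPUSHNO/RETPUSHYES) and in the no-op version.

```perl
use strict;
use warnings;

# Returns 1 if study reports the scalar as one it would (historically) have
# studied, 0 if pp_study's guard skips it. Since Perl 5.16, nothing else
# happens in either case.
sub would_study {
    my ($s) = @_;
    return study($s) ? 1 : 0;
}

my %samples = (
    plain => 'agatggcggcgctgaggggtc',   # byte string, no UTF-8 flag
    wide  => "\x{263A}agatggcggc",      # a code point above 255 forces the UTF-8 flag
    empty => '',                        # len == 0 fails the guard
);

for my $name (qw(plain wide empty)) {
    printf "%-5s %d\n", $name, would_study($samples{$name});
}
# prints:
# plain 1
# wide  0
# empty 0
```

This also confirms the earlier observation that UTF-8-flagged scalars were never studied: they fail the same SvUTF8 check.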

