Is there something like regex in AppleScript, and if not, what's the alternative?

Posted by im9ewurl on 2023-05-08 in Other

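I need to parse the first 10 characters of a file name to see whether they are all digits. The obvious way to do this is fileName =~ m/^\d{10}/, but I don't see anything regex-like in the AppleScript reference, so I'm curious what other options I have for this validation.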

vptzau2j1#

Don't despair: since OS X, you also have access to sed and grep through "do shell script". So:

set thecommandstring to "echo \"" & filename & "\"|sed \"s/[0-9]\\{10\\}/*good*(&)/\"" as string
set sedResult to do shell script thecommandstring
set isgood to sedResult starts with "*good*"

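My sed skills aren't that great, so there's probably a more elegant way than appending *good* to any name that matches [0-9]{10} and then looking for *good* at the start of the result. But basically, if the file name is "1234567890foo.mov", this will run the command: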

echo "1234567890foo.mov"|sed "s/[0-9]\{10\}/*good*(&)/"

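Note the escaped quotes \" and escaped backslashes \\ in the AppleScript. If you're escaping things in the shell, you have to escape the escapes. So to run a shell script that contains a backslash, you have to escape it for the shell as \\ and then escape each of those backslashes in AppleScript as \\\\. This can get pretty hard to read.
So anything you can do on the command line, you can do by calling it from AppleScript (woohoo!). Anything written to stdout is returned to the script as the result.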
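To see what the AppleScript actually hands to the shell, here is a minimal bash sketch of the same check. The function names `tag_ten_digits` and `has_ten_digit_prefix` are hypothetical, and it assumes a `sed` that understands the `\{10\}` interval in basic regular expressions. The marker trick also acts as an anchor: the output starts with `*good*` only if the ten digits were at the very beginning of the name.

```shell
#!/usr/bin/env bash

# Hypothetical helper mirroring the sed command built by the AppleScript:
# the first run of ten digits is wrapped as "*good*(<digits>)".
tag_ten_digits() {
  # printf instead of echo, so names starting with "-" are printed verbatim
  printf '%s\n' "$1" | sed 's/[0-9]\{10\}/*good*(&)/'
}

# Succeeds only when the tagged output starts with the marker, i.e. the same
# test as `sedResult starts with "*good*"` on the AppleScript side.
# Caveat: a name that itself begins with "*good*" would also pass.
has_ten_digit_prefix() {
  case "$(tag_ten_digits "$1")" in
    '*good*'*) return 0 ;;
    *) return 1 ;;
  esac
}

has_ten_digit_prefix "1234567890foo.mov" && echo "good prefix"
```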

tcomlyy62#

There's a simpler way to do regex matching with the shell (works on bash 3.2+):

set isMatch to "0" = (do shell script ¬
  "[[ " & quoted form of fileName & " =~ ^[[:digit:]]{10} ]]; printf $?")

Notes:

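  • uses the modern bash test expression [[ ... ]] with the regex-matching operator =~; *not* quoting the right operand (or at least the special regex characters) is a must on bash 3.2+, unless you prepend shopt -s compat31;
  • the do shell script statement executes the test and returns its exit status via an additional command (thanks, @LauriRanta); "0" indicates success;
  • note that the =~ operator does not support shortcut character classes such as \d or assertions such as \b (true as of OS X 10.9.4, and unlikely to change anytime soon);
  • for *case-INsensitive* matching, prepend shopt -s nocasematch; to the command string;
  • for *locale-awareness*, prepend export LANG='" & user locale of (system info) & ".UTF-8'; to the command string;
  • if the regex contains *capture groups*, you can access the captured strings via the built-in ${BASH_REMATCH[@]} array variable;
  • as in the accepted answer, you'll have to \-escape double quotes and backslashes.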
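A small bash sketch of the rules in these notes, runnable outside AppleScript. The function names are hypothetical, and it assumes bash 3.2 or later with `compat31` off: an unquoted pattern is a regex, a quoted one is matched literally, captures land in `BASH_REMATCH`, and `nocasematch` also affects `=~`.

```shell
#!/usr/bin/env bash

# Unquoted right operand: bash treats it as an extended regex.
starts_with_ten_digits() {
  [[ $1 =~ ^[[:digit:]]{10} ]]
}

# Capture groups land in BASH_REMATCH; index 0 holds the overall match.
first_ten_digits() {
  if [[ $1 =~ ^([[:digit:]]{10}) ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

# Quoting the right operand makes bash match it as a literal string,
# which is why the one-liner leaves the regex unquoted.
quoted_pattern_matches() {
  [[ $1 =~ "^[[:digit:]]{10}" ]]
}

# shopt -s nocasematch also applies to =~; the option is restored afterwards.
starts_with_abc_any_case() {
  shopt -s nocasematch
  [[ $1 =~ ^abc ]]
  local status=$?
  shopt -u nocasematch
  return "$status"
}

# Same exit-status trick as the AppleScript one-liner: prints "0" on a match.
[[ "1234567890foo.mov" =~ ^[[:digit:]]{10} ]]; printf '%s\n' "$?"
```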

Here is an alternative using egrep:

set isMatch to "0" = (do shell script ¬
  "egrep -q '^\\d{10}' <<<" & quoted form of filename & "; printf $?")

While this presumably performs worse, it has two advantages:

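  • you can use shortcut character classes such as \d and assertions such as \b
  • you can more easily make matching case-INsensitive by calling egrep with -i
  • you cannot, however, gain access to sub-matches via capture groups; use the [[ ... =~ ... ]] approach if you need them.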
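Roughly the same egrep check as a bash sketch, with hypothetical helper names. It uses `grep -E` (the standard spelling of `egrep`) and the POSIX class `[[:digit:]]` instead of `\d`, because `\d` support varies between grep implementations; GNU grep, for example, only recognizes `\d` with `-P`. Also keep in mind that grep tests each line separately, so a multi-line string matches if any one of its lines does.

```shell
#!/usr/bin/env bash

# Exit status 0 means "match", mirroring `egrep -q ...; printf $?`.
# -q suppresses output; `--` stops option parsing so a pattern
# starting with "-" is not mistaken for a flag.
matches() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

# Case-insensitive variant: the only change is the extra -i.
matches_ignore_case() {
  printf '%s\n' "$1" | grep -Eqi -- "$2"
}

# Prints the exit status of a command, like the AppleScript one-liners do.
status_of() {
  local status=0
  "$@" || status=$?
  printf '%s\n' "$status"
}

status_of matches "1234567890foo.mov" '^[[:digit:]]{10}'
```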

Finally, here are utility functions that package both approaches (the syntax highlighting is off, but they do work):

# SYNOPSIS
#   doesMatch(text, regexString) -> Boolean
# DESCRIPTION
#   Matches string s against regular expression (string) regex using egrep's extended regular expression language *including* 
#   support for shortcut classes such as `\d`, and assertions such as `\b`, and *returns a Boolean* to indicate if
#   there is a match or not.
#    - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless inside
#      a 'considering case' block.
#    - The current user's locale is respected.
# EXAMPLE
#    my doesMatch("127.0.0.1", "^(\\d{1,3}\\.){3}\\d{1,3}$") # -> true
on doesMatch(s, regex)
    local ignoreCase, extraGrepOption
    set ignoreCase to "a" is "A"
    if ignoreCase then
        set extraGrepOption to "i"
    else
        set extraGrepOption to ""
    end if
    # Note: So that classes such as \w work with different locales, we need to set the shell's locale explicitly to the current user's.
    #       Rather than let the shell command fail we return the exit code and test for "0" to avoid having to deal with exception handling in AppleScript.
    tell me to return "0" = (do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; egrep -q" & extraGrepOption & " " & quoted form of regex & " <<< " & quoted form of s & "; printf $?")
end doesMatch

# SYNOPSIS
#   getMatch(text, regexString) -> { overallMatch[, captureGroup1Match ...] } or {}
# DESCRIPTION
#   Matches string s against regular expression (string) regex using bash's extended regular expression language and
#   *returns the matching string and substrings matching capture groups, if any.*
#   
#   - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless this subroutine is called inside
#     a 'considering case' block.
#   - The current user's locale is respected.
#   
#   IMPORTANT: 
#   
#   Unlike doesMatch(), this subroutine does NOT support shortcut character classes such as \d.
#   Instead, use one of the following POSIX classes (see `man re_format`):
#       [[:alpha:]] [[:word:]] [[:lower:]] [[:upper:]] [[:ascii:]]
#       [[:alnum:]] [[:digit:]] [[:xdigit:]]
#       [[:blank:]] [[:space:]] [[:punct:]] [[:cntrl:]] 
#       [[:graph:]]  [[:print:]] 
#   
#   Also, `\b`, '\B', '\<', and '\>' are not supported; you can use `[[:<:]]` for '\<' and `[[:>:]]` for `\>`
#   
#   Always returns a *list*:
#    - an empty list, if no match is found
#    - otherwise, the first list element contains the matching string
#       - if regex contains capture groups, additional elements return the strings captured by the capture groups; note that *named* capture groups are NOT supported.
#  EXAMPLE
#       my getMatch("127.0.0.1", "^([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})$") # -> { "127.0.0.1", "127", "0", "0", "1" }
on getMatch(s, regex)
    local ignoreCase, extraCommand
    set ignoreCase to "a" is "A"
    if ignoreCase then
        set extraCommand to "shopt -s nocasematch; "
    else
        set extraCommand to ""
    end if
    # Note: 
    #  So that classes such as [[:alpha:]] work with different locales, we need to set the shell's locale explicitly to the current user's.
    #  Since `quoted form of` encloses its argument in single quotes, we must set compatibility option `shopt -s compat31` for the =~ operator to work.
    #  Rather than let the shell command fail we return '' in case of non-match to avoid having to deal with exception handling in AppleScript.
    tell me to do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; shopt -s compat31; " & extraCommand & "[[ " & quoted form of s & " =~ " & quoted form of regex & " ]] && printf '%s\\n' \"${BASH_REMATCH[@]}\" || printf ''"
    return paragraphs of result
end getMatch
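`getMatch()` needs `shopt -s compat31` only because `quoted form of` single-quotes the regex. In a plain bash script, the usual alternative is to store the pattern in a variable and leave the expansion unquoted. A minimal sketch of a bash counterpart (the name `get_match` is hypothetical):

```shell
#!/usr/bin/env bash

# Hypothetical bash counterpart of getMatch(): prints the overall match and
# then each capture group, one per line; prints nothing when there is no match.
get_match() {
  local s=$1 re=$2
  # $re is deliberately unquoted: an unquoted expansion on the right of =~
  # is treated as a regex, so shopt -s compat31 is not needed here.
  if [[ $s =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[@]}"
  fi
}

get_match "127.0.0.1" '^([[:digit:]]{1,3})\.([[:digit:]]{1,3})\.([[:digit:]]{1,3})\.([[:digit:]]{1,3})$'
```

It prints the same values that `getMatch()` returns as a list, one per line.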

klh5stk13#

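I recently needed regular expressions in a script and wanted to find a scripting addition to handle them, so it would be easier to read what was going on. I found Satimage.osax, which lets you use syntax like: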

find text "n(.*)" in "to be or not to be" with regexp

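The only drawback is that (as of November 8, 2010) it's a 32-bit addition, so it throws errors when called from a 64-bit process. This bit me in a Mail rule on Snow Leopard, since I had to run Mail in 32-bit mode. Called from a standalone script, though, I have no reservations: it's really great, lets you choose whatever regex syntax you want, and supports back-references.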

Update, May 28, 2011

Thanks to Mitchell Model for pointing out in a comment below that it has been updated to 64-bit, so no more reservations: it does everything I need.

6l7fqoea4#

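I'm sure there is an AppleScript addition or a shell script that can be called to bring regex into the fold, but I avoid dependencies for the simple stuff. I use this style pattern all the time: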

set filename to "1234567890abcdefghijkl"

return isPrefixGood(filename)

on isPrefixGood(filename) --returns "good prefix" or "bad prefix"
    set legalCharacters to {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"}
    
    -- a name shorter than 10 characters cannot have a 10-digit prefix
    if (count of characters of filename) < 10 then return "bad prefix"
    
    -- `text 1 thru 10` avoids depending on AppleScript's text item delimiters
    set thePrefix to text 1 thru 10 of filename
    
    repeat with thisChr from 1 to (count of characters in thePrefix)
        set theChr to character thisChr of thePrefix
        if theChr is not in legalCharacters then
            -- stop at the first character that is not a digit
            return "bad prefix"
        end if
    end repeat
    
    return "good prefix"
end isPrefixGood
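The same no-regex idea, checking each of the first ten characters against a fixed set of legal characters, can be sketched in bash as follows. `is_prefix_good` is a hypothetical name, and unlike the AppleScript handler it reports the result through its exit status rather than a string.

```shell
#!/usr/bin/env bash

# Hypothetical bash port of the no-regex approach: inspect the first ten
# characters one by one against an explicit list of legal characters.
is_prefix_good() {
  local name=$1 i ch
  # Names shorter than ten characters cannot have a ten-digit prefix.
  (( ${#name} >= 10 )) || return 1
  for (( i = 0; i < 10; i++ )); do
    ch=${name:i:1}
    # Explicit digits rather than [0-9], whose meaning can depend on the locale.
    case $ch in
      [0123456789]) ;;
      *) return 1 ;;  # stop at the first illegal character
    esac
  done
  return 0
}

if is_prefix_good "1234567890abcdefghijkl"; then echo "good prefix"; else echo "bad prefix"; fi
```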

2nbm6dog5#

Here's another way to check whether the first ten characters of any string are digits.

on checkFilename(thisName)
    set isOk to true
    try
        -- an error here (name shorter than 10 characters) means the check fails
        repeat with i from 1 to 10
            set isOk to (isOk and ((character i of thisName) is in "0123456789"))
        end repeat
        return isOk
    on error
        return false
    end try
end checkFilename

gmxoilav6#

I was able to call JavaScript directly from AppleScript (on High Sierra) as follows:

# Returns a list of strings from _subject that match _regex
# _regex in the format of /<value>/<flags>
on match(_subject, _regex)
    set _js to "(new String(`" & _subject & "`)).match(" & _regex & ")"
    set _result to run script _js in "JavaScript"
    if _result is null or _result is missing value then
        return {}
    end if
    return _result
end match

match("file-name.applescript", "/^\\d+/g") #=> {}
match("1234_file.js", "/^\\d+/g") #=> {"1234"}
match("5-for-fighting.mp4", "/^\\d+/g") #=> {"5"}

Most JavaScript String methods seem to work as expected. I haven't found a reference for which ECMAScript version JavaScript for macOS Automation is compatible with, so test before use.

dy2hfwbg7#

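I have an alternative: until I have implemented character classes for the Thompson NFA algorithm, I have gotten the bare bones working in AppleScript. If anyone is interested in parsing very basic regexes with AppleScript, the code is posted in CodeExchange over at MacScripters, please have a look!
Here is a solution for figuring out whether the first ten characters of a text/string are all digits: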

set mstr to "1234567889Abcdefg"
set isnum to prefixIsOnlyDigits for mstr
to prefixIsOnlyDigits for aText
    set aProbe to text 1 thru 10 of aText
    set isnum to false
    if not ((offset of "," in aProbe) > 0 or (offset of "." in aProbe) > 0 or (offset of "-" in aProbe) > 0) then
        try
            set aNumber to aProbe as number
            set isnum to true
        end try
    end if
    return isnum
end prefixIsOnlyDigits

b91juud38#

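As mentioned in other answers, AppleScript has no language-level support for regular expressions, but starting with Yosemite you can switch to the JavaScript for Automation (JXA) engine (see also Apple's docs), which does include a regex engine.
Example of using JXA to validate a URL against a regular expression: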

var app = Application.currentApplication();
app.includeStandardAdditions = true;

var text = "https://www.example.com";
var patt = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

app.displayAlert(text.search(patt));

Note: to run the script from Script Editor.app, make sure JavaScript is selected in the language dropdown.

To run plain-text JXA from a shell script in Terminal with osascript, use:

osascript -l JavaScript myJxaScript.js

Alternatively, if you save the script in binary format via Script Editor.app with the .scpt extension, it can be run without the engine specifier:

osascript myJxaScript.scpt

Another option is to add the shebang #!/usr/bin/osascript -l JavaScript, run chmod +x myJxaScript.js, and execute the JS script directly:

./myJxaScript.js
  • If you're looking for a way to convert existing AppleScript to JXA, some general-purpose AI chatbots may be able to do it, with varying degrees of success.
