PHP: How can an ANTLR4 lexer consume more characters per token and stop at existing rules?

Posted by bq8i3lrv on 2023-10-15 in PHP

Can an ANTLR4 lexer consume more input and stop where an existing rule would match? I want a single token to consume more characters.
A small grammar:

lexer grammar PhpLexer;

options {
    superClass = PhpLexerBase;
    caseInsensitive = true;
}

T_OPEN_TAG_WITH_ECHO: '<?='  -> pushMode(PHP);
T_OPEN_TAG: PhpOpenTag -> pushMode(PHP);

T_INLINE_HTML: .+?;      // Problem Point

mode PHP;
   T_CLOSE_TAG: '?>';
   T_BAD_CHARACTER: .;

fragment NEWLINE: '\r'? '\n' | '\r';

fragment PhpOpenTag
    : '<?php' ([ \t] | NEWLINE)
    | '<?php' EOF
    ;

Input:

<html><?php echo "Hello, world!"; ?></html>

Actual output:

T_INLINE_HTML -> "<"
T_INLINE_HTML -> "h"
T_INLINE_HTML -> "t"
T_INLINE_HTML -> "m"
T_INLINE_HTML -> "l"
T_INLINE_HTML -> ">"
T_OPEN_TAG -> "<?php "
……

Expected output:

T_INLINE_HTML -> "<html>"
T_OPEN_TAG -> "<?php "
……

Source: https://stackoverflow.com/questions/77248759/how-antlr4-lexer-consume-more-any-tokens-and-stop-at-existing-rules

1 answer

gtlvzcf8 #1

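Note that T_INLINE_HTML: .+?; behaves exactly like T_INLINE_HTML: .; and both always match a single character. The +? loop is non-greedy and nothing follows it in the rule, so the lexer leaves the loop as early as it can, which is after one character.
Try something like this instead: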

T_INLINE_HTML
 : T_INLINE_HTML_ATOM+
 ;

fragment T_INLINE_HTML_ATOM
 : ~'<'               // match a char other than '<'
 | '<' ~'?'           // match a '<' followed by something other than '?'
 | '<?' ~[p=]         // match '<?' followed by something other than 'p' and '='
 | '<?p' ~'h'         // match '<?p' followed by something other than 'h'
 | '<?ph' ~'p'        // match '<?ph' followed by something other than 'p'
 | '<?php' ~[ \t\r\n] // match '<?php' followed by something other than a whitespace char
 ;
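
To see why the two rules behave so differently, here is a rough Python regex analogue. It is only a sketch: Python's backtracking regex engine is not ANTLR's longest-match lexer, and the names LAZY_ANY, ATOM_ALTERNATIVES, INLINE_HTML and first_inline_html are made up for this illustration. It does show the same two effects on the question's input: a lazy "one or more" with nothing after it stops after one character, while the atom alternatives keep going until an open tag starts.

```python
import re

# Analogue of `.+?` at the very end of a rule: the lazy loop has nothing
# after it to satisfy, so it stops as soon as it can (one character).
LAZY_ANY = re.compile(r".+?", re.DOTALL)

# Analogue of the T_INLINE_HTML_ATOM alternatives (hypothetical names;
# an approximation, not the ANTLR matching algorithm itself).
ATOM_ALTERNATIVES = [
    r"[^<]",              # a char other than '<'
    r"<[^?]",             # '<' followed by something other than '?'
    r"<\?[^p=]",          # '<?' followed by something other than 'p' and '='
    r"<\?p[^h]",          # '<?p' followed by something other than 'h'
    r"<\?ph[^p]",         # '<?ph' followed by something other than 'p'
    r"<\?php[^ \t\r\n]",  # '<?php' followed by a non-whitespace char
]

# re.IGNORECASE stands in for the grammar's caseInsensitive = true option.
INLINE_HTML = re.compile(
    "(?:" + "|".join(ATOM_ALTERNATIVES) + ")+",
    re.IGNORECASE,
)


def first_inline_html(text: str) -> str:
    """Return the inline-HTML run at the start of text, or '' if there is none."""
    match = INLINE_HTML.match(text)
    return match.group(0) if match else ""


sample = '<html><?php echo "Hello, world!"; ?></html>'
print(LAZY_ANY.match(sample).group(0))  # prints "<"
print(first_inline_html(sample))        # prints "<html>"
```

The second print stops right before <?php because none of the alternatives can start there, which is where T_OPEN_TAG takes over in the real grammar.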

Answered 2023-10-15
