I wrote a lexer in Java.
Token.java looks like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public enum Token {
    TK_MINUS ("-"),
    TK_PLUS ("\\+"),
    TK_MUL ("\\*"),
    TK_DIV ("/"),
    TK_NOT ("~"),
    TK_AND ("&"),
    TK_OR ("\\|"),
    TK_LESS ("<"),
    TK_LEG ("<="),
    TK_GT (">"),
    TK_GEQ (">="),
    TK_EQ ("=="),
    TK_ASSIGN ("="),
    TK_OPEN ("\\("),
    TK_CLOSE ("\\)"),
    TK_SEMI (";"),
    TK_COMMA (","),
    TK_KEY_DEFINE ("define"),
    TK_KEY_AS ("as"),
    TK_KEY_IS ("is"),
    TK_KEY_IF ("if"),
    TK_KEY_THEN ("then"),
    TK_KEY_ELSE ("else"),
    TK_KEY_ENDIF ("endif"),
    OPEN_BRACKET ("\\{"),
    CLOSE_BRACKET ("\\}"),
    STRING ("\"[^\"]+\""),
    TK_FLOAT ("[+-]?([0-9]*[.])?[0-9]+"),
    TK_DECIMAL("(?:0|[1-9](?:_*[0-9])*)[lL]?"),
    TK_OCTAL("0[0-7](?:_*[0-7])*[lL]?"),
    TK_HEXADECIMAL("0x[a-fA-F0-9](?:_*[a-fA-F0-9])*[lL]?"),
    TK_BINARY("0[bB][01](?:_*[01])*[lL]?"),
    IDENTIFIER ("\\w+");

    private final Pattern pattern;

    Token(String regex) {
        pattern = Pattern.compile("^" + regex);
    }

    int endOfMatch(String s) {
        Matcher m = pattern.matcher(s);
        if (m.find()) {
            return m.end();
        }
        return -1;
    }
}
The Lexer class (Lexer.java) looks like this:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

public class Lexer {
    private StringBuilder input = new StringBuilder();
    private Token token;
    private String lexema;
    private boolean exausthed = false;
    private String errorMessage = "";
    private Set<Character> blankChars = new HashSet<Character>();

    public Lexer(String filePath) {
        try (Stream<String> st = Files.lines(Paths.get(filePath))) {
            st.forEach(input::append);
        } catch (IOException ex) {
            exausthed = true;
            errorMessage = "Could not read file: " + filePath;
            return;
        }
        blankChars.add('\r');
        blankChars.add('\n');
        blankChars.add((char) 8);
        blankChars.add((char) 9);
        blankChars.add((char) 11);
        blankChars.add((char) 12);
        blankChars.add((char) 32);
        moveAhead();
    }

    public void moveAhead() {
        if (exausthed) {
            return;
        }
        if (input.length() == 0) {
            exausthed = true;
            return;
        }
        ignoreWhiteSpaces();
        if (findNextToken()) {
            return;
        }
        exausthed = true;
        if (input.length() > 0) {
            errorMessage = "Unexpected symbol: '" + input.charAt(0) + "'";
        }
    }

    private void ignoreWhiteSpaces() {
        int charsToDelete = 0;
        while (blankChars.contains(input.charAt(charsToDelete))) {
            charsToDelete++;
        }
        if (charsToDelete > 0) {
            input.delete(0, charsToDelete);
        }
    }

    private boolean findNextToken() {
        for (Token t : Token.values()) {
            int end = t.endOfMatch(input.toString());
            if (end != -1) {
                token = t;
                lexema = input.substring(0, end);
                input.delete(0, end);
                return true;
            }
        }
        return false;
    }

    public Token currentToken() {
        return token;
    }

    public String currentLexema() {
        return lexema;
    }

    public boolean isSuccessful() {
        return errorMessage.isEmpty();
    }

    public String errorMessage() {
        return errorMessage;
    }

    public boolean isExausthed() {
        return exausthed;
    }
}
To test the lexer, I created a class in Try.java:
package draft;

public class Try {
    public static void main(String[] args) {
        Lexer lexer = new Lexer("C:/Users/eimom/Documents/Input.txt");
        System.out.println("Lexical Analysis");
        System.out.println("-----------------");
        while (!lexer.isExausthed()) {
            System.out.printf("%-18s : %s \n",lexer.currentLexema() , lexer.currentToken());
            lexer.moveAhead();
        }
        if (lexer.isSuccessful()) {
            System.out.println("Ok! :D");
        } else {
            System.out.println(lexer.errorMessage());
        }
    }
}
So, suppose the Input.txt file contains:
>=
0x10
()
11001100
-433
0125
0x3B
Then the output I expect is:
>= TK_GEQ
0x10 TK_HEXADECIMAL
( TK_OPEN ,
) TK_CLOSE
11001100 TK_BINARY
-433 TK_DECIMAL
0125 TK_OCTAL
0x3B TK_BINARY
But instead I get:
Lexical Analysis
------------------
> :TK_GT
= :TK_ASSIGN
0 :TK_FLOAT
x10 :IDENTIFIER
( :TK_OPEN
) :TK_CLOSE
11001100 :TK_FLOAT
- :TK_MINUS
43301250 :TK_FLOAT
x3B :IDENTIFIER
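What can I do to fix these problems? It looks like the lexer does not stop at the end of a line, but carries on with the first character of the next line.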
1 Answer
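You did this to yourself by using Files.lines(Path). The stream returned by Files.lines contains the content of each line without its line terminator, so when you append all the lines back into input, you end up with the file's content minus the line breaks. Perhaps you want to use Files.readString(Path) (available since Java 11) instead. I also wonder why you don't use a Reader to read character by character. That is usually more efficient than reading the whole file into memory (although it only starts to matter when you want to analyze very large files).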
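To make the difference concrete, here is a minimal sketch, not taken from the question's code base. The class name LineBreakDemo and the helpers readWithLines, readWhole and writeTemp are hypothetical names made up for this illustration. It writes three of the question's input lines to a temporary file, then reads them back once the way the Lexer constructor does and once with Files.readString, which needs Java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LineBreakDemo {

    // Mirrors the question's constructor: each line is appended
    // without its line terminator, so the lines run together.
    static String readWithLines(Path path) {
        StringBuilder sb = new StringBuilder();
        try (Stream<String> lines = Files.lines(path)) {
            lines.forEach(sb::append);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sb.toString();
    }

    // Reads the whole file as one string, line terminators included (Java 11+).
    static String readWhole(Path path) {
        try {
            return Files.readString(path);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Hypothetical helper for this demo: writes the content to a temporary file.
    static Path writeTemp(String content) {
        try {
            Path tmp = Files.createTempFile("lexer-input", ".txt");
            tmp.toFile().deleteOnExit();
            return Files.writeString(tmp, content);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Path tmp = writeTemp("-433\n0125\n0x3B\n");
        System.out.println(readWithLines(tmp));            // prints -43301250x3B
        System.out.println(readWhole(tmp).contains("\n")); // prints true
    }
}
```

The joined string -43301250x3B is exactly what produces the "- / 43301250 / x3B" rows in the question's output. With the line breaks preserved, the '\r' and '\n' entries already present in blankChars would let ignoreWhiteSpaces skip them. One caveat: ignoreWhiteSpaces as posted never checks input.length(), so input that ends in whitespace (such as a trailing newline) would run charAt past the end of the buffer and would need a bounds check.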