Haskell: How do I extract comments from a String using Parsec?

t98cgbkg  posted on 2023-06-23  in Other

I'm trying to parse only the comments out of a String. I'm close, but not quite there.

import Text.ParserCombinators.Parsec

parseSingleLineComment :: Parser String
parseSingleLineComment = do 
    string "//" 
    x <- manyTill anyChar newline
    spaces 
    return x
parseMultilineComment :: Parser String
parseMultilineComment = do
    string "/*" 
    x <- manyTill anyChar (string "*/")
    spaces
    return x
parseEndOfFile :: Parser String
parseEndOfFile = do 
  x <- eof
  return ""

parseComment :: Parser String
parseComment = try parseSingleLineComment <|> try parseMultilineComment
    
parseNotComment :: Parser String
parseNotComment = manyTill anyChar (lookAhead (try parseComment <|> parseEndOfFile))

extractComments :: Parser [String]
extractComments = do
  manyTill anyChar (lookAhead (parseComment <|> parseEndOfFile))
  xs <- try $ sepEndBy1 parseComment parseNotComment
  eof
  return $ xs

printHelperF :: String -> IO ()
printHelperF s = do
  print s
  print $ parse extractComments "Test Parser" s
  print "-------------------"

-- main
main :: IO ()
main = do 
  let sample0 = "No comments here"
  let sample1 = "//Hello there!\n//General Kenobi"
  let sample2 = "/* What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!*/"
  let sample3 = " //Global Variable\nlet x = 5;\n/*TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n*/\nlet c = 500;"
  let sample4 = "//First\n//Second//NotThird\n//Third"
  let samples = [sample0, sample1, sample2, sample3, sample4]
  mapM_ printHelperF samples

-- > runhaskell test.hs
-- "No comments here"
-- Left "Test Parser" (line 1, column 17):
-- unexpected end of input
-- expecting "//" or "/*" <---------- fails because no comment in string
-- "-------------------"
-- "//Hello there!\n//General Kenobi"
-- Right ["Hello there!"] <---------- fails to extract the last comment
-- "-------------------"
-- "/* What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!*/"
-- Right [" What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!"] <- correct
-- "-------------------"
-- " //Global Variable\nlet x = 5;\n/*TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n*/\nlet c = 500;"
-- Right ["Global Variable","TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n"] <- correct
-- "-------------------"
-- "//First\n//Second//NotThird\n//Third"
-- Right ["First","Second//NotThird"] <- again fails to extract the last comment
-- "-------------------"
wz8daaqr1#

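If you replace sepEndBy1 with sepEndBy, that should fix the failure in the "no comments" case.
To handle a final single-line comment that has no terminating newline, try using: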

parseSingleLineComment :: Parser String
parseSingleLineComment = do
    string "//"
    many (noneOf "\n")
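
The difference between sepEndBy1 and sepEndBy can be seen in isolation with a minimal sketch. The names word, atLeastOne, zeroOrMore and run are hypothetical helpers introduced only for this illustration; they are not part of the question's code.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical item parser for illustration: one or more letters.
word :: Parser String
word = many1 letter

-- sepEndBy1 demands at least one item, so it fails on empty input.
atLeastOne :: Parser [String]
atLeastOne = sepEndBy1 word (char ',')

-- sepEndBy accepts zero items, so empty input yields an empty list.
zeroOrMore :: Parser [String]
zeroOrMore = sepEndBy word (char ',')

-- Hypothetical helper: collapse a parse failure to Nothing for easy comparison.
run :: Parser a -> String -> Maybe a
run p s = either (const Nothing) Just (parse p "demo" s)
```

Loaded into GHCi, run atLeastOne "" evaluates to Nothing while run zeroOrMore "" evaluates to Just [], which mirrors the "No comments here" case.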

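After making these changes, there are a few other test cases you should consider. An asterisk inside a multiline comment causes the comment to be ignored.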

λ> printHelperF "x = 3*4 /* not 3*5 */"
"x = 3*4 /* not 3*5 */"
Right []
"-------------------"

To fix this, you need something like the following:

parseMultilineComment :: Parser String
parseMultilineComment = do
    string "/*"
    manyTill anyChar (try (string "*/"))
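
Why the try around the terminator matters can be shown with a small side-by-side sketch. The names withoutTry, withTry and run are hypothetical and exist only for this illustration. Without try, string "*/" consumes the '*' in "3*5" before failing on the '5', and because it failed after consuming input, manyTill gives up instead of continuing with anyChar. (In the question's code, the surrounding try calls turn that failure into "no comment here", which is why the result was Right [].)

```haskell
import Text.ParserCombinators.Parsec

-- Terminator without try: a lone '*' inside the comment is consumed by
-- string "*/" before it fails, so manyTill aborts with an error.
withoutTry :: Parser String
withoutTry = string "/*" >> manyTill anyChar (string "*/")

-- Terminator with try: a failed "*/" match is rewound, so the '*' is
-- picked up by anyChar and scanning continues.
withTry :: Parser String
withTry = string "/*" >> manyTill anyChar (try (string "*/"))

-- Hypothetical helper: collapse a parse failure to Nothing for easy comparison.
run :: Parser a -> String -> Maybe a
run p s = either (const Nothing) Just (parse p "demo" s)
```

In GHCi, run withoutTry "/* not 3*5 */" evaluates to Nothing, while run withTry "/* not 3*5 */" evaluates to Just " not 3*5 ".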

Also, an unterminated multiline comment is treated as code:

> printHelperF "/* unterminated comment"
"/* unterminated comment"
Right []
"-------------------"

This should probably be a parse error. Fixing it requires moving some of the try logic around. Take the try calls out of parseComment:

parseComment :: Parser String
parseComment = parseSingleLineComment <|> parseMultilineComment

and move them into the sub-functions:

parseSingleLineComment :: Parser String
parseSingleLineComment = do
    try (string "//")
    many (noneOf "\n")

parseMultilineComment :: Parser String
parseMultilineComment = do
    try (string "/*")
    manyTill anyChar (try (string "*/"))

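This version of parseMultilineComment works as follows: a lone / character causes the first parser to fail, but try ensures that no input is consumed (i.e., no comment was found). On the other hand, if string "/*" succeeds, manyTill searches for the terminating string "*/". If it isn't found, the parser fails, but only after consuming input (namely the string "/*"). That results in a parse error instead.
For this to work correctly, we need to remove the try in parseNotComment: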

parseNotComment :: Parser String
parseNotComment = manyTill anyChar (lookAhead (parseComment <|> parseEndOfFile))
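
The effect of where try is placed can be checked with a small sketch. The names committed, backtracking, restOfInput and run are hypothetical and only serve this illustration. The point is that <|> tries its right-hand alternative only when the left-hand parser fails without having consumed input.

```haskell
import Text.ParserCombinators.Parsec

-- try covers only the opener: once "/*" has matched, a later failure
-- has consumed input and becomes a real parse error.
committed :: Parser String
committed = try (string "/*") >> manyTill anyChar (try (string "*/"))

-- try covers the whole comment: any failure rewinds to the start,
-- so an unterminated comment silently counts as "not a comment".
backtracking :: Parser String
backtracking = try (string "/*" >> manyTill anyChar (try (string "*/")))

-- Hypothetical fallback alternative: take everything that is left.
restOfInput :: Parser String
restOfInput = many anyChar

-- Hypothetical helper: collapse a parse failure to Nothing for easy comparison.
run :: Parser a -> String -> Maybe a
run p s = either (const Nothing) Just (parse p "demo" s)
```

In GHCi, run (committed <|> restOfInput) "/ 2" evaluates to Just "/ 2" because the opener fails without consuming anything. run (committed <|> restOfInput) "/* open" evaluates to Nothing because the failure happens after "/*" was consumed, whereas run (backtracking <|> restOfInput) "/* open" evaluates to Just "/* open".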

We can also simplify extractComments, since its first line is now identical to parseNotComment and the other try is redundant:

extractComments :: Parser [String]
extractComments = do
  parseNotComment
  xs <- sepEndBy parseComment parseNotComment
  eof
  return $ xs

The final result should pass your tests, plus a few more:

module Comments where

import Text.ParserCombinators.Parsec

parseSingleLineComment :: Parser String
parseSingleLineComment = do
    try (string "//")
    many (noneOf "\n")

parseMultilineComment :: Parser String
parseMultilineComment = do
    try (string "/*")
    manyTill anyChar (try (string "*/"))

parseEndOfFile :: Parser String
parseEndOfFile = do
    x <- eof
    return ""

parseComment :: Parser String
parseComment = parseSingleLineComment <|> parseMultilineComment

parseNotComment :: Parser String
parseNotComment = manyTill anyChar (lookAhead (parseComment <|> parseEndOfFile))

extractComments :: Parser [String]
extractComments = do
  parseNotComment
  xs <- sepEndBy parseComment parseNotComment
  eof
  return $ xs

printHelperF :: String -> IO ()
printHelperF s = do
  print s
  print $ parse extractComments "Test Parser" s
  print "-------------------"

-- main
main :: IO ()
main = do
  let sample0 = "No comments here"
  let sample1 = "//Hello there!\n//General Kenobi"
  let sample2 = "/* What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!*/"
  let sample3 = " //Global Variable\nlet x = 5;\n/*TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n*/\nlet c = 500;"
  let sample4 = "//First\n//Second//NotThird\n//Third"
  let sample5 = "x = 3*4 /* not 3*5 */"
  let sample6 = "/* foo */ /* unterminated comment"
  let sample7 = ""
  let samples = [sample0, sample1, sample2, sample3, sample4, sample5, sample6, sample7]
  mapM_ printHelperF samples

This gives the output:

"No comments here"
Right []
"-------------------"
"//Hello there!\n//General Kenobi"
Right ["Hello there!","General Kenobi"]
"-------------------"
"/* What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!*/"
Right [" What's the deal with airline food?\nIt keeps getting worse and worse\nI can't take it anymore!"]
"-------------------"
" //Global Variable\nlet x = 5;\n/*TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n*/\nlet c = 500;"
Right ["Global Variable","TODO:\n\t// Add the number of cats as a variable\n\t//Shouldn't take too long\n"]
"-------------------"
"//First\n//Second//NotThird\n//Third"
Right ["First","Second//NotThird","Third"]
"-------------------"
"x = 3*4 /* not 3*5 */"
Right [" not 3*5 "]
"-------------------"
"/* foo */ /* unterminated comment"
Left "Test Parser" (line 1, column 34):
unexpected end of input
expecting "*/"
"-------------------"
""
Right []
"-------------------"
