Haskell Megaparsec: skip whitespace and non-alphanumeric characters

hk8txs48 posted on 2022-11-14 in Other

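I'm a beginner with Megaparsec, and with Haskell in general, and I'm trying to write a parser for the following grammar.
A word is always one of the following:
1. a number, consisting of one or more ASCII digits (e.g. "0" or "1234"), or
2. a simple word, consisting of one or more ASCII letters (e.g. "a" or "they"), or
3. a contraction, consisting of two simple words joined by a single apostrophe (e.g. "it's" or "they're").
So far I have the following (which can probably be simplified):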

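module MyParser where

import qualified Control.Applicative as A
import Control.Monad (void)
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as C
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = M.Parsec Void String

data Word = Number String | SimpleWord String | Contraction String deriving (Show)

word :: Parser MyParser.Word
word = M.choice
  [ Number <$> number
    -- M.try: contraction consumes the letters of a plain word before failing on
    -- the missing apostrophe, so it must backtrack for SimpleWord to be tried.
  , Contraction <$> M.try contraction
  , SimpleWord <$> simpleWord
  ]

number :: Parser String
number = M.some C.numberChar

simpleWord :: Parser String
simpleWord = M.some C.letterChar

contraction :: Parser String
contraction = do
  left <- simpleWord
  void $ C.char '\''
  right <- simpleWord
  return (left ++ "'" ++ right)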
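To check the choice-ordering behaviour in isolation, here is a minimal self-contained sketch of the three word forms. It assumes only the megaparsec package and base; `wordNoTry` is a hypothetical name used purely for comparison. M.choice moves on to the next alternative only when the previous one fails without consuming input, and contraction consumes the letters of a plain word such as "they" before failing on the missing apostrophe, so the contraction alternative needs M.try.

```haskell
import Prelude hiding (Word)  -- avoid clashing with Prelude's Word type
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC

type Parser = M.Parsec Void String

data Word = Number String | SimpleWord String | Contraction String
  deriving (Eq, Show)

number :: Parser String
number = M.some MC.digitChar  -- ASCII '0'..'9', as the grammar requires

-- Note: MC.letterChar accepts any Unicode letter, broader than "ASCII letters".
simpleWord :: Parser String
simpleWord = M.some MC.letterChar

contraction :: Parser String
contraction = do
  left <- simpleWord
  _ <- MC.char '\''
  right <- simpleWord
  pure (left ++ "'" ++ right)

-- M.try rewinds the input when contraction fails part-way through.
word :: Parser Word
word = M.choice
  [ Number <$> number
  , Contraction <$> M.try contraction
  , SimpleWord <$> simpleWord
  ]

-- Hypothetical, for comparison only: the same choice without M.try.
-- On "they", contraction consumes the letters and then fails, so the
-- SimpleWord alternative is never attempted.
wordNoTry :: Parser Word
wordNoTry = M.choice
  [ Number <$> number
  , Contraction <$> contraction
  , SimpleWord <$> simpleWord
  ]
```

In GHCi, `M.parseMaybe word "they"` should give `Just (SimpleWord "they")` and `M.parseMaybe word "they're"` should give `Just (Contraction "they're")`, while `M.parseMaybe wordNoTry "they"` gives `Nothing`.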

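But I'm having trouble defining a parser that skips whitespace and any non-alphanumeric characters. For example, given the input 'abc' (including the surrounding apostrophes), the parser should discard the apostrophes and accept only the simple word "abc". The following code does not compile: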

filler :: Parser Char
filler = M.some (C.spaceChar  A.<|> not C.alphaNumChar)

spaceConsumer :: Parser ()
spaceConsumer = L.space filler A.empty A.empty

lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer
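The attempt above fails to type-check for three reasons: `not` has type Bool -> Bool and cannot be applied to a parser; M.some returns a list (Parser [Char], not Parser Char); and L.space expects a Parser () as its first argument. One way to address all three, sketched here assuming only megaparsec and base, is to lift a Char predicate into a parser with M.satisfy and discard the matched text with void.

```haskell
import Control.Applicative (empty)
import Control.Monad (void)
import Data.Char (isAlphaNum)
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = M.Parsec Void String

-- One or more characters that are neither letters nor digits.
-- Whitespace is included automatically, since no space character is alphanumeric.
filler :: Parser ()
filler = void (M.some (M.satisfy (not . isAlphaNum)))

-- L.space repeats filler as often as it matches; the two `empty` arguments
-- mean there is no line-comment or block-comment syntax to skip.
spaceConsumer :: Parser ()
spaceConsumer = L.space filler empty empty

-- Run a token parser, then skip any filler that follows it.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer

simpleWord :: Parser String
simpleWord = M.some MC.letterChar
```

Since lexeme only handles filler after a token, leading filler still has to be skipped once up front: `M.parseMaybe (spaceConsumer *> lexeme simpleWord) "'abc'"` should give `Just "abc"`.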

Answer 1 (r7s23pms)

Here is the complete working code I came up with.

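module WordCount where

import qualified Control.Applicative as A
import Control.Monad (void)
import qualified Data.Char as C
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC
import qualified Text.Megaparsec.Char.Lexer as L

type Parser =
  M.Parsec
    -- The type for custom error messages. We have none, so use `Void`.
    Void
    -- The input stream type. Let's use `String` for now.
    String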

data Word = Number String | SimpleWord String | Contraction String deriving (Eq)

instance Show WordCount.Word where
  show (Number x) = x
  show (SimpleWord x) = x
  show (Contraction x) = x

-- Force the parser to consume the entire input:
-- (<*) sequences two actions, discarding the value of the second (M.eof).
words :: String -> Either String [String]
words input = case M.parse (M.some WordCount.word A.<* M.eof) "" input of
  -- err :: M.ParseErrorBundle String Void
  Left err -> Left (M.errorBundlePretty err)
  Right ws -> Right (map show ws)

word :: Parser WordCount.Word
word =
  -- Skip filler until one of the token parsers succeeds;
  -- lexeme then consumes any filler that follows the token.
  M.skipManyTill filler $
    lexeme $
      M.choice
        -- <$> is infix for 'fmap'
        [ Number <$> number,
          -- M.try backtracks when a word is not followed by an apostrophe
          Contraction <$> M.try contraction,
          SimpleWord <$> simpleWord
        ]

number :: Parser String
number = M.some MC.numberChar

simpleWord :: Parser String
simpleWord = M.some MC.letterChar

contraction :: Parser String
contraction = do
  left <- simpleWord
  void $ MC.char '\''
  right <- simpleWord
  return $ left ++ "'" ++ right

-- Separator characters: anything that is not a letter or digit.
-- Whitespace is already covered, since no space character is alphanumeric.
isSep :: Char -> Bool
isSep = not . C.isAlphaNum

-- Fillers fill the space between tokens
filler :: Parser ()
filler = void $ M.some $ M.satisfy isSep

-- The 2nd and 3rd arguments of L.space skip line and block comments;
-- A.empty disables both.
spaceConsumer :: Parser ()
spaceConsumer = L.space filler A.empty A.empty

-- Wrap a token parser so that it also discards the filler after the token
lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer
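To see the skipManyTill-plus-lexeme strategy end to end, here is a condensed, self-contained restatement of the code above, assuming only megaparsec and base. `parseWords` is a hypothetical name standing in for the answer's `words` (chosen to avoid the Prelude clash), and it returns the parsed Word values with a derived Show rather than their shown strings.

```haskell
import Prelude hiding (Word)
import Control.Applicative (empty)
import Control.Monad (void)
import Data.Char (isAlphaNum)
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = M.Parsec Void String

data Word = Number String | SimpleWord String | Contraction String
  deriving (Eq, Show)

-- One or more separator characters (anything that is not a letter or digit).
filler :: Parser ()
filler = void (M.some (M.satisfy (not . isAlphaNum)))

-- Run a token parser, then skip all filler that follows it.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme (L.space filler empty empty)

simpleWord :: Parser String
simpleWord = M.some MC.letterChar

contraction :: Parser String
contraction = do
  left <- simpleWord
  _ <- MC.char '\''
  right <- simpleWord
  pure (left ++ "'" ++ right)

-- Skip filler before the token until one of the alternatives parses.
word :: Parser Word
word =
  M.skipManyTill filler $
    lexeme $
      M.choice
        [ Number <$> M.some MC.numberChar
        , Contraction <$> M.try contraction
        , SimpleWord <$> simpleWord
        ]

-- Hypothetical wrapper: one or more words, then end of input.
parseWords :: String -> Either String [Word]
parseWords input =
  case M.parse (M.some word <* M.eof) "" input of
    Left err -> Left (M.errorBundlePretty err)
    Right ws -> Right ws
```

With this sketch, `parseWords "'abc' it's 1234"` should return `Right [SimpleWord "abc", Contraction "it's", Number "1234"]`, and `parseWords "they1234"` should split into `SimpleWord "they"` and `Number "1234"` even though nothing separates them. Input consisting only of filler yields a `Left`, because M.some requires at least one word.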

Answer 2 (bwitn5fc)

First, you probably want to use some1 (at least one match) for numbers and simple words; otherwise "" would count as a number.
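The difference is easy to see in isolation. A small sketch, assuming megaparsec (`numberMany` and `numberSome` are hypothetical names): M.many also succeeds on an empty string, whereas M.some requires at least one digit.

```haskell
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC

type Parser = M.Parsec Void String

-- Zero or more digits: succeeds on "", so an empty token would count as a number.
numberMany :: Parser String
numberMany = M.many MC.digitChar

-- One or more digits: fails on "".
numberSome :: Parser String
numberSome = M.some MC.digitChar
```

Here `M.parseMaybe numberMany ""` is `Just ""`, while `M.parseMaybe numberSome ""` is `Nothing`.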
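Your filler parser is fine, but it should use many (zero or more) rather than some, because you want "they1234" to parse as SimpleWord "they" followed by Number "1234", with no filler between them.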
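For the parser as a whole, what you need to express is that your text consists of zero or more words separated by filler, with optional filler before the first word and after the last. Fortunately, Megaparsec re-exports many useful combinators from Control.Monad.Combinators for exactly this.
So we can use sepBy to express words separated by filler: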

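document :: Parser [Word]
document = do
  _ <- filler   -- Throw away any filler at the start.
  -- Caveat: sepBy fails if filler consumes input after the last word (the
  -- separator succeeds, then the following word fails after input was consumed);
  -- word `sepEndBy` filler accepts such trailing filler.
  result <- word `sepBy` filler
  _ <- filler   -- Throw away any filler at the end.
  return result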

We don't need optional for the leading and trailing filler, because filler can match zero characters.
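A self-contained sketch of this approach, assuming megaparsec. Two details are choices made here rather than part of the answer: filler uses M.many so that it can match zero characters, and M.sepEndBy replaces sepBy. With sepBy, a filler that consumes input after the last word (for example the final "." in "it's 42.") is followed by a word parser that fails after input was consumed, which fails the whole document; sepEndBy treats that trailing separator as optional.

```haskell
import Prelude hiding (Word)
import Control.Monad (void)
import Data.Char (isAlphaNum)
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC

type Parser = M.Parsec Void String

data Word = Number String | SimpleWord String | Contraction String
  deriving (Eq, Show)

-- Zero or more non-alphanumeric characters, so adjacent tokens such as
-- "they1234" need no separator at all.
filler :: Parser ()
filler = void (M.many (M.satisfy (not . isAlphaNum)))

simpleWord :: Parser String
simpleWord = M.some MC.letterChar

contraction :: Parser String
contraction = do
  left <- simpleWord
  _ <- MC.char '\''
  right <- simpleWord
  pure (left ++ "'" ++ right)

word :: Parser Word
word = M.choice
  [ Number <$> M.some MC.digitChar
  , Contraction <$> M.try contraction
  , SimpleWord <$> simpleWord
  ]

-- Leading filler, then words separated (and optionally followed) by filler.
document :: Parser [Word]
document = do
  filler
  ws <- word `M.sepEndBy` filler
  M.eof
  pure ws
```

For example, `M.parseMaybe document "'abc', it's 42."` should give `Just [SimpleWord "abc", Contraction "it's", Number "42"]`, and an empty input gives `Just []`.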
Finally, a point of style: in a real parser you would probably want to make the Word type a little richer, for example:

data SimpleWord = Number String | SimpleWord String

data Word = Word SimpleWord | Contraction SimpleWord SimpleWord

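That way, code downstream that handles a Contraction does not have to search for the apostrophe again, nor deal with the "impossible" case where there is no apostrophe. Once you have found structural information in the input, don't throw it away. But that is a side issue for this exercise.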
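A sketch of how a parser could produce these richer types directly, assuming megaparsec; `letters`, `digits`, `render` and `simple` are hypothetical names for illustration. Parsing the left half once and then optionally an apostrophe plus letters (wrapped in M.try) means the first word is never re-parsed, and downstream code such as `render` reads the two halves by pattern matching instead of searching for the apostrophe. Following the question's grammar, this parser only builds contractions from letters, even though the type would also admit a Number half.

```haskell
import Prelude hiding (Word)
import Control.Applicative ((<|>))
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC

type Parser = M.Parsec Void String

data SimpleWord = Number String | SimpleWord String
  deriving (Eq, Show)

data Word = Word SimpleWord | Contraction SimpleWord SimpleWord
  deriving (Eq, Show)

letters :: Parser SimpleWord
letters = SimpleWord <$> M.some MC.letterChar

digits :: Parser SimpleWord
digits = Number <$> M.some MC.digitChar

-- Parse the first half once; an apostrophe followed by letters turns it into
-- a Contraction, otherwise it stays a plain Word. M.try undoes a lone apostrophe.
word :: Parser Word
word = (Word <$> digits) <|> do
  left <- letters
  (Contraction left <$> M.try (MC.char '\'' *> letters)) <|> pure (Word left)

-- Downstream code gets the structure for free: no apostrophe search needed.
render :: Word -> String
render (Word w) = simple w
render (Contraction l r) = simple l ++ "'" ++ simple r

simple :: SimpleWord -> String
simple (Number s) = s
simple (SimpleWord s) = s
```

With this sketch, `M.parseMaybe word "they're"` should give `Just (Contraction (SimpleWord "they") (SimpleWord "re"))`, and `M.parseMaybe word "they"` gives `Just (Word (SimpleWord "they"))`.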
