this post was submitted on 12 Jul 2023

How is error information combined when parsers are combined? For example, using <|> to combine parsers, I would expect the set of expected characters for an error to be the union of the sets of expected characters from the individual parsers. (I'm finding it hard to pin down the behaviour of <|> or even to find the relevant source code.)

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

To understand how megaparsec's <|> operator behaves, it helps to know the difference between "consuming" and "non-consuming" (or "empty") parses. To illustrate, compare a literal string parser with one that parses each character separately:

> import Text.Megaparsec
> import Text.Megaparsec.Char
> let p = string "abc"
> let q = sequence [char 'a', char 'b', char 'c']
> parseMaybe (p <|> string "abd") "abd"
Just "abd"
> parseMaybe (q <|> string "abd") "abd"
Nothing

So, what happened? When string "abc" is run on the input "abd", it fails without consuming any input; you can think of it as backtracking to the start of the string. In contrast, the parser sequence [char 'a', char 'b', char 'c'] has already consumed 'a' and 'b' by the time it fails on 'd'. Because the left-hand side failed after consuming input, <|> does not even try the string "abd" parser.
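If it helps to see the rule spelled out, here is a toy model of it using only base. This is a sketch of the idea, not megaparsec's actual implementation, and every name in it (Reply, P, charP, andThen, orElse) is made up for illustration. Each result carries a flag saying whether input was consumed, and the choice combinator only looks at the right branch when the left one failed with that flag unset.

```haskell
import Data.List (nub)

-- Toy model only: each reply records whether any input was consumed.
data Reply a
  = Ok Bool a String    -- consumed?, result, remaining input
  | Err Bool [String]   -- consumed?, descriptions of what was expected
  deriving (Show, Eq)

newtype P a = P { runP :: String -> Reply a }

-- Match one character; a failure here never consumes input.
charP :: Char -> P Char
charP c = P $ \s -> case s of
  (x:xs) | x == c -> Ok True x xs
  _               -> Err False [show c]

-- Sequencing: the whole parse counts as consuming if either step consumed.
andThen :: P a -> (a -> P b) -> P b
andThen (P p) f = P $ \s -> case p s of
  Err c e     -> Err c e
  Ok c a rest -> case runP (f a) rest of
    Ok c' b rest' -> Ok (c || c') b rest'
    Err c' e      -> Err (c || c') e

-- Choice: the right branch runs only if the left failed without consuming.
orElse :: P a -> P a -> P a
orElse (P p) (P q) = P $ \s -> case p s of
  Err False e1 -> case q s of
    Err False e2 -> Err False (nub (e1 ++ e2))
    other        -> other
  other -> other

ab, ax :: P Char
ab = charP 'a' `andThen` (\_ -> charP 'b')
ax = charP 'a' `andThen` (\_ -> charP 'x')

main :: IO ()
main = do
  -- ab consumes 'a' before failing, so ax is never tried.
  print (runP (ab `orElse` ax) "ax")               -- Err True ["'b'"]
  -- Both branches fail without consuming, so both are reported.
  print (runP (charP 'a' `orElse` charP 'b') "c")  -- Err False ["'a'","'b'"]
```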

You can manually force the parser to backtrack by using the try function as follows:

> parseMaybe (try q <|> string "abd") "abd"
Just "abd"

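But note that try makes the parser re-read input after a failure. Used heavily, especially around large parsers, this backtracking can lead to exponential running time in the worst case, so try to avoid it where you can.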

To answer your question given this information: the error information is combined, but only when both arguments of <|> fail without consuming any input. If either one consumes input before failing, only the error information from that branch is used.
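If you want to check this yourself, a small program along these lines should show the difference. It is a minimal sketch that assumes a reasonably recent megaparsec; the names Parser, bothEmpty, leftConsumes and report are just ones I picked for the example.

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, string)

-- Parser over String input with no custom error component.
type Parser = Parsec Void String

-- Both alternatives fail on the first character without consuming input,
-- so the reported error should list both 'a' and 'b' as expected.
bothEmpty :: Parser Char
bothEmpty = char 'a' <|> char 'b'

-- The left alternative consumes 'a' before failing on the next character,
-- so the right alternative is never tried and adds nothing to the error.
leftConsumes :: Parser String
leftConsumes = sequence [char 'a', char 'b'] <|> string "ax"

-- Run a parser and print either the result or the pretty-printed error.
report :: Show a => Parser a -> String -> IO ()
report p input = case parse p "" input of
  Left bundle -> putStr (errorBundlePretty bundle)
  Right x     -> print x

main :: IO ()
main = do
  report bothEmpty "c"
  report leftConsumes "ax"
```

Wrapping the left alternative of leftConsumes in try lets the second call fall through to string "ax" instead of failing.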