| author | Matthew Hall <hallmatthew314@gmail.com> | 2023-04-04 22:26:23 +1200 |
|---|---|---|
| committer | Matthew Hall <hallmatthew314@gmail.com> | 2023-04-04 22:26:23 +1200 |
| commit | c98c2b5dba76a64f51fc4dfba279cd616e3338ad | |
| tree | 6fb86f638982dd76dd8114a404466ebcc8ded33d /README.md | |
| parent | 1888a8b5ea6316724bef2ca7a3d7b5d61707bd96 | |
Finish example walkthrough
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 115 |
1 files changed, 107 insertions, 8 deletions
@@ -104,9 +104,111 @@ sign = Parser.token('-')
   .recover(1_i32)
 ```

-Final code:
+Now that we can parse a number and its sign, all we need to do is parse them together and combine them.
+There are multiple ways this can be done, but for now we can just use the `parser_chain` macro:
+```
+int32 = parser_chain Char, Int32, "int32",
+    {s, sign},
+    {n, abs_num},
+    pure: n * s
+```
+
+The `parser_chain` macro is given the input token type (`Char`), the result type (`Int32`), and a name for the parser.
+Next, it receives tuples of the form `{x, p}`, pairing an identifier `x` with a parser `p`.
+The resulting parser runs each `p` in order and binds its result to the corresponding `x`, to be accessed later.
+These values can be used by other parsers in the chain, or even to define them.
+
+Finally, we have the named argument `pure`, which indicates we just want to compute some value to return at the end.
+In our case, we multiply the results of `sign` and `abs_num` to get the final integer value.
+
+The next step is to parse two numbers as a key-value pair.
+The format of such a pair is a number, followed by optional whitespace, the `=>` symbol, more optional whitespace, then another number.
+
+We already know how to parse the numbers, so let's try to parse the whitespace:
+
+```
+# `#many` is similar to `#some`, but allows matching zero times
+ws = Parser(Char, Char).satisfy(&.whitespace?).many
+```
+
+The `=>` symbol is also easy enough to parse, since we know exactly what to look for:
+
+```
+# `Parser.token_sequence` accepts an array of tokens and only succeeds if
+# the input starts with those same tokens.
+arrow = Parser.token_sequence("=>".chars)
+```
+
+Since the arrow and the optional whitespace are always expected together, let's redefine that last parser to include the whitespace:
+
+```
+arrow = (ws >> Parser.token_sequence("=>".chars) >> ws).named("arrow")
+```
+
+Here we see another option for combining parsers: the `>>` operator.
+Parser objects have special overloads for `+`, `>>`, and `<<`.
+These can be used to quickly sequence multiple parsers in situations where we don't need the result of every parser.
+In this case, `>>` will parse the left parser, then the right parser, keeping only the right parser's result.
+
+Now that we have our pair separator defined, we can parse a pair of numbers:
+```
+pair = parser_chain Char, {Int32, Int32}, "pair",
+    {x, int32},
+    {_, arrow},
+    {y, int32},
+    pure: {x, y}
+```

-TODO: add to practical tests
+Here, we parse a number to be stored in `x`, an arrow separator which is then discarded (bound to `_`), and then one more number to be stored in `y`.
+At the end, we put the two numbers in a tuple and return it.
+
+Not much further now!
+Now we need to be able to parse zero or more pairs separated by commas.
+
+The comma parser should look familiar:
+```
+# We're only looking for one token this time, so `Parser.token` is enough here
+delim = (ws >> Parser.token(',') >> ws).named("delim")
+```
+
+But to parse all the elements together, we'll want a new method:
+```
+elements = pair.sep_by(delim).named("elements")
+```
+
+`#sep_by` is a bit like `#some`, because it will parse one or more instances of something.
+The key difference is that it also parses an instance of a separator (here, `delim`) between each of them.
+
+The current implementation would work, but it doesn't support trailing commas.
+That's an easy enough fix with the `<<` operator:
+```
+# `<<` will parse two things, and keep only the value of the first parser.
+elements = (pair.sep_by(delim) << delim.optional).named("elements")
+```
+
+Now we just have to define parsers for the beginning and ending parts of the hash...
+```
+hash_start = (Parser.token('{') >> ws).named("start")
+hash_end = (ws >> Parser.token('}')).named("end")
+```
+
+...and we have everything we need!
+
+Putting it all together:
+```
+hash = parser_chain Char, Hash(Int32, Int32), "hash",
+    {_, hash_start},
+    {es, elements.recover([] of {Int32, Int32})},
+    {_, hash_end},
+    pure: es.to_h
+```
+
+You may have noticed that the earlier definition of `elements` cannot parse zero pairs of numbers.
+We have addressed that here with a call to `#recover`, supplying an empty array.
+
+Similar to the `int32` parser, we convert the array of tuples into a hash in the `pure` argument.
+
 Final code:
 ```
 d = Parser(Char, Char).satisfy(&.number?).named("digit")
@@ -127,23 +229,20 @@ arrow = (ws >> Parser.token_sequence("=>".chars) >> ws).named("arrow")

 pair = parser_chain Char, {Int32, Int32}, "pair",
     {x, int32},
-    {_, sep},
+    {_, arrow},
     {y, int32},
     pure: {x, y}

 delim = (ws >> Parser.token(',') >> ws).named("delim")

-elements = parser_chain Char, Array({Int32, Int32}), "elements",
-    {pairs, pair.sep_by(delim)},
-    {_, delim.optional}, # trailing comma
-    pure: pairs
+elements = (pair.sep_by(delim) << delim.optional).named("elements")

 hash_start = (Parser.token('{') >> ws).named("start")
 hash_end = (ws >> Parser.token('}')).named("end")

 hash = parser_chain Char, Hash(Int32, Int32), "hash",
     {_, hash_start},
-    {es, elements.recover([] of {Int32, Int32})}
+    {es, elements.recover([] of {Int32, Int32})},
     {_, hash_end},
     pure: es.to_h
 ```
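To make the grammar of the finished `hash` parser concrete, here is a minimal hand-written sketch in plain Crystal that should accept roughly the same inputs, e.g. `{1 => 2, -3 => 4,}`. It does not use the library at all: `HashLiteralScanner` and its methods are hypothetical names, digits are restricted to ASCII (`Char#ascii_number?`) rather than `Char#number?`, and it assumes that `sign` matches an optional leading `-` and that a failed parser consumes no input. Each helper mirrors one of the named parsers above.

```crystal
# Hypothetical sketch, not part of the library: a hand-written scanner for
# the same `{int => int, ...}` format the README's `hash` parser accepts.
class HashLiteralScanner
  def initialize(@input : String)
    @pos = 0
  end

  # hash_start >> elements.recover([]) << hash_end; the whole input must match.
  def parse : Hash(Int32, Int32)?
    return nil unless token?('{')
    skip_ws
    pairs = elements
    skip_ws
    return nil unless token?('}')
    @pos == @input.size ? pairs.to_h : nil
  end

  # Like `pair.sep_by(delim) << delim.optional`; returns [] when no pair matches.
  private def elements : Array({Int32, Int32})
    pairs = [] of {Int32, Int32}
    first = pair
    return pairs unless first
    pairs << first
    loop do
      break unless delim?
      if nxt = pair
        pairs << nxt
      else
        # Trailing comma: the delimiter stays consumed, as with `<< delim.optional`.
        break
      end
    end
    pairs
  end

  # int32, arrow, int32; restores the position if any part fails.
  private def pair : Tuple(Int32, Int32)?
    start = @pos
    if (x = int32) && arrow? && (y = int32)
      {x, y}
    else
      @pos = start
      nil
    end
  end

  # ws >> ',' >> ws
  private def delim? : Bool
    start = @pos
    skip_ws
    if token?(',')
      skip_ws
      true
    else
      @pos = start
      false
    end
  end

  # ws >> "=>" >> ws
  private def arrow? : Bool
    start = @pos
    skip_ws
    if token?('=') && token?('>')
      skip_ws
      true
    else
      @pos = start
      false
    end
  end

  # Optional '-' followed by one or more ASCII digits; nil on failure or overflow.
  private def int32 : Int32?
    start = @pos
    sign = token?('-') ? -1 : 1
    digits_start = @pos
    while (c = peek) && c.ascii_number?
      @pos += 1
    end
    n = @pos > digits_start ? @input[digits_start...@pos].to_i32? : nil
    if n
      sign * n
    else
      @pos = start
      nil
    end
  end

  private def peek : Char?
    @input[@pos]?
  end

  private def token?(c : Char) : Bool
    return false unless peek == c
    @pos += 1
    true
  end

  private def skip_ws : Nil
    while (c = peek) && c.whitespace?
      @pos += 1
    end
  end
end

p HashLiteralScanner.new("{1 => 2, -3=>4,}").parse # => {1 => 2, -3 => 4}
p HashLiteralScanner.new("{ }").parse              # => {}
p HashLiteralScanner.new("{1 => }").parse          # => nil
```

Restoring the position on every failed helper is what lets `elements` stop cleanly before a trailing comma or a closing brace; whether the library's `#sep_by` backtracks in exactly the same way is an assumption here, not something the README states.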
