# parcom

A simple parser combinator library with a dumb name.

## WARNING

This library is a work in progress.
Any version of this library <1.0.0 should not be used in production environments.
The library is still growing and breaking changes may occur at any time.

## Description

Parcom is a Crystal library that provides parser combinator functionality.
Users of the Parsec parser library will be familiar with how this library works.

## Prerequisites

* Git (for installation)

## Installation

Add the following dependency to your project's `shard.yml` file:
```
dependencies:
  parcom:
    git: "https://git.matthewhall.xyz/parcom"
    version: "0.3.0"
```

Then, run
```
shards install
```

## General usage

Using Parcom involves creating parser objects and then calling their `#parse` method on some input.
This will either return a `Result` object, or raise a `ParserFail` exception.

### Working with strings

Parcom's parser objects do not (currently) parse from strings.
Rather, they parse from custom-defined `Tokens` objects which wrap sequences of arbitrary token data.
Similar to how a string is thought of as a list of character tokens, Parcom parsers can parse sequences of any kind of token.

With that being said, strings can be converted to `Tokens` objects with the `.from_string` method:
```
a_string = "foo bar"
tokens = Tokens.from_string(a_string)
```

Likewise, parsers cannot (yet) be defined in terms of strings, only in terms of arrays of characters:

```
# parses the tokens 'f', 'o', 'o', in that order
p = Parser.token_sequence("foo".chars)

result = p.parse(tokens)
```
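To make the token idea concrete, here is a small stand-alone Crystal sketch that does not use Parcom at all. It treats a string as an array of `Char` tokens and checks whether the input starts with a given token sequence at some position, which is roughly what `Parser.token_sequence` is described as doing above. `match_sequence` and its `{value, new_position}` return shape are hypothetical, chosen only for illustration, and are not Parcom's internals.

```crystal
# Toy model only (not Parcom's implementation): a string viewed as an
# array of Char tokens, plus a check that mirrors what
# `Parser.token_sequence` is described as doing.
def match_sequence(input : Array(Char), pos : Int32, expected : Array(Char)) : Tuple(Array(Char), Int32)?
  stop = pos + expected.size
  # Not enough tokens left to match.
  return nil if stop > input.size
  # Succeed with the matched tokens and the position just past them.
  input[pos...stop] == expected ? {expected, stop} : nil
end

tokens = "foo bar".chars                 # ['f', 'o', 'o', ' ', 'b', 'a', 'r']
p match_sequence(tokens, 0, "foo".chars) # => {['f', 'o', 'o'], 3}
p match_sequence(tokens, 0, "bar".chars) # => nil
p match_sequence(tokens, 4, "bar".chars) # => {['b', 'a', 'r'], 7}
```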

## Example walkthrough

Before we get started, it is recommended to `include` the Parcom module in whatever namespace you are working in:

```
require "parcom"

include Parcom

module YourModule
  def self.main
    puts "Hello world!"
  end
end

YourModule.main
```

Suppose we want to parse a `Hash(Int32, Int32)` literal from a string.

First, we should define how to parse a digit:
```
# This defines a parser that will parse a single Char, check if
# it is a digit, and fail if it is not a digit.
d = Parser(Char, Char).satisfy(&.number?)
```
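As a rough mental model (a sketch, not Parcom's code), a `satisfy` parser looks at exactly one token and succeeds only if the predicate accepts it. The hypothetical `satisfy` method below uses a toy convention of returning `{value, new_position}` on success and `nil` on failure:

```crystal
# Toy model only: a `satisfy`-style parser inspects exactly one token and
# succeeds only if the predicate accepts it.
def satisfy(input : Array(Char), pos : Int32, &block : Char -> Bool) : Tuple(Char, Int32)?
  return nil if pos >= input.size # no token left to inspect
  c = input[pos]
  if yield c
    {c, pos + 1} # consume the token and return it
  end
end

p satisfy("7x".chars, 0, &.number?) # => {'7', 1}
p satisfy("7x".chars, 1, &.number?) # => nil ('x' is not a digit)
p satisfy("7x".chars, 2, &.number?) # => nil (end of input)
```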

Numbers often have one or more digits [citation needed], so let's make another parser based on `d` that parses multiple digits:
```
# `Parser#some` is a method that creates a new parser that parses
# one or more instances of what the original parser would parse.
abs_num = d.some
```
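In a toy model (hypothetical names, plain Crystal, no Parcom), `some` can be pictured as a loop that keeps applying the one-digit parser and only fails if it matched nothing at all:

```crystal
# Toy model only: `d.some` applies the one-digit parser repeatedly and
# fails unless it succeeds at least once.
def some_digits(input : Array(Char), pos : Int32) : Tuple(Array(Char), Int32)?
  digits = [] of Char
  while pos < input.size && input[pos].number?
    digits << input[pos]
    pos += 1
  end
  # Zero matches means failure.
  digits.empty? ? nil : {digits, pos}
end

p some_digits("123abc".chars, 0) # => {['1', '2', '3'], 3}
p some_digits("abc".chars, 0)    # => nil
```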

We're not quite done with this yet, as we want a parser of `Int32`, but this parser will parse an `Array(Char)`.
We need to change the value inside the parser with the `Parser#map` method:
```
# The `Parser#map` method accepts a block or proc that takes the expected
# parser result and transforms it into something else.
# In this case, we're converting our array of digits into an Int32.
abs_num = d.some.map { |ds| ds.join.to_i32 }
```
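Here is the transformation inside that block, run by hand on the kind of `Array(Char)` that `d.some` produces. Only standard Crystal methods are used:

```crystal
# The block passed to `#map` above, applied by hand to the kind of
# Array(Char) that `d.some` produces.
digits = ['4', '0', '2']
n = digits.join.to_i32 # "402" -> 402
p n # => 402
```

Note that `String#to_i32` raises if the digits do not fit in an `Int32` (for example `"3000000000"`), so the block itself would raise on a very long run of digits; `String#to_i32?` returns `nil` instead if you would rather handle that case explicitly.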

Now we have a parser that can parse positive integers (in base 10). But what about negative numbers?

First, we make a parser that parses a '-' sign if it can, but doesn't fail if it can't find one:
```
# `Parser#optional` creates a new parser that tries to parse with the original
# parser, but will return `nil` without consuming any input instead of failing.
sign = Parser.token('-').optional
```
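In a toy model (the hypothetical `optional_minus`, not Parcom's API), the important detail is the position: when no `-` is present, the position returned is the one we started with, so nothing is consumed.

```crystal
# Toy model only: `optional` either succeeds as usual or returns nil at
# the *same* position, so a missing '-' consumes no input.
def optional_minus(input : Array(Char), pos : Int32)
  if pos < input.size && input[pos] == '-'
    {input[pos], pos + 1}
  else
    {nil, pos}
  end
end

p optional_minus("-5".chars, 0) # => {'-', 1}
p optional_minus("5".chars, 0)  # => {nil, 0}
```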

Then we can change the value to `1` or `-1` to multiply by later, based on the result:
```
sign = Parser.token('-').optional.map do |minus_or_nil|
  minus_or_nil.nil? ? 1_i32 : -1_i32
end
```

Another way to do this is to use `Parser#recover`, which allows a default value to be specified:
```
# `#map_const` is like `#map`, but it takes a single value to replace
# the parser value with unconditionally.
sign = Parser.token('-')
  .map_const(-1_i32)
  .recover(1_i32)
```
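Both versions of `sign` are meant to produce `-1` after a `-` and `1` otherwise. Below is a hand-written sketch of the `map_const`/`recover` version in the same toy model. It assumes, as with `#optional`, that `#recover` consumes no input when it falls back to the default; `minus_token` and `sign` are hypothetical stand-ins for the Parcom parsers.

```crystal
# Toy model only: `minus_token` plays the role of `Parser.token('-')`.
def minus_token(input : Array(Char), pos : Int32) : Tuple(Char, Int32)?
  {input[pos], pos + 1} if pos < input.size && input[pos] == '-'
end

def sign(input : Array(Char), pos : Int32) : Tuple(Int32, Int32)
  if result = minus_token(input, pos)
    {-1_i32, result[1]} # map_const(-1_i32): keep the new position, replace the value
  else
    {1_i32, pos}        # recover(1_i32): succeed with the default, consuming nothing
  end
end

p sign("-7".chars, 0) # => {-1, 1}
p sign("7".chars, 0)  # => {1, 0}
```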

Now that we can parse a number and its sign, all we need to do is parse them together and combine them.
There are multiple ways this can be done, but for now we can just use the `parser_chain` macro:
```
int32 = parser_chain Char, Int32, "int32",
  {s, sign},
  {n, abs_num},
  pure: n * s
```

The `parser_chain` macro is given the types needed to generate the parser, as well as a name.
Next, it receives tuples, each pairing an identifier `x` with a parser `p`.
The final parser runs each of these parsers in order and binds each result to its identifier `x`, to be accessed later.
These values can be used by other parsers in the chain, or even to define them.

Finally, we have the named argument `pure`, which indicates we just want to compute some value to return at the end.
In our case, we multiply the results of `sign` and `abs_num` to get the final integer value.
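To see what the chain does step by step, here is a hand-written toy equivalent of `int32` in plain Crystal. It is a sketch, not how `parser_chain` is implemented: it assumes that if any step fails the whole chain fails, and it inlines the `sign` and `abs_num` logic rather than calling Parcom parsers.

```crystal
# Toy model only: each step starts where the previous one stopped, a
# failing step fails the whole chain, and the last line plays the role
# of `pure: n * s`.
def int32(input : Array(Char), pos : Int32) : Tuple(Int32, Int32)?
  # {s, sign}: a '-' gives -1; otherwise 1, consuming nothing
  s = 1_i32
  if pos < input.size && input[pos] == '-'
    s = -1_i32
    pos += 1
  end
  # {n, abs_num}: one or more digits, converted to an Int32
  start = pos
  while pos < input.size && input[pos].number?
    pos += 1
  end
  return nil if pos == start
  n = input[start...pos].join.to_i32
  # pure: n * s
  {n * s, pos}
end

p int32("-42}".chars, 0) # => {-42, 3}
p int32("17".chars, 0)   # => {17, 2}
p int32("-x".chars, 0)   # => nil
```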

The next step is to parse two numbers as a key-value pair.
The format of such a pair is a number, followed by optional whitespace, the `=>` symbol, more optional whitespace, then another number.

We already know how to parse the numbers, so let's try to parse the whitespace:

```
# `#many` is similar to `#some`, but allows matching zero times
ws = Parser(Char, Char).satisfy(&.whitespace?).many
```
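In the toy model, `many` is the same loop as `some` without the "at least once" check, so it always succeeds, possibly with an empty array (`many_ws` is a hypothetical name):

```crystal
# Toy model only: `many` never fails; zero matches give an empty array.
def many_ws(input : Array(Char), pos : Int32) : Tuple(Array(Char), Int32)
  found = [] of Char
  while pos < input.size && input[pos].whitespace?
    found << input[pos]
    pos += 1
  end
  {found, pos}
end

p many_ws("  =>".chars, 0) # => {[' ', ' '], 2}
p many_ws("=>".chars, 0)   # => {[], 0}
```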

The `=>` symbol is also easy enough to parse, since we know exactly what to look for:

```
# `Parser.token_sequence` accepts an array of tokens and only succeeds if
# the input starts with those same tokens.
arrow = Parser.token_sequence("=>".chars)
```

Since the arrow and the optional whitespace are all expected together, let's re-define that last parser to include the whitespace:

```
arrow = (ws >> Parser.token_sequence("=>".chars) >> ws).named("arrow")
```

Here we see another option for combining parsers: the `>>` operator.
Parser objects have special overloads for `+`, `>>`, and `<<`.
These can be used to quickly sequence multiple parsers together in situations where we don't really care what the actual results of each parser are.
In this case, `>>` will parse the left parser, then the right parser, keeping only the right parser's result.
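Here is the `arrow` parser written out by hand in a toy model, to show how `>>` threads the position from one parser to the next while discarding the left-hand values. `ws`, `fat_arrow`, and `arrow` here are plain Crystal methods made up for illustration, not Parcom objects.

```crystal
# Toy model only. Each toy parser takes the input and a position and
# returns {value, new_position}, or nil on failure.

# `ws`: zero or more whitespace characters; never fails.
def ws(input : Array(Char), pos : Int32) : Tuple(Array(Char), Int32)
  start = pos
  while pos < input.size && input[pos].whitespace?
    pos += 1
  end
  {input[start...pos], pos}
end

# `Parser.token_sequence("=>".chars)`
def fat_arrow(input : Array(Char), pos : Int32) : Tuple(Array(Char), Int32)?
  {['=', '>'], pos + 2} if input[pos, 2]? == ['=', '>']
end

# `ws >> fat_arrow >> ws`: each `>>` runs its left side, starts its right
# side where the left one stopped, and keeps only the right side's value.
def arrow(input : Array(Char), pos : Int32) : Tuple(Array(Char), Int32)?
  _, pos = ws(input, pos)
  matched = fat_arrow(input, pos)
  return nil unless matched
  ws(input, matched[1])
end

p arrow(" => 1".chars, 0) # => {[' '], 4}
p arrow("=>1".chars, 0)   # => {[], 2}
p arrow("= >".chars, 0)   # => nil
```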

Now that we have our pair separator defined, we can parse a pair of numbers:
```
pair = parser_chain Char, {Int32, Int32}, "pair",
  {x, int32},
  {_, arrow},
  {y, int32},
  pure: {x, y}
```

Here, we parse a number to be stored in `x`, an arrow separator which is then discarded (bound to `_`), and then one more number to be stored in `y`.
At the end we then put the two numbers in a tuple and return them.

Not much further now!
Now we need to be able to parse zero or more pairs separated by commas.

The comma parser should look familiar:
```
# We're only looking for one token this time, so `Parser.token` is enough here
delim = (ws >> Parser.token(',') >> ws).named("delim")
```

But we'll need a new method to parse all of the elements together:
```
elements = pair.sep_by(delim).named("elements")
```

`#sep_by` is a bit like `#some`, because it will parse one or more instances of something.
The key difference is that it also parses an instance of a second parser (here, `delim`) between each instance.

The current implementation would work, but it doesn't support trailing commas.
That's an easy enough fix with the `<<` operator:
```
# `<<` will parse two things, and keep only the value of the first parser.
elements = (pair.sep_by(delim) << delim.optional).named("elements")
```
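The sketch below models `sep_by` and the trailing-separator fix with single-digit items and a bare `,` separator. One assumption is worth calling out: the toy `sep_by_comma` only consumes a separator when another item follows it, so a dangling comma is left for `<< delim.optional` to pick up. This README does not say whether Parcom's `#sep_by` backtracks over a dangling separator in the same way, so treat this as an illustration of the intent rather than of the library's exact behavior.

```crystal
# Toy model only (not Parcom's implementation). Items are single digits,
# the separator is a single ','.

# `item.sep_by(sep)`: one item, then any number of (separator, item) pairs.
# Assumption: a separator is only consumed when an item follows it.
def sep_by_comma(input : Array(Char), pos : Int32) : Tuple(Array(Char), Int32)?
  return nil unless pos < input.size && input[pos].number?
  items = [input[pos]]
  pos += 1
  while pos + 1 < input.size && input[pos] == ',' && input[pos + 1].number?
    items << input[pos + 1]
    pos += 2
  end
  {items, pos}
end

# `sep_by_comma << comma.optional`: keep the list, then skip one trailing
# ',' if present.
def with_trailing_comma(input : Array(Char), pos : Int32) : Tuple(Array(Char), Int32)?
  result = sep_by_comma(input, pos)
  return nil unless result
  items, pos = result
  pos += 1 if pos < input.size && input[pos] == ','
  {items, pos}
end

p sep_by_comma("1,2,}".chars, 0)        # => {['1', '2'], 3}
p with_trailing_comma("1,2,}".chars, 0) # => {['1', '2'], 4}
```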

Now we just have to define parsers for the beginning and ending parts of the hash...
```
hash_start = (Parser.token('{') >> ws).named("start")
hash_end   = (ws >> Parser.token('}')).named("end")
```

...and we have everything we need!

Putting it all together:
```
hash = parser_chain Char, Hash(Int32, Int32), "hash",
  {_,  hash_start},
  {es, elements.recover([] of {Int32, Int32})},
  {_,  hash_end},
  pure: es.to_h
```

You may have noticed in the earlier definition of `elements` that it would not be able to parse zero pairs of numbers.
We have addressed that here with a call to `#recover`, supplying an empty array.

As with the `int32` parser, the `pure` argument computes the final value; here it converts the array of tuples into a hash.
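The `pure: es.to_h` step relies on Crystal's standard `Enumerable#to_h`, which turns an array of `{key, value}` tuples into a `Hash`. It can be tried on its own:

```crystal
# The `pure: es.to_h` step on its own: an array of {key, value} tuples
# becomes a Hash.
pairs = [{1, 2}, {-3, 4}]
p pairs.to_h # => {1 => 2, -3 => 4}

# With the `#recover` default, `es` can be empty, giving an empty hash.
empty = [] of {Int32, Int32}
p empty.to_h # => {}

# If a key repeats, the later pair overwrites the earlier one.
dupes = [{1, 2}, {1, 5}]
p dupes.to_h # => {1 => 5}
```

So with this definition of `hash`, a repeated key keeps the value from its last occurrence.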

Final code:

```
d = Parser(Char, Char).satisfy(&.number?).named("digit")
abs_num = d.some.map { |ds| ds.join.to_i32 }.named("abs_num")

sign = Parser.token('-').map_const(-1_i32).recover(1_i32).named("sign")

int32 = parser_chain Char, Int32, "int32",
  {s, sign},
  {n, abs_num},
  pure: n * s

ws = Parser(Char, Char).satisfy(&.whitespace?)
  .many
  .named("whitespace")

arrow = (ws >> Parser.token_sequence("=>".chars) >> ws).named("arrow")

pair = parser_chain Char, {Int32, Int32}, "pair",
  {x, int32},
  {_, arrow},
  {y, int32},
  pure: {x, y}

delim = (ws >> Parser.token(',') >> ws).named("delim")

elements = (pair.sep_by(delim) << delim.optional).named("elements")

hash_start = (Parser.token('{') >> ws).named("start")
hash_end   = (ws >> Parser.token('}')).named("end")

hash = parser_chain Char, Hash(Int32, Int32), "hash",
  {_,  hash_start},
  {es, elements.recover([] of {Int32, Int32})},
  {_,  hash_end},
  pure: es.to_h
```