# parcom

A simple parser combinator library with a dumb name.

## WARNING

This library is a work in progress.
Versions of this library below 1.0.0 should not be used in production environments.
The library is still growing and breaking changes may occur at any time.

## Description

Parcom is a Crystal library that provides parser combinator functionality.

## Prerequisites

* Git

## Installation

Add the following dependency to your project's `shard.yml` file:
```
dependencies:
  parcom:
    git: "https://git.matthewhall.xyz/parcom"
    version: "0.3.0"
```

Then, run
```
shards install
```

## General usage

Parcom works by creating parser objects and then calling their `#parse` method on the given input.
Because this library uses parser combinators, complex parser objects should be built by combining simpler parsers.
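
To make the idea of "combining" concrete, here is a toy model in plain Crystal. It is only a sketch: it does not use Parcom and is not meant to reflect how Parcom is implemented. The names `ToyParser`, `toy_satisfy`, and `toy_some` are hypothetical and exist only for this illustration.

```crystal
# Toy model only: this does not use Parcom or mirror its internals.
# A toy parser maps input text to {parsed text, remaining input},
# or to nil when it fails.
alias ToyParser = Proc(String, Tuple(String, String)?)

# Hypothetical helper: parse one character if the block accepts it.
def toy_satisfy(&pred : Char -> Bool) : ToyParser
  ToyParser.new do |input|
    if input.empty? || !pred.call(input[0])
      nil
    else
      {input[0].to_s, input[1..]}
    end
  end
end

# Hypothetical combinator: run `parser` one or more times in a row.
def toy_some(parser : ToyParser) : ToyParser
  ToyParser.new do |input|
    first = parser.call(input)
    if first.nil?
      nil
    else
      parsed, rest = first
      # Keep consuming input until the inner parser fails.
      while step = parser.call(rest)
        parsed += step[0]
        rest = step[1]
      end
      {parsed, rest}
    end
  end
end

digit = toy_satisfy(&.number?)
digits = toy_some(digit) # a bigger parser built from a smaller one

p digits.call("123abc") # => {"123", "abc"}
p digits.call("abc")    # => nil
```

The point of the sketch is that `digits` is never written by hand: it falls out of wrapping `digit` in a combinator. Parcom's own parsers are combined in the same spirit, but with typed tokens and values.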

## Example walkthrough

Before we get started, it is recommended to `include` the Parcom module in whatever namespace you are working in:

```
require "parcom"

include Parcom

module YourModule
  def self.main
    puts "Hello world!"
  end
end

YourModule.main
```

Suppose we want to parse a `Hash(Int32, Int32)` literal from a string.

First, we should define how to parse a digit:
```
# This defines a parser that will parse a single Char, check if
# it is a digit, and fail if it is not a digit.
d = Parser(Char, Char).satisfy(&.number?)
```

Numbers often have one or more digits [citation needed], so let's make another parser based on `d` that parses multiple digits:
```
# `Parser#some` is a method that creates a new parser that parses
# one or more instances of what the original parser would parse.
abs_num = d.some
```

We're not quite done with this yet, as we want a parser of `Int32`, but this parser will parse an `Array(Char)`.
We need to change the value inside the parser with the `Parser#map` method:
```
# The `Parser#map` method accepts a block or proc that takes the expected
# parser result and transforms it into something else.
# In this case, we're converting our array of digits into an Int32.
abs_num = d.some.map { |ds| ds.join.to_i32 }
```

Now we have a parser that can parse positive integers (in base 10). But what about negative numbers?

First, we make a parser that parses a '-' sign if it can, but doesn't fail if it can't find one:
```
# `Parser#optional` creates a new parser that tries to parse with the original
# parser, but will return `nil` without consuming any input instead of failing.
sign = Parser.token('-').optional
```

Then we can map the result to `1` or `-1`, which we will multiply by later:
```
# No '-' was found (nil), so the number is positive; otherwise it is negative.
sign = Parser.token('-').optional.map do |minus_or_nil|
  minus_or_nil.nil? ? 1_i32 : -1_i32
end
```

Another way to do this is to use `Parser#recover`, which allows a default value to be specified:
```
# `#map_const` is like `#map`, but it takes a single value to replace
# the parser value with unconditionally.
sign = Parser.token('-')
  .map_const(-1_i32)
  .recover(1_i32)
```

Final code:

TODO: add to practical tests

```
d = Parser(Char, Char).satisfy(&.number?).named("digit")
abs_num = d.some.map { |ds| ds.join.to_i32 }.named("abs_num")

sign = Parser.token('-').map_const(-1_i32).recover(1_i32).named("sign")

int32 = parser_chain Char, Int32, "int32",
  {s, sign},
  {n, abs_num},
  pure: n * s

ws = Parser(Char, Char).satisfy(&.whitespace?)
  .many
  .named("whitespace")

arrow = (ws >> Parser.token_sequence("=>".chars) >> ws).named("arrow")

pair = parser_chain Char, {Int32, Int32}, "pair",
  {x, int32},
  {_, arrow},
  {y, int32},
  pure: {x, y}

delim = (ws >> Parser.token(',') >> ws).named("delim")

elements = parser_chain Char, Array({Int32, Int32}), "elements",
  {pairs, pair.sep_by(delim)},
  {_,     delim.optional}, # trailing comma
  pure: pairs

hash_start = (Parser.token('{') >> ws).named("start")
hash_end   = (ws >> Parser.token('}')).named("end")

hash = parser_chain Char, Hash(Int32, Int32), "hash",
  {_,  hash_start},
  {es, elements.recover([] of {Int32, Int32})},
  {_,  hash_end},
  pure: es.to_h
```