<div align="center">
  <img alt="reghex" width="250" src="docs/reghex-logo.png" />
  <br />
  <br />
  <strong>
    The magical sticky regex-based parser generator
  </strong>
  <br />
  <br />
  <br />
</div>

Leveraging the power of sticky regexes and Babel code generation, `reghex` allows
you to code parsers quickly, by surrounding regular expressions with a regex-like
[DSL](https://en.wikipedia.org/wiki/Domain-specific_language).

With `reghex` you can generate a parser from a tagged template literal, which is
quick to prototype and generates reasonably compact and performant code.

_This project is still in its early stages and is experimental. Its API may still
change and some issues may need to be ironed out._

## Quick Start

##### 1. Install with yarn or npm

```sh
yarn add reghex
# or
npm install --save reghex
```

##### 2. Add the plugin to your Babel configuration (`.babelrc`, `babel.config.js`, or `package.json:babel`)

```json
{
  "plugins": ["reghex/babel"]
}
```

Alternatively, you can set up [`babel-plugin-macros`](https://github.com/kentcdodds/babel-plugin-macros) and
import `reghex` from `"reghex/macro"` instead.

##### 3. Have fun writing parsers!

```js
import match, { parse } from 'reghex';

const name = match('name')`
  ${/\w+/}
`;

parse(name)('hello');
// [ "hello", .tag = "name" ]
```

## Concepts

The fundamental concept of `reghex` is the regex, specifically the
[sticky regex](https://www.loganfranken.com/blog/831/es6-everyday-sticky-regex-matches/)!
These are regular expressions that don't search a target string, but instead match at the
specific position they're at. The flag for sticky regexes is `y` and hence
they can be created using `/phrase/y` or `new RegExp('phrase', 'y')`.

**Sticky Regexes** are the perfect foundation for a parsing framework in JavaScript!
Because they only match at a single position they can be used to match patterns
continuously, as a parser would. Like with global regexes, we can manipulate where
they should be matched by setting `regex.lastIndex = index;` and, after matching,
read back their updated `regex.lastIndex`.

> **Note:** Sticky Regexes aren't natively
> [supported in any version of Internet Explorer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky#Browser_compatibility). `reghex` works around this by imitating their behaviour, which may decrease performance on IE11.

This primitive allows us to build up a parser from the regexes that we pass when
authoring a parser function, also called a "matcher" in `reghex`. When `reghex` compiles
a matcher to parser code, this code is just a sequence and combination of sticky regexes
that are executed in order!

```js
let input = 'phrases should be parsed...';
let lastIndex = 0;

const regex = /phrase/y;
function matcher() {
  let match;
  // Before matching we set the current index on the RegExp
  regex.lastIndex = lastIndex;
  // Then we match and store the result
  if ((match = regex.exec(input))) {
    // If the RegExp matches successfully, we update our lastIndex
    lastIndex = regex.lastIndex;
  }
}
```

This mechanism is used in all matcher functions that `reghex` generates.
Internally `reghex` keeps track of the input string and the current index on
that string, and the matcher functions execute regexes against this state.
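
To make the difference between searching and sticky matching concrete, here is a small
sketch in plain JavaScript that runs without `reghex`. The input string and variable names
are made up for illustration; only the standard `RegExp` behaviour is shown.

```js
const input = 'say hello, hello!';

// A global regex searches forward from lastIndex until it finds a match
const searching = /hello/g;
searching.lastIndex = 0;
const found = searching.exec(input); // skips "say " and matches at index 4

// A sticky regex only matches exactly at lastIndex
const sticky = /hello/y;
sticky.lastIndex = 0;
const missed = sticky.exec(input); // null, because index 0 holds "say "

// Moving lastIndex to the right position lets the sticky regex match
sticky.lastIndex = 4;
const hit = sticky.exec(input);
const nextIndex = sticky.lastIndex; // just past the matched "hello"

console.log(found.index, missed, hit[0], nextIndex);
```

Because a sticky regex never skips ahead, a failed match tells us precisely that the
pattern is not present at the current position, which is what a parser needs to know.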

## Authoring Guide

You can write "matchers" by importing the default export from `reghex` and
using it to write a matcher expression.

```js
import match from 'reghex';

const name = match('name')`
  ${/\w+/}
`;
```

As can be seen above, the `match` function, which is what we've called the
default import, is called with a "node name" and is then called as a tagged
template. This template is our **parsing definition**.

`reghex` only works with its Babel plugin, which will detect `match('name')`
and replace the entire tag with a parsing function, which may then look like
the following in your transpiled code:

```js
import { _pattern /* ... */ } from 'reghex';

var _name_expression = _pattern(/\w+/);
var name = function name() {
  /* ... */
};
```

We've now successfully created a matcher, which matches a single regex, which
is a pattern of one or more word characters (letters, digits, or underscores).
We can execute this matcher by calling it with the curried `parse` utility:

```js
import { parse } from 'reghex';

const result = parse(name)('Tim');

console.log(result); // [ "Tim", .tag = "name" ]
console.log(result.tag); // "name"
```

If the string (here: "Tim") was parsed successfully by the matcher, it will
return an array that contains the result of the regex. The array is special
in that it will also have a `tag` property set to the matcher's name, here
`"name"`, which we determined when we defined the matcher as `match('name')`.

```js
import { parse } from 'reghex';

// "!?" contains no word characters, so /\w+/ cannot match
parse(name)('!?'); // undefined
```

Similarly, if the matcher does not parse an input string successfully, it will
return `undefined` instead.
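
The following sketch mimics this behaviour by hand so it can be run without the Babel
plugin. Note that `makeMatcher` and `parseWith` are hypothetical helpers invented for this
example; they are not part of `reghex` and don't claim to reflect the code it generates.

```js
// Hypothetical helper (not part of reghex): builds a matcher from one regex
function makeMatcher(tagName, pattern) {
  // Copy the pattern with the sticky flag so it only matches at lastIndex
  const regex = new RegExp(pattern.source, 'y');
  return function matcher(state) {
    regex.lastIndex = state.index;
    const match = regex.exec(state.input);
    if (!match) return undefined; // mismatch: the state is left untouched
    state.index = regex.lastIndex;
    // The node is a plain array with an extra `tag` property
    const node = [match[0]];
    node.tag = tagName;
    return node;
  };
}

// Hypothetical stand-in for a curried parse utility
const parseWith = (matcher) => (input) => matcher({ input, index: 0 });

const nameMatcher = makeMatcher('name', /\w+/);
const result = parseWith(nameMatcher)('Tim');
const mismatch = parseWith(nameMatcher)('!?');

console.log(result[0], result.tag); // Tim name
console.log(mismatch); // undefined
```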

### Nested matchers

This on its own is nice, but a parser must be able to traverse a string and
turn it into an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
To introduce nesting to `reghex` matchers, we can refer to one matcher in another!
Let's extend our original example:

```js
import match from 'reghex';

const name = match('name')`
  ${/\w+/}
`;

const hello = match('hello')`
  ${/hello /} ${name}
`;
```

The new `hello` matcher is set to match `/hello /` and then attempts to match
the `name` matcher afterwards. If either of these matchers fails, it will return
`undefined` as well and roll back its changes. Using this matcher will give us
**nested abstract output**.

We can also see in this example that _outside_ of the regex interpolations,
whitespace and newlines don't matter.

```js
import { parse } from 'reghex';

parse(hello)('hello tim');
/*
  [
    "hello ",
    ["tim", .tag = "name"],
    .tag = "hello"
  ]
*/
```
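
The "roll back" step is worth spelling out. Below is a hand-written sketch of a sequence
that restores the index when one of its parts fails. The `pattern` and `sequence` helpers
are assumptions made up for this example and are not `reghex` exports.

```js
// Hypothetical helper: wraps a regex as a sticky matcher over a shared state
const pattern = (source) => {
  const regex = new RegExp(source.source, 'y');
  return (state) => {
    regex.lastIndex = state.index;
    const match = regex.exec(state.input);
    if (!match) return undefined;
    state.index = regex.lastIndex;
    return match[0];
  };
};

// Hypothetical helper: runs matchers in order, rolling back if any of them fails
const sequence = (tagName, ...matchers) => (state) => {
  const start = state.index;
  const node = [];
  for (const matcher of matchers) {
    const child = matcher(state);
    if (child === undefined) {
      state.index = start; // undo the progress made by earlier matchers
      return undefined;
    }
    node.push(child);
  }
  node.tag = tagName;
  return node;
};

const nameSeq = sequence('name', pattern(/\w+/));
const helloSeq = sequence('hello', pattern(/hello /), nameSeq);

const ok = { input: 'hello tim', index: 0 };
const tree = helloSeq(ok); // nested node, ok.index moves to the end

const bad = { input: 'hello !!!', index: 0 };
const failed = helloSeq(bad); // undefined, bad.index is back at 0

console.log(tree, ok.index, failed, bad.index);
```

In the failing case the `/hello /` part matches first, so without the rollback the index
would be left in the middle of the input.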

### Regex-like DSL

We've seen in the previous examples that matchers are authored using tagged
template literals, where interpolations can either be filled using regexes,
`${/pattern/}`, or with other matchers, `${name}`.

The tagged template syntax supports more ways to match these interpolations,
using a regex-like Domain Specific Language. Unlike in regexes, whitespace
and newlines don't matter, which makes it easier to format and read matchers.

We can create **sequences** of matchers by adding multiple expressions in
a row. A matcher using `${/1/} ${/2/}` will attempt to match `1` and then `2`
in the parsed string. This is just one feature of the regex-like DSL. The
available operators are the following:

| Operator | Example            | Description |
| -------- | ------------------ | ----------- |
| `?`      | `${/1/}?`          | An **optional** makes an interpolation optional, meaning that the interpolation may or may not match. |
| `*`      | `${/1/}*`          | A **star** matches an interpolation zero or more times, meaning that the interpolation may repeat itself or may not be matched at all. |
| `+`      | `${/1/}+`          | A **plus** is used like `*` but must match one or more times. When the interpolation doesn't match at all, that's considered a failing case, since the match isn't optional. |
| `\|`     | `${/1/} \| ${/2/}` | An **alternation** matches either one thing or another, falling back to the next interpolation when the first one fails. |
| `()`     | `(${/1/} ${/2/})+` | A **group** can be used to apply one of the other operators to an entire group of interpolations. |
| `(?: )`  | `(?: ${/1/})`      | A **non-capturing group** is like a regular group, but whatever the interpolations inside it match won't appear in the parser's output. |
| `(?= )`  | `(?= ${/1/})`      | A **positive lookahead** checks whether the interpolations match, and if so continues the matcher without changing the input. If it matches, it's otherwise ignored. |
| `(?! )`  | `(?! ${/1/})`      | A **negative lookahead** checks whether the interpolations _don't_ match, and if so continues the matcher without changing the input. If the interpolations do match, the matcher is aborted. |

We can combine and compose these operators to create more complex matchers.
For instance, we can extend the original example to only allow a specific set
of names by using the `|` operator:

```js
const name = match('name')`
  ${/tim/} | ${/tom/} | ${/tam/}
`;

parse(name)('tim'); // [ "tim", .tag = "name" ]
parse(name)('tom'); // [ "tom", .tag = "name" ]
parse(name)('patrick'); // undefined
```

The above will now only match specific name strings. When one pattern in this
chain of **alternations** does not match, it will try the next one.

We can also use **groups** to add more matchers around the alternations themselves,
by surrounding the alternations with `(` and `)`:

```js
const name = match('name')`
  (${/tim/} | ${/tom/}) ${/!/}
`;

parse(name)('tim!'); // [ "tim", "!", .tag = "name" ]
parse(name)('tom!'); // [ "tom", "!", .tag = "name" ]
parse(name)('tim'); // undefined
```

Maybe we're also not that interested in the `"!"` showing up in the output node.
If we want to get rid of it, we can use a **non-capturing group** to hide it,
while still requiring it.

```js
const name = match('name')`
  (${/tim/} | ${/tom/}) (?: ${/!/})
`;

parse(name)('tim!'); // [ "tim", .tag = "name" ]
parse(name)('tim'); // undefined
```

Lastly, like in regexes, `?`, `*`, and `+` may be used as "quantifiers". The first two
are optional, so their patterns may _not_ match without the matcher failing.
The `+` operator is used to match an interpolation _one or more_ times, while the
`*` operator may match _zero or more_ times. Let's use this to allow the `"!"`
to repeat.

```js
const name = match('name')`
  (${/tim/} | ${/tom/})+ (?: ${/!/})*
`;

parse(name)('tim!'); // [ "tim", .tag = "name" ]
parse(name)('tim!!!!'); // [ "tim", .tag = "name" ]
parse(name)('tim'); // [ "tim", .tag = "name" ]
parse(name)('timtim'); // [ "tim", "tim", .tag = "name" ]
```

As we can see from the above, like in regexes, quantifiers can be combined with
regular groups, non-capturing groups, or other kinds of groups.
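
The operator table describes lookaheads and the `*` quantifier only in prose, so here is
a rough sketch of their semantics in plain JavaScript. The `pattern`, `lookahead`, and
`star` helpers are hypothetical names used for illustration; they are not `reghex`
exports, and the code `reghex` generates may look quite different.

```js
// Hypothetical helper: wraps a regex as a sticky matcher over a shared state
const pattern = (source) => {
  const regex = new RegExp(source.source, 'y');
  return (state) => {
    regex.lastIndex = state.index;
    const match = regex.exec(state.input);
    if (!match) return undefined;
    state.index = regex.lastIndex;
    return match[0];
  };
};

// Lookahead: test a matcher, then always restore the position.
// `positive = true` behaves like (?= ), `positive = false` like (?! ).
const lookahead = (matcher, positive = true) => (state) => {
  const start = state.index;
  const matched = matcher(state) !== undefined;
  state.index = start;
  return matched === positive;
};

// Star: collect matches until the first mismatch; zero matches is still a success
const star = (matcher) => (state) => {
  const results = [];
  for (;;) {
    const before = state.index;
    const item = matcher(state);
    // Stop on a mismatch, or on an empty match that would loop forever
    if (item === undefined || state.index === before) break;
    results.push(item);
  }
  return results;
};

const state = { input: 'aaab', index: 0 };
const repeated = star(pattern(/a/))(state); // consumes all three "a"s
const seesB = lookahead(pattern(/b/))(state); // true, index unchanged
const notC = lookahead(pattern(/c/), false)(state); // true, index unchanged

console.log(repeated, seesB, notC, state.index);
```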

### Transforming as we match

In the previous sections, we've seen that the **nodes** that `reghex` outputs are arrays containing
match strings or other nodes and have a special `tag` property with the node's type.
We can **change this output** while we're parsing by passing a second function to our matcher definition.

```js
const name = match('name', (x) => x[0])`
  (${/tim/} | ${/tom/}) ${/!/}
`;

parse(name)('tim!'); // "tim"
```

In the above example, we're passing a small function, `x => x[0]`, to the matcher as a
second argument. This will change the matcher's output, which causes the parser to
now return a new output for this matcher.

We can use this function creatively by outputting full AST nodes, perhaps ones
that resemble Babel's output:

```js
const identifier = match('identifier', (x) => ({
  type: 'Identifier',
  name: x[0],
}))`
  ${/[\w_][\w\d_]+/}
`;

parse(identifier)('var_name'); // { type: "Identifier", name: "var_name" }
```

We've now entirely changed the output of the parser for this matcher. Given that each
matcher can change its output, we're free to change the parser's output entirely.
By **returning a falsy value** in this matcher, we can also change the matcher to not have
matched, which would cause other matchers to treat it like a mismatch!

```js
import match, { parse } from 'reghex';

// The transform is passed as the second argument, as in the examples above
const name = match('name', (x) => {
  return x[0] !== 'tim' ? x : undefined;
})`
  ${/\w+/}
`;

const hello = match('hello')`
  ${/hello /} ${name}
`;

parse(hello)('hello tom'); // ["hello ", ["tom", .tag = "name"], .tag = "hello"]
parse(hello)('hello tim'); // undefined
```
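
As a rough mental model of how a transform could be applied, consider the following
hand-written sketch. The `makeMatcher` helper and the identifier regex used here are
assumptions for illustration only and don't reflect how `reghex` is implemented.

```js
// Hypothetical helper: a single-regex matcher with an optional transform
function makeMatcher(tagName, source, transform = (node) => node) {
  const regex = new RegExp(source.source, 'y');
  return (state) => {
    const start = state.index;
    regex.lastIndex = start;
    const match = regex.exec(state.input);
    if (!match) return undefined;
    state.index = regex.lastIndex;
    const node = [match[0]];
    node.tag = tagName;
    const output = transform(node);
    if (!output) {
      state.index = start; // a falsy output is treated like a mismatch
      return undefined;
    }
    return output;
  };
}

const toIdentifier = (node) => ({ type: 'Identifier', name: node[0] });
const identifierMatcher = makeMatcher('identifier', /[A-Za-z_]\w*/, toIdentifier);
const notTim = makeMatcher('name', /\w+/, (node) =>
  node[0] !== 'tim' ? node : undefined
);

const ast = identifierMatcher({ input: 'var_name', index: 0 });
const tom = notTim({ input: 'tom', index: 0 });
const rejected = { input: 'tim', index: 0 };
const tim = notTim(rejected);

console.log(ast, tom, tim, rejected.index);
```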

Lastly, if we need to create these special array nodes ourselves, we can use `reghex`'s
`tag` export for this purpose.

```js
import { tag } from 'reghex';

tag(['test'], 'node_name');
// ["test", .tag = "node_name"]
```

**That's it! May the RegExp be ever in your favor.**