The event model¶
Generated parsers do not materialize a parse tree. They emit a flat sequence of events, and a program that consumes events can reconstruct whatever tree (or no tree at all) it needs. This page specifies the event stream — the contract every backend implements.
The four events¶
Every event is one of:
Enter
    Opens the subtree of a rule. Carries the rule's RuleKind and a Pos marking the start of the subtree (the position of its first child, or the position of its matching Exit for an empty rule). Only non-fragment rules produce this event — fragment rules are inlined without any Enter/Exit markers.
Exit
    Closes the matching Enter. Carries the same RuleKind and a Pos marking the end of the subtree (the position just past the last consumed token of the rule's content, equal to the enter position if nothing was consumed).
Token
    A lexed token. Carries:
    - kind — a TokenKind value identifying which token declaration this matches. The reserved sentinels EOF and ERROR are also possible; see below.
    - span — a Span covering the matched input.
    - text — the matched source text, exactly as it appeared. Un-escaping, numeric conversion, and other transforms are not performed by the parser.
Error
    A recoverable diagnostic. Carries a human-readable message and a Span pointing at the offending lookahead. The parser continues emitting events after an error, so a file with many errors still yields a useful stream.
Every backend names these four cases the same way in its idiomatic
tagged-union form — in TypeScript they are {tag: "enter" | "exit" |
"token" | "error", ...}; in Rust they are Event::Enter { .. }
and friends; in Python they are Event objects with a .tag
string attribute; in Go they are distinguished by an EventTag
constant.
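As one concrete illustration, the Python shape described above might be modelled like this minimal sketch — the field names follow the event descriptions, but the exact class layout is an assumption, not the generated API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    """One parser event; which optional fields are set depends on .tag."""
    tag: str                        # "enter" | "exit" | "token" | "error"
    rule: Optional[str] = None      # RuleKind, for enter/exit
    pos: Optional[int] = None       # subtree boundary offset, for enter/exit
    kind: Optional[str] = None      # TokenKind, for token
    span: Optional[tuple] = None    # (start, end), for token/error
    text: Optional[str] = None      # matched source text, for token
    message: Optional[str] = None   # diagnostic text, for error

ev = Event(tag="token", kind="IDENT", span=(0, 3), text="foo")
assert ev.tag == "token" and ev.text == "foo"
```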
Ordering guarantees¶
Source order. Events are emitted in the order their source bytes appear. Skip tokens (see below) are interleaved with structural events accordingly.
Balanced structure. Every Enter is matched by exactly one Exit for the same RuleKind. Errors or recovery do not cause unmatched Enter/Exit pairs — if the parser commits to a rule, it finishes the rule.
Finality. Events are never retracted or reordered. A consumer can commit to a side-effect on each event as it arrives.
Termination. The stream ends when the parser reaches the end of input. If there are trailing bytes after the start rule completes, the parser emits an “expected end of input” error and consumes the remaining tokens before terminating.
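The balance guarantee is mechanically checkable. A sketch of a stream validator, using an illustrative tuple encoding of events (the encoding is an assumption, not part of the contract):

```python
def check_balanced(events):
    """Verify every Enter has a matching, properly nested Exit
    for the same rule."""
    stack = []
    for ev in events:
        if ev[0] == "enter":
            stack.append(ev[1])
        elif ev[0] == "exit":
            assert stack and stack[-1] == ev[1], "unbalanced exit"
            stack.pop()
    assert not stack, "unclosed enter"
    return True

# A well-formed stream: File contains Item, which contains one token.
stream = [("enter", "File"), ("enter", "Item"),
          ("token", "IDENT"), ("exit", "Item"), ("exit", "File")]
assert check_balanced(stream)
```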
Building a tree from events¶
The canonical consumer keeps a stack: push a new node on Enter,
attach tokens as children of the top-of-stack node, and pop on
Exit. In pseudocode:
stack = [root]
errors = []
for ev in parser:
    match ev.tag:
        case "enter":
            node = make_node(ev.rule)
            stack[-1].children.append(node)
            stack.append(node)
        case "token":
            stack[-1].children.append(ev.token)
        case "exit":
            stack.pop()
        case "error":
            errors.append(ev.error)
This is the direct, mechanical translation — consumers that want a
typed AST typically switch on ev.rule inside enter to pick
the right node type, and switch on ev.token.kind inside
token to decode the leaf.
Skip tokens¶
Tokens declared with the ? prefix (whitespace, comments) are
skips. The parser’s state machine does not see them — they are
never consumed by Expect or examined by lookahead. The runtime
re-inserts them into the event stream just before the next structural
event, so consumers that want trivia (formatters, highlighters) see
skips in their correct source position, while consumers that only
care about structure can filter them out by kind or by the fact that
they appear outside any rule scope.
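A structure-only consumer can drop trivia with a one-line filter. The skip kinds named here are illustrative — they depend on the grammar:

```python
SKIP_KINDS = {"WHITESPACE", "COMMENT"}  # illustrative; grammar-dependent

def without_trivia(events):
    """Yield the event stream with skip tokens removed."""
    for ev in events:
        if ev[0] == "token" and ev[1] in SKIP_KINDS:
            continue
        yield ev

stream = [("enter", "Item"), ("token", "WHITESPACE"),
          ("token", "IDENT"), ("exit", "Item")]
assert list(without_trivia(stream)) == [
    ("enter", "Item"), ("token", "IDENT"), ("exit", "Item")]
```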
The Pos and Span types¶
Every backend exposes the same two shapes:
Pos
    {offset, line, column}. offset is a 0-based byte offset into the source. line is 1-based. column is 1-based and counted in Unicode codepoints within the line (not bytes, not grapheme clusters).
Span
    A half-open [start, end) pair of Pos values. span.start == span.end denotes a zero-width span at a point — used, for example, for the Enter of an empty rule.
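In Python, these two shapes could be modelled as follows — a sketch of the spec, not the classes a backend actually generates:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pos:
    offset: int  # 0-based byte offset into the source
    line: int    # 1-based
    column: int  # 1-based, counted in Unicode codepoints

@dataclass(frozen=True)
class Span:
    start: Pos
    end: Pos     # half-open: [start, end)

    def is_empty(self):
        """True for a zero-width span at a point."""
        return self.start == self.end

p = Pos(offset=5, line=1, column=6)
assert Span(p, p).is_empty()             # e.g. the Enter of an empty rule
assert not Span(Pos(0, 1, 1), p).is_empty()
```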
Reserved token kinds¶
Two token kinds are reserved and never collide with a grammar token:
EOF (kind id 0)
    Emitted once by the lexer when the input is exhausted. The parser consumes it internally; consumers typically do not see an EOF token, but may see one inside Token events during error recovery in pathological cases.
ERROR (kind id -1)
    Emitted by the lexer when no token pattern matches at the current position. The lexer still advances by one codepoint so the parser can keep making progress. You will see an ERROR token in the event stream at the offending position, accompanied by a nearby Error event explaining what was expected.
Error recovery, observably¶
Two things happen on an unexpected token:
1. An Error event is emitted with a message like "expected X" and a span over the current lookahead.
2. The parser runs recovery — it consumes tokens until the lookahead matches a token in the enclosing rule's synchronization set (essentially that rule's FOLLOW plus EOF), then retries the expectation once. Tokens skipped during recovery are still emitted as Token events so consumers do not silently lose input.
This means a parse of a broken file produces a stream where every input byte is accounted for: some as well-formed tokens, some as errors plus the tokens recovery skipped over. An editor or linter consuming the stream can highlight error spans without losing track of the surrounding structure.
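For example, a linter can tally every error span while still counting every token, since recovery keeps the stream balanced and loses no input. The tuple encoding of events is again an illustrative assumption:

```python
def collect_errors(events):
    """Return (token_count, error_spans) for a stream: every byte of
    input shows up either as a token or inside recovery's skipped tokens."""
    token_count, error_spans = 0, []
    for ev in events:
        if ev[0] == "token":
            token_count += 1
        elif ev[0] == "error":
            error_spans.append(ev[2])
    return token_count, error_spans

# One lexer ERROR token plus the recovered IDENT: both are counted.
stream = [("enter", "File"), ("error", "expected IDENT", (3, 4)),
          ("token", "ERROR"), ("token", "IDENT"), ("exit", "File")]
count, spans = collect_errors(stream)
assert count == 2 and spans == [(3, 4)]
```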