The event model
Generated parsers do not materialize a parse tree. They emit a flat sequence of events, and a program that consumes events can reconstruct whatever tree (or no tree at all) it needs. This page specifies the event stream — the contract every backend implements.
The five events
Every event is one of:
Enter
    Opens the subtree of a rule. Carries the rule’s RuleKind and a Pos marking the start of the subtree (the position of its first child, or the position of its matching Exit for an empty rule). Only non-fragment rules produce this event; fragment rules are inlined without any Enter/Exit markers.
Exit
    Closes the matching Enter. Carries the same RuleKind and a Pos marking the end of the subtree (the position just past the last consumed token of the rule’s content, equal to the enter position if nothing was consumed).
Token
    A lexed token consumed from the input. Carries:
    - kind: a TokenKind value identifying which token declaration this matches, or a nullable / sentinel value when the lexer failed to match at the current position; see below.
    - span: a Span covering the matched input.
    - text: the matched source text, exactly as it appeared. Un-escaping, numeric conversion, and other transforms are not performed by the parser.
    Token events always carry legitimate parse data, including the “synced-to-expected” token after a recovery (when an expect mismatched and the recovery’s sync set landed on the kind it was expecting).
Garbage
    A token consumed by error recovery, emitted between an Error and the recovery’s sync point. Carries the same payload shape as Token (kind, span, text), but is distinct so consumers can drop these from their AST or render them as error spans without tracking recovery state externally.
Error
    A recoverable diagnostic. Carries a human-readable message and a Span pointing at the offending lookahead. The parser continues emitting events after an error, so a file with many errors still yields a useful stream.
Every backend names these five cases the same way in its idiomatic
tagged-union form — in TypeScript they are {tag: "enter" | "exit"
| "token" | "garbage" | "error", ...}; in Rust they are
Event::Enter { .. } / Event::Exit { .. } / Event::Token(..)
/ Event::Garbage(..) / Event::Error(..); in Python they are
Event objects with a .tag string attribute; in Go they are
distinguished by an EventTag constant (EvEnter, EvExit,
EvToken, EvGarbage, EvError); in C# they are sealed
records (EnterEvent, ExitEvent, TokenEvent,
GarbageEvent, ErrorEvent); in Java they are sealed
sub-classes of Event; in C they are EventTag constants
(EV_ENTER, EV_EXIT, EV_TOKEN, EV_GARBAGE, EV_ERROR).
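To make the shapes concrete, here is a minimal Python sketch of the five events as one tagged record. Only the .tag string attribute and the five tag values come from the description above; the class names, the field names (rule, pos, kind, span, text, message), and the describe helper are assumptions made for illustration, not the generated backend's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pos:
    offset: int  # 0-based byte offset
    line: int    # 1-based
    column: int  # 1-based, counted in codepoints


@dataclass(frozen=True)
class Span:
    start: Pos
    end: Pos  # half-open: end is just past the last byte


@dataclass(frozen=True)
class Event:
    # Hypothetical layout: one record, unused fields left as None.
    tag: str                        # "enter" | "exit" | "token" | "garbage" | "error"
    rule: Optional[str] = None      # enter / exit
    pos: Optional[Pos] = None       # enter / exit
    kind: Optional[int] = None      # token / garbage (None models a lex failure)
    span: Optional[Span] = None     # token / garbage / error
    text: Optional[str] = None      # token / garbage
    message: Optional[str] = None   # error


def describe(ev: Event) -> str:
    """Render one event as a short line, dispatching on its tag."""
    if ev.tag in ("enter", "exit"):
        return f"{ev.tag} {ev.rule} @ {ev.pos.offset}"
    if ev.tag in ("token", "garbage"):
        return f"{ev.tag} {ev.text!r} [{ev.span.start.offset}, {ev.span.end.offset})"
    if ev.tag == "error":
        return f"error {ev.message!r} @ {ev.span.start.offset}"
    raise ValueError(f"unknown tag: {ev.tag}")
```

A consumer written against a real backend would dispatch the same way, using that backend's own names.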
Ordering guarantees
Source order. Events are emitted in the order their source bytes appear. Skip tokens (see below) are interleaved with structural events accordingly.
Balanced structure. Every Enter is matched by exactly one Exit for the same RuleKind. Errors or recovery do not cause unmatched Enter/Exit pairs: if the parser commits to a rule, it finishes the rule.
Finality. Events are never retracted or reordered. A consumer can commit to a side effect on each event as it arrives.
Termination. The stream ends when the parser reaches the end of input. If there are trailing bytes after the start rule completes, the parser emits an “expected end of input” error and consumes the remaining tokens (as Garbage) before terminating.
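The balanced-structure guarantee is easy to assert in a test harness. The sketch below assumes events have been reduced to (tag, rule) pairs, with rule set to None for non-structural events; that reduction and the function name check_balanced are illustrative choices, not part of any backend.

```python
def check_balanced(events):
    """Return True if every enter is closed by an exit for the same rule.

    `events` is an iterable of (tag, rule) pairs. Token, garbage and error
    events are ignored: they never open or close a subtree.
    """
    open_rules = []
    for tag, rule in events:
        if tag == "enter":
            open_rules.append(rule)
        elif tag == "exit":
            # An exit must close the most recently opened rule.
            if not open_rules or open_rules.pop() != rule:
                return False
    # Anything still open at the end of the stream is unbalanced.
    return not open_rules
```

Run against a stream produced from a broken input, a check like this should still pass, since errors and recovery are specified not to unbalance the stream.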
Building a tree from events
The canonical consumer keeps a stack: push a new node on Enter,
attach tokens as children of the top-of-stack node, and pop on
Exit. In pseudocode:
    stack = [root]
    errors = []
    for ev in parser:
        match ev.tag:
            case "enter":
                # open a subtree: attach the new node, then descend into it
                node = make_node(ev.rule)
                stack[-1].children.append(node)
                stack.append(node)
            case "token":
                # leaves attach to whichever rule is currently open
                stack[-1].children.append(ev.token)
            case "exit":
                stack.pop()
            case "garbage":
                # token consumed by recovery; typically dropped, or
                # collected on the side as an error span
                continue
            case "error":
                errors.append(ev.error)
This is the direct, mechanical translation — consumers that want a
typed AST typically switch on ev.rule inside enter to pick
the right node type, and switch on ev.token.kind inside
token to decode the leaf.
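For readers who want something they can run, the same loop can be exercised against a hand-written event list. In this sketch the events are plain dicts and the Node class, the build_tree function, and the "list"/"item" rule names are all hypothetical; a real consumer would iterate over a generated parser instead.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    rule: str
    children: list = field(default_factory=list)


def build_tree(events):
    """Fold a flat event stream into a tree; return (root, errors, garbage)."""
    root = Node("<root>")
    stack = [root]
    errors, garbage = [], []
    for ev in events:
        tag = ev["tag"]
        if tag == "enter":
            node = Node(ev["rule"])
            stack[-1].children.append(node)
            stack.append(node)
        elif tag == "token":
            stack[-1].children.append(ev["text"])
        elif tag == "exit":
            stack.pop()
        elif tag == "garbage":
            garbage.append(ev["text"])  # kept on the side as error text
        elif tag == "error":
            errors.append(ev["message"])
    return root, errors, garbage


# Hand-written stream for the input "[1" (the closing bracket is missing).
events = [
    {"tag": "enter", "rule": "list"},
    {"tag": "token", "text": "["},
    {"tag": "enter", "rule": "item"},
    {"tag": "token", "text": "1"},
    {"tag": "exit", "rule": "item"},
    {"tag": "error", "message": "expected `]`"},
    {"tag": "exit", "rule": "list"},
]
root, errors, garbage = build_tree(events)
```

Because the stream stays balanced after the error, the stack returns to the root and the tree is complete apart from the missing bracket.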
Skip tokens
Tokens declared with the -> skip action (whitespace, comments)
are skips. The parser’s state machine does not see them — they are
never consumed by Expect or examined by lookahead. The runtime
re-inserts them into the event stream just before the next structural
event, so consumers that want trivia (formatters, highlighters) see
skips in their correct source position by default.
Consumers that don’t want skips can opt into drop-skips mode at
parser construction (a compile-time ParserConfig in Rust, a
runtime Options flag in the other backends — see Using parsuna).
With drop-skips on, the lexer still matches skip tokens (they
delimit structural ones), but the parser silently consumes them
instead of yielding them as Token events. The structural event
stream is unchanged either way.
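A consumer that receives skips but wants to ignore them in one particular pass can also filter on its side. The sketch below is not the drop-skips option itself; it assumes events are dicts with a "kind" field and that the consumer knows which kinds are skips (the SKIP_KINDS names here are made up).

```python
SKIP_KINDS = {"WS", "COMMENT"}  # hypothetical names of `-> skip` tokens


def without_skips(events):
    """Yield every event except Token events whose kind is a skip."""
    for ev in events:
        if ev["tag"] == "token" and ev.get("kind") in SKIP_KINDS:
            continue
        yield ev
```

The enter, exit, error, and non-skip token events pass through untouched, which mirrors the statement that the structural stream is the same in both modes.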
The Pos and Span types
Every backend exposes the same two shapes:
Pos
    {offset, line, column}. offset is a 0-based byte offset into the source. line is 1-based. column is 1-based and counted in Unicode codepoints within the line (not bytes, not grapheme clusters).
Span
    A half-open [start, end) pair of Pos values. span.start == span.end denotes a zero-width span at a point, used, for example, for the Enter of an empty rule.
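The difference between byte offsets and codepoint columns is easiest to see by advancing a position over some text. This sketch assumes the source is UTF-8 and that only "\n" ends a line; neither is stated above, and the advance helper is hypothetical.

```python
from typing import NamedTuple


class Pos(NamedTuple):
    offset: int  # 0-based byte offset
    line: int    # 1-based
    column: int  # 1-based, in codepoints


START = Pos(0, 1, 1)


def advance(pos: Pos, text: str) -> Pos:
    """Return the position just past `text`, starting from `pos`."""
    offset, line, column = pos
    for ch in text:  # Python iterates str by codepoint
        offset += len(ch.encode("utf-8"))  # assumption: UTF-8 source
        if ch == "\n":                     # assumption: LF-only line ends
            line += 1
            column = 1
        else:
            column += 1
    return Pos(offset, line, column)


end = advance(START, "é\nab")  # Pos(offset=5, line=2, column=3)
```

The two-byte "é" moves the offset by 2 but the column by 1, so a Span over it would run from offset 0 to offset 2 while its columns run from 1 to 2.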
EOF and lex failures
EOF is reserved as a token-kind name — a grammar that declares a
token called EOF is rejected. ERROR is not reserved: you
can declare a token called ERROR if you like.
EOF (kind id 0)
    Emitted once by the lexer when the input is exhausted. The parser consumes it internally; consumers typically do not see an EOF token, but may see one inside Token events during error recovery in pathological cases.
Lex failures (no token pattern matches at the current position) are
not represented as a separate token kind in the grammar’s enum. The
lexer emits a normal Token event covering one codepoint with the
kind field set to the language’s “no kind” value (None in
Rust, null in TypeScript, None for an Optional[int] in Python,
and the unsigned sentinel 0xFFFF in Go, Java, C#, and C) so the
parser can surface an error and keep making progress. The offending
position will also produce a nearby Error event explaining what
was expected.
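In Python, spotting lex failures therefore comes down to testing the kind for None. The sketch below assumes events are dicts whose span is a (start, end) pair of byte offsets, and it also inspects Garbage events on the assumption that an unmatched codepoint could be swallowed by recovery; both points are illustrative guesses.

```python
def lex_failures(events):
    """Collect the spans of events whose kind is the 'no kind' value."""
    return [
        ev["span"]
        for ev in events
        # assumption: garbage events may carry a None kind as well
        if ev["tag"] in ("token", "garbage") and ev["kind"] is None
    ]
```

In a backend that uses the 0xFFFF sentinel, the comparison would be against that constant instead of None.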
Error recovery, observably
Whenever the parser hits a missing-token site — a sequence Expect
that mismatches, or an alternative Dispatch whose lookahead
matches no arm — it picks between two strategies by the same rule:
if the current lookahead is already a valid continuation past the
missing token, treat the missing token as inserted; otherwise,
delete tokens until the lookahead lands on something that resyncs
the surrounding rule.
Insertion recovery. The parser treats the missing token as if it had been silently inserted:
1. An Error event is emitted with a message like "expected `>`" and a zero-width span at the current lookahead (the message quotes the literal in backticks for simple tokens, or uses the grammar-declared name like IDENT for pattern-class tokens).
2. Drive resumes just past the synthetic missing token, with the lookahead untouched. The surrounding rule keeps making progress.
For a single-arm site (Expect) the trigger is “lookahead is in
the rule’s SYNC set” (= FOLLOW + EOF). For a multi-arm site
(Dispatch) each arm carries its own per-arm continuation FIRST,
and the parser commits to the first arm whose set contains the
lookahead. Both paths route through the same model: missing-token
errors leave the lookahead in place; only deletion produces
Garbage.
This is the path taken for a missing structural delimiter (e.g. the
> of an XML start-tag, the ; ending a declaration) where the
next token is part of well-formed input that should not be consumed
as garbage.
Deletion recovery. When the lookahead isn’t a valid continuation anywhere, the parser falls back on synchronization:
1. An Error event is emitted with a message like "expected X" or "unexpected token" and a span over the current lookahead.
2. The parser switches into recovery mode: it consumes tokens until the lookahead matches a token in the enclosing rule’s synchronization set (essentially that rule’s FOLLOW plus EOF). Each token consumed during recovery comes through as a Garbage event, one per call to the iterator, so consumers stay in lock-step with input even on long error runs.
3. Once the lookahead lands on a sync token, recovery finalises. If the synced token happens to be the kind the rule was expecting, it comes through as a normal Token event (because it is legitimate parse data). Otherwise recovery just clears the armed state and the rule’s surrounding flow resumes; the sync token is not consumed, and the next iteration sees it via the regular structural events.
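The choice between the two strategies at a single-arm site can be modelled in a few lines. This is a toy model, not the runtime's code: tokens are bare kind names ending in "EOF", events are (tag, value) pairs, and treating the expected kind as a stopping point during deletion is one reading of the description above.

```python
def recover_expect(expected, sync, tokens, i):
    """Toy recovery for a mismatched Expect; return (events, new_index).

    `tokens` is a list of kind names whose last element is "EOF", and `i`
    indexes the lookahead, which is known not to be `expected`.
    """
    sync = set(sync) | {"EOF"}  # SYNC = FOLLOW + EOF
    events = [("error", f"expected `{expected}`")]
    if tokens[i] in sync:
        # Insertion: pretend the token was there, leave the lookahead alone.
        return events, i
    # Deletion: discard tokens until the lookahead resyncs.
    while tokens[i] != expected and tokens[i] not in sync:
        events.append(("garbage", tokens[i]))
        i += 1
    if tokens[i] == expected:
        # Synced onto the expected kind: it is legitimate parse data.
        events.append(("token", tokens[i]))
        i += 1
    # Otherwise the sync token stays unconsumed for the surrounding rule.
    return events, i
```

For example, recover_expect(";", {"}"}, ["x", "y", ";", "}", "EOF"], 0) yields one error, two garbage events, and a normal token for the ";", leaving the lookahead on "}".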
Either way, a parse of a broken file produces a stream where
every input byte is accounted for: some as well-formed Token
events, some as Garbage followed by Token once recovery
synced, each error position carrying its own Error event,
and each missing-token site carrying an Error with no
matching Token/Garbage (the recovery was an insertion).
An editor or linter consuming the stream can highlight error
spans without losing track of the surrounding structure.
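The "every input byte is accounted for" property can be checked mechanically. The sketch below assumes skips are not dropped and that events have been reduced to (tag, start, end) triples of byte offsets in source order; the function name uncovered_ranges is made up for this example.

```python
def uncovered_ranges(events, source_len):
    """Return the byte ranges not covered by any token or garbage span."""
    gaps = []
    cursor = 0  # first byte not yet covered
    for tag, start, end in events:
        if tag not in ("token", "garbage"):
            continue  # enter, exit and error events consume no input
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < source_len:
        gaps.append((cursor, source_len))
    return gaps
```

With skips enabled the result should be an empty list even for a broken file; with drop-skips on, the gaps would be exactly the whitespace and comments the parser swallowed.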