Pass 3 — Lowering¶
Entry point: lowering::lower (src/lowering/mod.rs).
Output: a StateTable — the flat, backend-agnostic representation
of both the parser and the lexer.
Lowering produces two things: the lexer DFA (what the runtime consults to turn bytes into tokens) and the parser state machine (what consumes those tokens). One DFA per declared lexer mode, one state machine for the whole grammar.
The state machine is built by four sub-passes — build, layout,
optimize, validate — plus a small mark-references post-pass
that flags which FIRST tables backends actually need to emit. The
layout sub-pass also triggers DFA compilation as part of laying out
the final StateTable.
The StateTable¶
The artifact handed to code generation holds:
grammar_name— copied from theGrammar, used by backends for file and package names.tokens: Vec<TokenInfo>— non-fragment tokens, each with its resolved pattern (fragments inlined), itsskipflag, a stable numeric kind id (1-based;0is EOF), and lexer-mode metadata (mode_idfor which mode the token lives in,mode_actionsfor the-> push/-> popactions to fire on a match). Lex failures are surfaced per-language:Option<TK>(None) for Rust / TypeScript / Python, and the unsigned sentinel0xFFFFfor C / Java / C# / Go. Either way, no grammar-declared kind id collides with the failure marker.rule_kinds: Vec<String>— names of the non-fragment rules in declaration order. A rule’sRuleKindid is its index here.first_sets: Vec<FirstSet>— the interned FIRST-set pool. Each entry is{ id, seqs: Vec<Vec<u16>>, has_references: bool }: a set of token-id sequences plus a flag set during a post-optimize pass to mark the entries the generated runtime will actually reference (see Sub-pass 3d below). Backends emit a constant for an entry only when the flag is set.sync_sets: Vec<SyncSet>— the interned SYNC-set pool. Each entry is{ id, kinds: Vec<u16> }: a flat list of token ids.states: BTreeMap<StateId, State>— every parser state, keyed by id. EachStateholds a label and aBody(a sequence ofInstrplus a terminatingTail).entry_states: Vec<(String, StateId)>— public entry points, one per non-fragment rule.k— the grammar’s LL(k).modes: Vec<ModeInfo>— one entry per declared lexer mode.modes[0]is always the default (anonymous) mode; further entries come from@mode(name)pre-annotations in declaration order. EachModeInfocarries the mode’s id, name, and compiled DFA. A grammar without modes has a single-entry vec whose DFA matches every token.StateTable::lexer_dfa()is a shorthand formodes[0].dfa.
EOF’s id is the constant parsuna_rt::TOKEN_EOF (= 0); it is
not stored on the state table because every backend can read the
constant directly. TERMINATED (u32::MAX) is the sentinel
state id meaning “the parser has finished” — also not stored on the
table.
The lexer DFAs¶
File: src/lowering/lexer_dfa.rs.
At runtime the lexer turns the source bytes into a stream of tokens
before the parser state machine ever runs, so it is the natural place
to start. Each declared lexer mode gets its own DFA, compiled from the
list of non-fragment tokens whose mode_id matches — with fragment
references resolved by the build phase beforehand — using standard
Thompson construction plus subset determinization:
NFA construction. Each token’s
TokenPatternis compiled to an NFA fragment whose end state accepts that token’s kind id.Seqconcatenates fragments;Altε-joins them at both ends;Opt/Star/Plususe the classical ε-transition patterns.Refpatterns are unreachable at this point — fragments are resolved earlier — so reaching one is a bug.Top-level alternation. All tokens in the mode share a single start state via ε-transitions. This is what lets the lexer try every pattern in parallel.
Subset construction. The NFA is determinized into a flat
Vec<DfaState>. Each state carries an optionalaccept: Option<u16>plus its byte transitions already grouped intoByteArmruns (bytes sharing a target collapse into one arm; contiguous bytes within an arm collapse into ranges). State0([DEAD]) is reserved as the dead sink — every missing transition lands there, so the runtime’s inner loop is a single branch (“exit on 0”). The start state is always1([START]). The accept kind for a DFA state is the minimum token id present in the collapsed NFA states, which encodes “declaration order = priority”: tokens declared earlier win on ties.Minimization (gated on
DfaOpts::minimize). Subset construction over a UTF-8 NFA produces many nearly-identical states — for example four separate “I’m scanning whitespace” states because each entry byte arrives at a fresh DFA state.minimizepartitions states byacceptvalue, then iteratively splits each block by 256-byte transition signature against the current partitioning, until partitions stabilize. Equivalent states collapse into one. The pass preserves longest-match semantics — accept visits along any input trace stay on the same input position — and it makes the next step substantially more effective.Self-loop detection. For each state,
self_loop_rangesreturns the byte ranges whose arms loop the state back to itself (every bytebfor which the matching arm’stargetequalsstate.id). Backends use this to emit a “scan past every byte in this set” prologue before the per-byte switch — turning the byte-by-byte hot loop on[a-z]+-like states into one bulk scan that the optimizer can autovectorize. The arms themselves are unchanged (the self-loop arm stays in the dispatch table for backends that don’t implement the prologue, or for debug fall-through).
At runtime the lexer implements longest match: advance until the
transition table lands on dead, then back up to the last DFA state
that had an accept. The skip flag on TokenInfo is read by
the parser’s pump, not the DFA — skip tokens are matched the same
way as any other, they are just routed differently on the way into
the event stream. Mode actions (-> push(name) / -> pop) are
applied by the runtime via the generated apply_actions callback
once a token has matched, so the active mode stays in sync with the
token stream.
The bytes the DFA consumes are UTF-8 octets. Character classes with
multi-byte codepoints expand to multi-byte NFA paths, so the DFA
implicitly handles UTF-8 without a separate decoder. (The character
class lowering performs the canonical Russ-Cox split into UTF-8 byte
sequences before the NFA is built — see class_byte_seqs in
lexer_dfa.rs.)
The state-machine op set¶
A state’s body is split into two halves: a list of instructions
that must run in order (none of them transfer control out of the
body), and exactly one tail at the end (which always does). The
type Body { instrs: Vec<Instr>, tail: Tail } encodes the split
structurally — there’s no way to put a terminator in the middle of
instrs or to omit the tail.
Instr (non-terminating, sequenceable):
Enter(kind)/Exit(kind)Emit a structural event for the given rule-kind id.
Expect { kind, token_name, sync }Consume a token of
kind; on mismatch, emit an error and recover to the given SYNC set.PushRet(state)Push a return address onto the call stack. Pairs with a future
Tail::Retsomewhere down the call chain.
Tail (terminating, exactly one per body, always last):
Jump(state)Unconditional transfer.
RetPop the top return address (or terminate if the stack is empty).
Star { first, body, cont, head }While the lookahead matches
first, pushhead(the loop-back state, normally the state hosting this op) and callbody. Otherwise transfer tocont: ifSome(s),cur = s; ifNone,cur = ret()directly — see Sub-pass 3c — Optimize.Opt { first, body, cont }If the lookahead matches
first, push the continuation and callbody; otherwise transfer to it directly.contisSome(state)for the original push-and-jump shape, orNonefor a tail call (cur = bodyon match,cur = ret()on miss).Dispatch { tree, sync, cont }Pick one
Altarm using aDispatchTreeover up toktokens of lookahead, or recover viasyncon no match.contcarries the same encoding asOpt:Some(state)means each arm pushes that state before jumping to its body and theFallthrough/Errorpaths jump to it;Nonemeans tail call across the board.
Inside a Tail::Star / Tail::Opt / Tail::Dispatch arm,
the body is also a Body — same instrs + tail split,
recursively. DispatchTree is the flat decision tree:
Leaf(action) (Arm(Body) / Fallthrough / Error) or
Switch { depth, arms, default }. Each Switch inspects
look(depth).kind and branches into sub-trees — this maps directly
onto nested switch statements in every target.
The runtime invariants¶
Two semantic rules apply to every body in the table — both verified
by src/lowering/validate.rs after the optimizer finishes:
At most one event per execution path through the body.
Drive::stepruns one match arm and returns the event the body’s path produced. Two events on the same path would mean one of them got lost.No lookahead-reading op after an ``Expect`` in the same body. An
Expectconsume leaves slotK-1empty; the runtime can only refill betweenstepcalls, so any subsequentStar/Opt/Dispatch/Expectwould observe an unfilled slot.
Zero-event paths are allowed — step returns None and the
runtime’s pull loop calls step again. They show up in
Tail::Star / Tail::Opt miss paths (loop exit, optional skip)
and in pure-control bodies ([Jump(s)]) that the optimizer hasn’t
folded.
Layout produces invariant-correct output on its own; every optimizer
pass is also gated on is_valid_body so a rewrite never produces
a body that violates either rule. assert_runtime_invariants
runs as the final, mandatory step of lower_with_opts and panics
if anything slips through — it’s contract enforcement, not an
optimization, and is always on regardless of which optimizer flags
are set.
Sub-pass 3a — Build¶
File: src/lowering/build.rs.
Output: a Program (blocks with symbolic block ids, interned
FIRST/SYNC pools, token metadata, mode-name table).
The build phase walks each rule body and emits a block of block-level
ops (Enter / Exit / Expect / Call / Opt / Star
/ Dispatch). It is a faithful translation of the Expr tree:
Expr::Token(n)→Op::Expect, carrying the token name and the current rule’s SYNC id.Expr::Rule(n)→Op::Call { target: Target::Rule(n) }. The call target is symbolic — the actual block id is resolved in sub-pass 3b.Expr::Seq(xs)→ the children are emitted into the same block, each with an accurately computed tail (FIRST of the remainder plus the outer tail) so their prediction sets are precise.Expr::Alt(xs)→ oneOp::Dispatch. Each non-wholly-nullable arm becomes a sub-block whose entry id is stored alongside its FIRST-set id.has_epssummarizes whether any arm is nullable, which is what turns the outer dispatch default from “error” to “fall through”.Expr::Opt(x)/Expr::Star(x)→ oneOp::Opt/Op::Starplus a sub-block for the body.Expr::Plus(x)→ one mandatoryOp::Callinto the body’s sub-block, followed by anOp::Starover the same sub-block. The body is lowered once and shared.
Public rules are wrapped in Op::Enter / Op::Exit pairs so
their subtree appears in the event stream. Fragment rules are emitted
raw — calling into a fragment is indistinguishable from inlining its
body.
FIRST-set interning. Every prediction set that appears in an op is deduplicated through a hash table and replaced with a numeric id. This lets two syntactically-similar grammar fragments share a single table entry in the final output, which keeps the emitted source small.
SYNC-set computation. At the start of each rule, the build phase
computes the rule’s recovery set (FOLLOW(rule) ∪ {EOF}), interns
it, and caches the id in current_sync for the duration of that
rule’s lowering. Every Expect the builder emits stamps this id
on itself, so recovery at any point in the rule hits the same set.
Mode-name table. The build phase scans every token’s mode
field and assembles mode_names: mode_names[0] is always the
default (id 0), and any @mode(name) annotation contributes one
entry per unique name in declaration order. Each token’s mode_id
is filled in from this table, and -> push(name) actions resolve
the mode name to a numeric ModeActionInfo::Push(id) so codegen
and the runtime never deal with mode names directly.
Labels. Each op carries a human-readable label
(expr:alt0:call:atom) used in debug dumps and as a comment in the
generated code. Sub-blocks extend the parent’s label so the hierarchy
is visible without extra bookkeeping.
Sub-pass 3b — Layout¶
File: src/lowering/layout.rs.
Layout turns the block-level IR into numbered parser states and resolves every symbolic target into a concrete state id.
Steps:
Block ordering. BFS from each public rule’s entry block; each block reached is appended to an
orderlist. This keeps a rule and everything it transitively reaches adjacent in the state numbering, which is what makes debug dumps and generated switch tables readable.Entry-id assignment. Each block reserves exactly
ops.len()slots — one state per op. Ids start at1;0is theTERMINATEDsentinel (real states never share it). An empty block reserves a single slot for aBody { instrs: [], tail: Ret }.Op lowering. Each block-level op is translated to a state
Body. The block’s last op is emitted in tail form — straight-line ops (Enter/Exit/Expect/Call) takeTail::Retinstead ofTail::Jumpto a separate trailing-Ret state, and branchy ops (Opt/Star/Dispatch) takecont: None(tail call, the body’s trailingRetpops the caller’s continuation directly). Layout therefore never produces a standalone[Ret]-only state, and the one-event-per-step invariant holds with zero optimizer passes run.Straight-line ops emit a body whose
instrsis the action and whosetailisJump(fall)(non-tail) orRet(tail-of-block).Op::Call { target }becomesBody { instrs: [PushRet(fall)], tail: Jump(callee) }, orBody { instrs: [], tail: Jump(callee) }when the call is the block’s last op (the callee’s eventualRetpops our caller directly).Branchy ops live in the tail and encode their post-op transition via their own
contfield.
Dispatch-tree construction. The block-level
Op::Dispatchcarries a flat list of(first_set_id, target)arms. Layout expands those into aDispatchTreeviabuild_dispatch_tree: each arm contributes one entry per sequence in its FIRST set, and the trie groups entries by theirdepth-th token at each level. An arm whose prediction sequence runs out at depth d terminates the branch with that arm’s id, shadowing the outer default for any deeper prefixes.Lexer DFA, per mode.
lexer_dfa::compile_with_optsis called once per declared mode, on the subset of tokens whosemode_idmatches. The resulting per-modeVec<DfaState>lands inModeInfo; the runtime selects which DFA to use based on the top of its mode stack.
Output: a StateTable — complete, invariant-correct, but not yet
size-optimized.
Sub-pass 3c — Optimize¶
File: src/lowering/optimize.rs.
The optimizer is purely about size and dispatch-hop reduction;
the parser produced by layout alone is already correct (and already
satisfies the runtime invariants). Each pass is gated by a flag on
LoweringOpts — turning any flag off is safe; the generated parser
still works, just with more state hops.
The four passes run in a fixpoint loop because they feed each other:
inlining shrinks reference counts (which dead-state elimination then
drops); dropping a state’s last reference can turn neighbouring
states into trampolines; folding trampolines exposes more bodies that
inline_branch_bodies can simplify.
inline_jumps — when a body’s
tailisJump(N), replace it withN’s body (extending instrs with N’s instrs, taking N’s tail), as long as the result still satisfiesis_valid_body. Cycles are guarded by a visited set; chains stop at the first invariant-violating splice. The original target stays alive only if some other reference still points at it;eliminate_deadpicks up any leftovers.fold_trampolines — drops references to states whose body is exactly
Body { instrs: [], tail: Ret }. Two shapes get cleaned up:PushRet to a Ret state — the push pairs with a future
Retthat immediately re-pops. Drop thePushRetoutright and the eventualRetpops our caller’s continuation directly.``cont: Some(s)`` where ``s = [Ret]`` —
Tail::Star/Tail::Opt/Tail::Dispatchrewrite tocont: None. Backends emitcur = ret()on the miss / fall-through path instead of bouncing through the trampoline.
Crucially, the rewrite does not require the eliminated push to be in “tail position” with respect to its enclosing rule call: even if an earlier op in the same body already pushed something onto the stack, the analysis still holds. The trampoline’s only behaviour is to immediately re-pop, so whether the calling body’s trailing
Retpops via “trampoline-then-the-frame-below” or “the-frame-below-directly” produces the same end state. The optimisation only needs the continuation to be a pureRet, not the ambient stack to be empty.Entry states are never treated as trampolines: they need to remain callable by id from outside the dispatch loop even when their body has been spliced down to a single
Ret.inline_branch_bodies — replaces any sub-body that’s just
Body { instrs: [], tail: Jump(s) }inside aTail::Star/Tail::Opt/ dispatch arm with a copy ofs’s body, again under theis_valid_bodygate. Multi-predecessor targets get duplicated into each caller; the original stays alive only if some other reference still points at it. A rewrite that pushes its host past one event (or violates the post-Expect rule) is reverted, so other branchy ops in the host can still inline.eliminate_dead — BFS from the public entry points, following every reachable target (
Jump,PushRet,Opt.body,Opt.contwhen ``Some``, every leaf of everyDispatchTree,Star.body/Star.cont/Star.head). Any state not reached is removed.
After the fixpoint, state-id compaction renumbers the surviving
states to a dense 1..N range. Splicing and DCE leave the
original layout-time ids sparse (e.g. 1, 2, 3, 17, …); compaction
is a pure renumber that lets backends emit tighter match /
switch tables and per-state arrays.
Sub-pass 3d — Mark FIRST-set references¶
File: src/lowering/mod.rs (mark_first_set_references).
A small post-pass that decides which FIRST sets the generated runtime will actually reference. The codegens use this to skip emitting constants for unreferenced entries.
The rule, encoded as the FirstSet::has_references flag:
For LL(1) grammars, no FIRST set is ever referenced. The
Tail::Star/Tail::Optcodegen fork = 1inlines the FIRST set into amatcharm pattern (one alternative per token kind) rather than callingmatches_first(FIRST_n).For LL(k > 1) grammars, only
Tail::StarandTail::Optconsult the pool —matches_first(FIRST_n)is the only reference site.Tail::Dispatcharms also point at FIRST sets, but those are consumed at lowering time when theDispatchTreeis built; the resulting nested switch arms carry concrete token kinds, not pool ids, so the original FIRST set id is unreferenced after lowering.
The pass walks every body’s tail, collects the ids reached by
Tail::Star / Tail::Opt, and sets has_references on the
matching FirstSet entries (no-op when k == 1).
Sub-pass 3e — Validate¶
File: src/lowering/validate.rs.
assert_runtime_invariants walks every body in the table and
panics if the per-body invariants don’t hold (more than one event
per path, or a lookahead-reading op after an Expect). This is
the final step of lower_with_opts — mandatory, always on.
Catches source bugs (a layout / optimizer pass producing a
malformed body) before the table reaches codegen.
The IR’s Body { instrs, tail } split makes the structural rule
“every body ends in a terminator” unrepresentable to violate; the
validator covers the two semantic rules the type system can’t
express.
Interactions between build, layout, and optimize¶
A few design notes that matter when reading the source:
BlockIdandFirstSetId/SyncSetIdlet the build phase stay symbolic. The layout phase resolves block ids intoStateId; the intern pools are passed through to the state table verbatim. Backends never see block ids.Every
Expectinside a rule refers to the same SYNC set — computed once from the rule’s FOLLOW set. That’s why the intern pool stays small even for grammars with many productions.Layout already emits the block’s last op in tail form (
Tail::Retfor straight-line ops,cont: Nonefor branchy ones). The one-event-per-step invariant therefore holds before optimize ever runs — turning every optimizer flag off still produces a correctly-validating parser, just with more state hops.The optimizer’s fixpoint runs the four passes in a fixed order (inline_jumps → fold_trampolines → inline_branch_bodies → eliminate_dead) and re-runs them as long as anything changed. Convergence is guaranteed because every pass either shrinks the table or rewrites a body to a simpler shape, and there are finitely many bodies.
What comes out¶
A StateTable is all a backend needs. From this point forward
nothing about the grammar source, the FIRST/FOLLOW computation, or
the block-level IR is visible — the code-generation pass sees only
states, ops, intern pools, mode metadata, and the per-mode lexer DFAs.