Using parsuna¶
This page is language-agnostic: it covers the generator CLI, the shape of a generated parser, and how consumers drive one. Backend specifics are called out only when they matter.
The CLI¶
The parsuna executable takes a grammar file plus a subcommand:
parsuna <grammar.parsuna> <subcommand> [options]
The useful subcommands for day-to-day work are:
checkLoad, parse, and analyze the grammar. Print a one-line summary (
grammar `NAME' OK: N tokens, M rules, LL(k)) and exit 0, or print diagnostics and exit non-zero. Use this as a pre-commit or CI gate.generate <target> [-o OUT]Emit a parser for
<target>. Valid targets arerust,python,typescript,go,java,csharp, andc. With-o OUT, files are written under that directory; without-o, they go in the current working directory. Per-backend flags exist for targets that need them —java --package com.example.foocontrols the Java package and the on-disk directory layout,csharp --namespace MyApp.Parserdoes the same for C#.tree-sitter [-o OUT]Emit a tree-sitter
grammar.jsfor editor tooling. The emitted grammar is purely declarative; it does not share the pull-parser runtime. Useful for syntax highlighting and code folding in editors that speak tree-sitter.debug <sub>Dump internal state. The sub-commands are
stats,tokens,rules --format tree|dot,analysis,lowering, anddfa --format plain|dot. Userules --format dotpiped into Graphviz to view rule railroad diagrams; usedfa --format dotfor the lexer DFA. These dumps are intended as a debugging aid while developing a grammar — the Pipeline describes each layer in full.
Two global options work on any subcommand:
--name NAME— overrides the identifier the backend uses for file and package names. By default the name is the grammar file’s stem (foo.parsuna→foo).--warnings warn|error— promotes lint warnings (unused fragments, etc.) to errors when set toerror. Default iswarn: warnings print but the build still succeeds.
Optimizer toggles¶
Five global flags disable individual lowering / DFA passes. They are pure performance switches — turning any of them off still produces a working parser, just with a larger state table or DFA:
--no-inline-jumps— leave each block-level op in its own state instead of absorbing trailingJump(N)chains into the body.--no-fold-trampolines— keep pure-Retstates alive instead of droppingPushRetreferences and rewriting branchy continuations to tail-call form.--no-inline-branch-bodies— leave a[Jump(s)]-shaped branch body alone instead of replacing it withs’s body directly.--no-eliminate-dead— keep states no entry can reach in the emitted output. The generated parser still works (those states are never entered), but the emitted source is bigger.--no-dfa-minimize— skip lexer-DFA partition-refinement minimization. Keeps the raw subset-construction output, which has many duplicate near-identical states.
These are mainly useful for diffing the lowered table against a known-good baseline when something looks wrong; in production builds the defaults are what you want.
The shape of a generated parser¶
Every backend produces the same five things, spelled in the idioms of the target language:
A TokenKind enumeration with one variant per declared token, plus the reserved
EOFsentinel (kind id0). Lex failures are not a TokenKind variant — see The event model for the per-language representation. Skip tokens appear in the enum like any other token; fragments do not.A RuleKind enumeration with one variant per non-fragment rule. Attached to every structural event so consumers can identify subtrees.
A parse_<rule> entry point per non-fragment rule, accepting a source string or (where the target runtime supports it) a stream. The entry point returns a Parser object — the runtime’s pull-loop iterator over the generated state machine.
The Parser object, which yields Event values one at a time. Every target spells this as its native iterator protocol (
Iteratorin Rust,Iterablein Python,Iterator<T>in TypeScript, aNext() (Event, bool)method in Go, etc.). Internal dispatch hooks (lookahead access, return-stack manipulation, event builders) are not on the Parser — they live on a sealedCursortype that only the runtime’s pull loop ever constructs, so you can’t accidentally poke at parser state from the outside.Event itself: a tagged union with five cases (
Enter,Exit,Token,Garbage,Error). See The event model for the full payload.
All of these come from the same state table, so whatever backend you pick, the sequence of events you observe for a given input is the same up to language-level encoding differences.
A minimal driver¶
The pattern is identical in every language: call the entry point, iterate, switch on the event tag. In pseudocode:
parser = parse_<rule>(source)
for event in parser:
match event.tag:
case "enter": # event.rule is a RuleKind
on_enter(event.rule, event.pos)
case "exit":
on_exit(event.rule, event.pos)
case "token": # event.token.kind is a TokenKind
on_token(event.token)
case "garbage": # token consumed by error recovery
on_garbage(event.token)
case "error":
on_error(event.error)
Three things to keep in mind while writing the driver:
Events are final in source order. The parser never retracts or reorders events; once you have seen one, it will not be un-emitted.
Error events do not stop the stream. The parser recovers and keeps going. An application that wants to abort on the first error must do so in its own driver — the parser will happily continue.
Garbage is distinct from Token. Tokens consumed during error recovery come through tagged
garbageso AST builders can drop them (or mark them as error spans) without tracking recovery state externally. A normalTokenevent is always legitimate parse data — including the “synced-to-expected” token after a recovery.
Skip emission policy¶
By default, skip-kind tokens (whitespace, comments — anything declared
with -> skip) are interleaved into the event stream as Token
events between structural events. Consumers building a formatter or
highlighter want them; consumers building an AST typically don’t.
Every backend exposes a way to turn skip emission off at the source:
Rust — pass a compile-time config:
parse_foo_from_str_with::<DropSkips>(src).EmitSkipsis the default. The choice is a zero-sized type parameter, so monomorphization removes the skip-emit branch entirely when you pickDropSkips.Go / Java / C# / TypeScript — pass a runtime
Options(orParserOptions) with theDropSkipsflag set when constructing the parser. Most bindings provide aparse_*_with_optionsvariant alongside the defaultparse_*constructor.C / Python — currently emit-only; if you need to drop skips, filter them out in the consumer.
The structural event stream is identical either way — only Token
events for skip kinds are affected.
Starting from a rule other than the default¶
Every non-fragment rule has an entry point. The first rule declared
is the default start, but nothing stops you from calling
parse_member or parse_number directly to parse a fragment of
input as if that rule were the top. This is useful for tests, for
editor tooling that parses at the cursor, and for composing parsers
(parse a request body with one entry, then parse its contents with
another).
Typical integration workflow¶
Write the grammar in a
.parsunafile.Run
parsuna grammar.parsuna checkuntil it reports OK. Fix undefined references, left recursion, or LL(k) conflicts as the checker reports them.Run
parsuna grammar.parsuna generate <target> -o src. Commit the emitted files into your repository — they are plain source, and diffing them is how you notice grammar changes you did not intend.In your application, call
parse_<rule>and walk the event stream. TranslateEnter/Exitpairs into whatever domain-specific tree you want; translateTokenevents into leaves; handleError(andGarbage) events by attaching a diagnostic to the surrounding construct.
Regenerating is cheap and should be fully automated — wire
parsuna generate into your build system so the committed files
never drift from the grammar.
Tokens, skips, and whitespace¶
Skip tokens (those declared with the -> skip action, such as
WS and COMMENT) are interleaved into the event stream just
before the next structural event that follows them in source order.
Consumers who only care about structure can either filter by kind
on the consuming side or pick the drop-skips option (above) and
have them silently consumed at the source.
Error events do not consume the token they attach to — the parser
still either consumes it (if recovery synchronizes on it, in which
case it comes through as a Token) or skips it as part of recovery
(Garbage). Application code should treat Error as a
diagnostic carrier, not a replacement for a token.
Interpreting token text¶
The parser does not post-process token text. STRING tokens are
delivered with their quotes and escapes intact; NUMBER tokens are
delivered as the raw lexeme. Un-escaping and numeric conversion are
the consumer’s job — this keeps the parser’s source text faithful so
tools like formatters and go-to-definition work without losing
information.