Using parsuna¶
This page is language-agnostic: it covers the generator CLI, the shape of a generated parser, and how consumers drive one. Backend specifics are called out only when they matter.
The CLI¶
The parsuna executable takes a grammar file plus a subcommand:
parsuna <grammar.parsuna> <subcommand> [options]
The useful subcommands for day-to-day work are:
check: Load, parse, and analyze the grammar. Print a one-line summary (grammar `NAME' OK: N tokens, M rules, LL(k)) and exit 0, or print diagnostics and exit non-zero. Use this as a pre-commit or CI gate.
generate <target> [-o OUT]: Emit a parser for <target>. Valid targets are rust, python, typescript, go, java, csharp, c, and the meta-target all, which emits every backend. With -o OUT, files are written under that directory (one sub-directory per backend when multiple are emitted). Without -o, files are written into the current working directory.
tree-sitter [-o OUT]: Emit a tree-sitter grammar.js for editor tooling. The emitted grammar is purely declarative; it does not share the pull-parser runtime. Useful for syntax highlighting and code folding in editors that speak tree-sitter.
debug <sub>: Dump internal state. The sub-commands are stats, tokens, rules --format tree|dot, analysis, lowering, and dfa [--full] [--format plain|dot]. Use rules --format dot piped into Graphviz to view rule railroad diagrams; use dfa --format dot for the lexer DFA. These dumps are intended as a debugging aid while developing a grammar; the Pipeline describes each layer in full.
The --name NAME option, accepted at any position, overrides the
identifier the backend uses for file and package names. By default
the name is the grammar file’s stem (foo.parsuna → foo).
The shape of a generated parser¶
Every backend produces the same five things, spelled in the idioms of the target language:
A TokenKind enumeration with one variant per declared token, plus the reserved EOF and ERROR sentinels. Skip tokens appear here like any other token; fragments do not.
A RuleKind enumeration with one variant per non-fragment rule, attached to every structural event so consumers can identify subtrees.
A parse_<rule> entry point per non-fragment rule, accepting a source string or (where the target runtime supports it) a stream. The entry point returns a Parser object: the generated driver wrapped around the runtime's pull loop.
The Parser object, which yields Event values one at a time. Every target spells this as its native iterator protocol (Iterator in Rust, Iterable in Python, Iterator<T> in TypeScript, a NextEvent method in Go, etc.).
Event itself: a tagged union with four cases (Enter, Exit, Token, Error). See The event model for the full payload.
All of these come from the same state table, so whatever backend you pick, the sequence of events you observe for a given input is the same up to language-level encoding differences.
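In Python terms, that surface could be rendered roughly as follows. Every name, variant, and field here is an illustrative stand-in, not what a backend actually emits:

```python
# Illustrative Python rendering of the generated surface; names and
# field layout are stand-ins, not parsuna's emitted API.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple

class TokenKind(Enum):
    STRING = auto()          # one variant per declared token
    NUMBER = auto()
    WS = auto()              # skip tokens appear like any other
    EOF = auto()             # reserved sentinel
    ERROR = auto()           # reserved sentinel

class RuleKind(Enum):        # one variant per non-fragment rule
    VALUE = auto()
    MEMBER = auto()

@dataclass
class Event:                 # tagged union with four cases
    tag: str                 # "enter" | "exit" | "token" | "error"
    rule: Optional[RuleKind] = None                # for enter/exit
    token: Optional[Tuple[TokenKind, str]] = None  # (kind, lexeme)
    error: Optional[str] = None
```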
A minimal driver¶
The pattern is identical in every language: call the entry point, iterate, switch on the event tag. In pseudocode:
parser = parse_<rule>(source)
for event in parser:
    match event.tag:
        case "enter":   # event.rule is a RuleKind
            on_enter(event.rule, event.pos)
        case "exit":
            on_exit(event.rule, event.pos)
        case "token":   # event.token.kind is a TokenKind
            on_token(event.token)
        case "error":
            on_error(event.error)
Two rules to keep in mind while writing the driver:
Events are final in source order. The parser never retracts or reorders events; once you have seen one, it will not be un-emitted.
Error events do not stop the stream. The parser recovers and keeps going. An application that wants to abort on the first error must do so in its own driver — the parser will happily continue.
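Both rules fall out naturally in a concrete driver: because events are final, the driver can build its tree incrementally; because errors do not stop the stream, it collects them on the side. A runnable sketch in Python — the event objects below are hand-rolled stand-ins, since the real ones come from a generated parser:

```python
# Fold the event stream into nested lists, collecting errors on the side.
from types import SimpleNamespace as Ev

def build_tree(events):
    root = []                      # children of the (virtual) top
    stack = [root]
    errors = []
    for event in events:
        if event.tag == "enter":   # open a new subtree
            node = [event.rule]
            stack[-1].append(node)
            stack.append(node)
        elif event.tag == "exit":  # close the current subtree
            stack.pop()
        elif event.tag == "token": # tokens become leaves
            stack[-1].append(event.token)
        elif event.tag == "error": # keep going: errors do not stop the stream
            errors.append(event.error)
    return root, errors

# A stand-in stream: enter a rule, yield one token and one error, exit.
stream = [
    Ev(tag="enter", rule="value"),
    Ev(tag="token", token=("NUMBER", "1")),
    Ev(tag="error", error="expected ','"),
    Ev(tag="exit", rule="value"),
]
tree, errs = build_tree(stream)
# tree == [["value", ("NUMBER", "1")]], errs == ["expected ','"]
```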
Starting from a rule other than the default¶
Every non-fragment rule has an entry point. The first rule declared
is the default start, but nothing stops you from calling
parse_member or parse_number directly to parse a slice of
input as if that rule were the top. This is useful for tests, for
editor tooling that parses at the cursor, and for composing parsers
(parse a request body with one entry, then parse its contents with
another).
Typical integration workflow¶
Write the grammar in a .parsuna file.
Run parsuna grammar.parsuna check until it reports OK. Fix undefined references, left recursion, or LL(k) conflicts as the checker reports them.
Run parsuna grammar.parsuna generate <target> -o src. Commit the emitted files into your repository; they are plain source, and diffing them is how you notice grammar changes you did not intend.
In your application, call parse_<rule> and walk the event stream. Translate Enter/Exit pairs into whatever domain-specific tree you want; translate Token events into leaves; handle Error events by attaching a diagnostic to the surrounding construct.
Regenerating is cheap and should be fully automated — wire
parsuna generate into your build system so the committed files
never drift from the grammar.
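One way to wire this in, sketched as a make rule; the paths, target language, and output directory are illustrative, and any build tool with file dependencies works the same way:

```make
# Re-check and regenerate the committed parser whenever the grammar changes.
.PHONY: parser
parser: grammar.parsuna
	parsuna grammar.parsuna check
	parsuna grammar.parsuna generate rust -o src/parser
```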
Tokens, skips, and whitespace¶
Skip tokens (?WS, ?COMMENT) are re-attached to the event
stream just before the next structural event that follows them in
source order. Consumers who only care about structure can filter by
event tag; consumers building a formatter or a highlighter see the
skips in the correct positions.
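A structural consumer can drop skips with a small filter over token events. A sketch in Python — the event shape is simplified and the kind names are assumed examples, not taken from a real grammar:

```python
# Drop skip tokens (e.g. WS, COMMENT) from the stream while leaving
# structural events untouched. Event shape and kind names are stand-ins.
from types import SimpleNamespace as Ev

SKIP_KINDS = {"WS", "COMMENT"}

def without_skips(events):
    for event in events:
        if event.tag == "token" and event.kind in SKIP_KINDS:
            continue            # a formatter or highlighter would keep these
        yield event

stream = [
    Ev(tag="token", kind="WS"),
    Ev(tag="enter", kind=None),
    Ev(tag="token", kind="NUMBER"),
    Ev(tag="exit", kind=None),
]
kept = [e.tag for e in without_skips(stream)]
# kept == ["enter", "token", "exit"]
```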
An Error event does not itself consume the token it attaches to; the
parser still either consumes that token (if recovery synchronizes on
it) or skips it as part of recovery. Application code should treat
Error as a diagnostic carrier, not as a replacement for a token.
Interpreting token text¶
The parser does not post-process token text. STRING tokens are
delivered with their quotes and escapes intact; NUMBER tokens are
delivered as the raw lexeme. Un-escaping and numeric conversion are
the consumer’s job — this keeps the parser’s source text faithful so
tools like formatters and go-to-definition work without losing
information.
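Decoding is then a small consumer-side step. A sketch in Python, assuming JSON-style string escapes and numerals — parsuna itself prescribes neither, so adapt these to whatever your grammar actually allows:

```python
# Consumer-side decoding of raw lexemes. The parser hands over the raw
# source text; these helpers assume JSON-style escapes and numerals.
import json

def decode_string(lexeme):
    """Turn a raw STRING lexeme (quotes and escapes intact) into text."""
    return json.loads(lexeme)

def decode_number(lexeme):
    """Turn a raw NUMBER lexeme into an int or float."""
    return int(lexeme) if lexeme.lstrip("-").isdigit() else float(lexeme)

# decode_string('"a\\nb"') == "a\nb"  (escape resolved to a newline)
# decode_number("42") == 42; decode_number("2.5") == 2.5
```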