Using parsuna
=============

This page is language-agnostic: it covers the generator CLI, the shape of a
generated parser, and how consumers drive one. Backend specifics are called
out only when they matter.

The CLI
-------

The parsuna executable takes a grammar file plus a subcommand::

    parsuna [options]

The useful subcommands for day-to-day work are:

``check``
    Load, parse, and analyze the grammar. Print a one-line summary
    (``grammar `NAME' OK: N tokens, M rules, LL(k)``) and exit 0, or print
    diagnostics and exit non-zero. Use this as a pre-commit or CI gate.

``generate [-o OUT]``
    Emit a parser for the requested target. Valid targets are ``rust``,
    ``python``, ``typescript``, ``go``, ``java``, ``csharp``, ``c``, and the
    meta-target ``all``, which emits every backend. With ``-o OUT``, files are
    written under that directory (one sub-directory per backend when multiple
    are emitted). Without ``-o``, files are written into the current working
    directory.

``tree-sitter [-o OUT]``
    Emit a tree-sitter ``grammar.js`` for editor tooling. The emitted grammar
    is purely declarative; it does not share the pull-parser runtime. Useful
    for syntax highlighting and code folding in editors that speak
    tree-sitter.

``debug``
    Dump internal state. The sub-commands are ``stats``, ``tokens``,
    ``rules --format tree|dot``, ``analysis``, ``lowering``, and
    ``dfa [--full] [--format plain|dot]``. Use ``rules --format dot`` piped
    into Graphviz to view rule railroad diagrams; use ``dfa --format dot`` for
    the lexer DFA. These dumps are intended as a debugging aid while
    developing a grammar; the :doc:`pipeline/index` describes each layer in
    full.

The ``--name NAME`` option, accepted at any position, overrides the identifier
the backend uses for file and package names. By default the name is the
grammar file's stem (``foo.parsuna`` → ``foo``).
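
As a rough sketch of using ``check`` as a pre-commit or CI gate, the Python
helper below shells out to the executable and propagates its exit status. It
assumes ``parsuna`` is on ``PATH`` and that the grammar file comes before the
subcommand, as in the ``parsuna grammar.parsuna check`` invocation used
elsewhere on this page. The helper names ``default_name``, ``check_argv``, and
``run_check`` are hypothetical and not part of parsuna.

```python
import subprocess
import sys
from pathlib import Path


def default_name(grammar: str) -> str:
    """Name used when --name is not given: the grammar file's stem."""
    return Path(grammar).stem


def check_argv(grammar: str, exe: str = "parsuna") -> list:
    """Build the argv for `check` (assumed order: grammar file, then subcommand)."""
    return [exe, grammar, "check"]


def run_check(grammar: str, exe: str = "parsuna") -> int:
    """Run the check and return its exit status (0 means the grammar is OK)."""
    result = subprocess.run(check_argv(grammar, exe), capture_output=True, text=True)
    # Pass the one-line summary or the diagnostics through unchanged.
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode


# In a CI script one would typically end with:
#     sys.exit(run_check("grammar.parsuna"))
```

Returning the exit status rather than raising keeps the gate decision in the
calling script, which can then choose to fail the build or only warn.
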

The shape of a generated parser
-------------------------------

Every backend produces the same five things, spelled in the idioms of the
target language:

* A **TokenKind** enumeration with one variant per declared token, plus the
  reserved ``EOF`` and ``ERROR`` sentinels. Skip tokens appear here like any
  other token; fragments do not.
* A **RuleKind** enumeration with one variant per non-fragment rule. Attached
  to every structural event so consumers can identify subtrees.
* A **parse_** entry point per non-fragment rule, accepting a source string or
  (where the target runtime supports it) a stream. The entry point returns a
  **Parser** object, the generated driver wrapped around the runtime's pull
  loop.
* The **Parser** object, which yields **Event** values one at a time. Every
  target spells this as its native iterator protocol (``Iterator`` in Rust,
  ``Iterable`` in Python, ``Iterator`` in TypeScript, a ``NextEvent`` method
  in Go, etc.).
* **Event** itself: a tagged union with four cases (``Enter``, ``Exit``,
  ``Token``, ``Error``). See :doc:`event_model` for the full payload.

All of these come from the same state table, so whatever backend you pick, the
sequence of events you observe for a given input is the same up to
language-level encoding differences.

A minimal driver
----------------

The pattern is identical in every language: call the entry point, iterate,
switch on the event tag. In pseudocode::

    parser = parse_(source)
    for event in parser:
        match event.tag:
            case "enter":
                # event.rule is a RuleKind
                on_enter(event.rule, event.pos)
            case "exit":
                on_exit(event.rule, event.pos)
            case "token":
                # event.token.kind is a TokenKind
                on_token(event.token)
            case "error":
                on_error(event.error)

Two rules to keep in mind while writing the driver:

1. **Events are final in source order.** The parser never retracts or
   reorders events; once you have seen one, it will not be un-emitted.
2. **Error events do not stop the stream.** The parser recovers and keeps going.
   An application that wants to abort on the first error must do so in its
   own driver; the parser will happily continue.

Starting from a rule other than the default
-------------------------------------------

Every non-fragment rule has an entry point. The first rule declared is the
*default start*, but nothing stops you from calling ``parse_member`` or
``parse_number`` directly to parse a fragment of input as if that rule were
the top. This is useful for tests, for editor tooling that parses at the
cursor, and for composing parsers (parse a request body with one entry, then
parse its contents with another).

Typical integration workflow
----------------------------

1. Write the grammar in a ``.parsuna`` file.
2. Run ``parsuna grammar.parsuna check`` until it reports OK. Fix undefined
   references, left recursion, or LL(k) conflicts as the checker reports them.
3. Run ``parsuna grammar.parsuna generate -o src``. Commit the emitted files
   into your repository: they are plain source, and diffing them is how you
   notice grammar changes you did not intend.
4. In your application, call ``parse_`` and walk the event stream. Translate
   ``Enter``/``Exit`` pairs into whatever domain-specific tree you want;
   translate ``Token`` events into leaves; handle ``Error`` events by
   attaching a diagnostic to the surrounding construct.

Regenerating is cheap and should be fully automated: wire ``parsuna generate``
into your build system so the committed files never drift from the grammar.

Tokens, skips, and whitespace
-----------------------------

Skip tokens (``?WS``, ``?COMMENT``) are re-attached to the event stream just
before the next structural event that follows them in source order. Consumers
who only care about structure can filter by event tag; consumers building a
formatter or a highlighter see the skips in the correct positions.
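
To make the tree-building step and the skip handling concrete, here is a
minimal Python sketch. The ``Event`` and ``Node`` classes, the ``build_tree``
function, and the ``SKIP_KINDS`` set are hypothetical stand-ins rather than
the generated API, and the sketch assumes that skip tokens arrive as ordinary
token events whose kind is ``WS`` or ``COMMENT``; see :doc:`event_model` for
the real payload.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Event:
    # Hypothetical stand-in for the generated Event type.
    tag: str                     # "enter" | "exit" | "token" | "error"
    rule: Optional[str] = None   # set on enter/exit events
    kind: Optional[str] = None   # set on token events
    text: str = ""               # token text, or the error message


@dataclass
class Node:
    rule: str
    children: list = field(default_factory=list)  # Node or (kind, text) leaves
    errors: list = field(default_factory=list)    # diagnostics attached here


SKIP_KINDS = {"WS", "COMMENT"}  # assumed names of the skip tokens


def build_tree(events: Iterable[Event], keep_skips: bool = False) -> Node:
    """Fold a flat event stream into a tree under a synthetic root node."""
    root = Node("<root>")
    stack = [root]
    for ev in events:
        if ev.tag == "enter":
            node = Node(ev.rule)
            stack[-1].children.append(node)
            stack.append(node)
        elif ev.tag == "exit":
            stack.pop()
        elif ev.tag == "token":
            # Structure-only consumers drop skips; formatters keep them.
            if keep_skips or ev.kind not in SKIP_KINDS:
                stack[-1].children.append((ev.kind, ev.text))
        elif ev.tag == "error":
            # Attach the diagnostic to the surrounding construct and go on.
            stack[-1].errors.append(ev.text)
    return root


# A hand-written event sequence standing in for real parser output.
sample = [
    Event("enter", rule="pair"),
    Event("token", kind="STRING", text='"a"'),
    Event("token", kind="WS", text=" "),
    Event("token", kind="COLON", text=":"),
    Event("error", text="expected value"),
    Event("exit", rule="pair"),
]
tree = build_tree(sample)
pair = tree.children[0]
```

Because events are never retracted, the builder can commit each node as soon
as it sees the event; a driver that should stop at the first error could
return early from the ``error`` branch instead of recording the diagnostic.
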

``Error`` events do not consume the token they attach to: the parser still
either consumes it (if recovery synchronizes on it) or skips it as part of
recovery. Application code should treat ``Error`` as a diagnostic carrier,
not a replacement for a token.

Interpreting token text
-----------------------

The parser does not post-process token text. ``STRING`` tokens are delivered
with their quotes and escapes intact; ``NUMBER`` tokens are delivered as the
raw lexeme. Un-escaping and numeric conversion are the consumer's job; this
keeps the parser's source text faithful so tools like formatters and
go-to-definition work without losing information.
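
As an illustration of consumer-side conversion, the Python sketch below
decodes raw lexemes after parsing. It assumes a grammar whose ``STRING``
tokens are double-quoted with JSON-style escapes and whose ``NUMBER`` tokens
use ordinary decimal notation; other grammars need their own decoding rules.
The function names ``string_value`` and ``number_value`` are hypothetical.

```python
import json


def string_value(lexeme: str) -> str:
    """Decode a double-quoted, JSON-style STRING lexeme into its value."""
    # json.loads handles the surrounding quotes and escapes such as \n or \u00e9.
    return json.loads(lexeme)


def number_value(lexeme: str):
    """Convert a NUMBER lexeme to int when possible, otherwise to float."""
    try:
        return int(lexeme)
    except ValueError:
        return float(lexeme)
```

Keeping these conversions outside the parser means a formatter can still
print the original lexeme while an evaluator works with the decoded value.
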