Pass 1 — Grammar parsing
========================

Entry point: ``grammar::parse_grammar`` (``src/grammar/parser.rs``). Output: a
``Grammar`` value (``src/grammar/ir.rs``).

Input
-----

The source text of a ``.parsuna`` file, as UTF-8 bytes. Nothing else is
consulted — there are no includes, no external definitions, no search path.

Bootstrap
---------

The grammar-file parser is itself generated by parsuna. The file
``src/grammar/parsuna.parsuna`` describes the syntax of a parsuna grammar;
running the generator over it produces ``src/grammar/generated.rs``, a
pull-parser over the grammar DSL. ``parser.rs`` then consumes that pull
parser's events and builds the ``Grammar`` IR. Parsuna bootstraps itself.

The implication is that the first pass is a worked example of consuming an
event stream. ``parser.rs`` reads ``Enter``/``Exit`` pairs to recognise
rule-shaped blocks, pulls tokens out of them with a small ``Reader``
abstraction, and accumulates errors as it goes.

The Grammar IR
--------------

The parse produces a flat ``Grammar``:

* ``name: String`` — a label used by later phases for file and package
  naming. The parser leaves this empty; the CLI fills it from the input
  file's stem (or from ``--name`` when given).
* ``tokens: Vec<TokenDef>`` — every token declaration in source order. Each
  ``TokenDef`` records the name, the body (``TokenPattern``), the ``skip``
  flag (``?`` prefix), the ``is_fragment`` flag (``_`` prefix), and a source
  span.
* ``rules: Vec<RuleDef>`` — every rule declaration in source order. Each
  ``RuleDef`` records the name, the body (``Expr``), the ``is_fragment``
  flag, and a source span.

``TokenPattern`` is a regular-expression tree over characters: ``Empty``,
``Literal(String)``, ``Class(CharClass)``, ``Ref(String)``, ``Seq``, ``Alt``,
``Opt``, ``Star``, ``Plus``. ``Expr`` is the corresponding LL expression tree
over tokens and rules: ``Empty``, ``Token(name)``, ``Rule(name)``, and the
same combinators. Two trees, one shape.
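The two mirrored trees can be sketched as plain Rust enums. This is an
illustrative sketch, not parsuna's actual definitions: ``CharClass`` is
simplified to a list of inclusive character ranges, and the exact arities of
the combinators are assumptions.

```rust
// Sketch of the two mirrored IR trees. Variant names follow the text above;
// the payload shapes are guesses for illustration.
#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
enum TokenPattern {
    Empty,
    Literal(String),
    Class(Vec<(char, char)>), // simplified stand-in for CharClass
    Ref(String),
    Seq(Vec<TokenPattern>),
    Alt(Vec<TokenPattern>),
    Opt(Box<TokenPattern>),
    Star(Box<TokenPattern>),
    Plus(Box<TokenPattern>),
}

#[derive(Debug, Clone, PartialEq)]
#[allow(dead_code)]
enum Expr {
    Empty,
    Token(String), // uppercase-initial identifier in a rule body
    Rule(String),  // lowercase-initial identifier in a rule body
    Seq(Vec<Expr>),
    Alt(Vec<Expr>),
    Opt(Box<Expr>),
    Star(Box<Expr>),
    Plus(Box<Expr>),
}

fn main() {
    // A token body like `[a-z] [a-z0-9]*` on the character side...
    let ident = TokenPattern::Seq(vec![
        TokenPattern::Class(vec![('a', 'z')]),
        TokenPattern::Star(Box::new(TokenPattern::Class(vec![
            ('a', 'z'),
            ('0', '9'),
        ]))),
    ]);
    // ...and a rule body like `item (COMMA item)*` on the token side:
    // same combinators, different atoms.
    let list = Expr::Seq(vec![
        Expr::Rule("item".into()),
        Expr::Star(Box::new(Expr::Seq(vec![
            Expr::Token("COMMA".into()),
            Expr::Rule("item".into()),
        ]))),
    ]);
    println!("{ident:?}\n{list:?}");
}
```

The point of the shared shape is that later passes can walk either tree with
the same traversal logic, switching only on the atom variants.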
Where the distinction matters
-----------------------------

The parser decides whether a body is a token pattern or a rule expression
from the case of the first letter of the declaration's name (see
:doc:`../grammar_language`). It uses different descent functions
(``read_pattern_*`` vs. ``read_*``) so that:

* Character atoms (``'a'``, ``..``, ``.``, ``!``) and string literals are
  accepted only on the token side. Using one inside a rule body produces a
  pointed error like "string literal atoms are only valid inside token
  declarations".
* Identifiers in a rule body with an uppercase initial become
  ``Expr::Token(name)``; with a lowercase initial, ``Expr::Rule(name)``.
  Identifiers in a token body are always ``TokenPattern::Ref(name)`` and are
  resolved later.

Error collection
----------------

``parse_grammar`` returns ``Result<Grammar, Vec<Diagnostic>>``. It does not
stop at the first problem; instead it accumulates diagnostics and keeps
parsing. This produces the characteristic "ten errors at once" experience: a
malformed grammar still parses to a mostly-shaped IR, and the user sees every
syntactic issue in a single run.

The ``Reader`` abstraction handles the book-keeping. It holds the current
lookahead event, exposes ``peek``/``advance``/``expect_*`` helpers, and
transparently drops ``WS`` and ``COMMENT`` tokens so callers never have to
think about trivia.

Post-conditions
---------------

After parsing, the ``Grammar`` is **syntactically** well-formed but
**semantically** unchecked. It may still contain:

* References to undefined tokens or rules.
* Left-recursive rules.
* Token reference cycles.
* Duplicate declarations.
* Names that collide with runtime sentinels (``EOF``, ``ERROR``).

The next pass, :doc:`analyze`, is what catches those.
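The accumulate-and-keep-going error style described under *Error collection*
can be sketched in miniature. ``Diagnostic`` and the toy validity check here
are illustrative stand-ins, not parsuna's actual types.

```rust
// Minimal sketch of error accumulation: record each problem, keep parsing,
// and only fail at the end if anything was recorded.
#[derive(Debug)]
struct Diagnostic {
    msg: String,
}

fn parse_all(decls: &[&str]) -> Result<Vec<String>, Vec<Diagnostic>> {
    let mut out = Vec::new();
    let mut errs = Vec::new();
    for d in decls {
        // A "declaration" is valid here iff it starts with a letter
        // (a stand-in for real syntactic checks).
        if d.chars().next().map_or(false, |c| c.is_alphabetic()) {
            out.push(d.to_string());
        } else {
            // Record the problem and continue instead of bailing out,
            // so one run surfaces every issue.
            errs.push(Diagnostic {
                msg: format!("bad declaration: {d:?}"),
            });
        }
    }
    if errs.is_empty() { Ok(out) } else { Err(errs) }
}

fn main() {
    match parse_all(&["expr", "1bad", "term", "?also_bad"]) {
        Ok(names) => println!("ok: {names:?}"),
        Err(errs) => {
            for e in &errs {
                println!("error: {}", e.msg);
            }
        }
    }
}
```

The trade-off is that everything after an error runs against a
partially-shaped result, which is why the later analysis pass still has to
tolerate holes.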
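The trivia-dropping lookahead that the ``Reader`` abstraction provides can be
sketched as follows. The token kinds and one-slot lookahead are assumptions
for illustration; parsuna's ``Reader`` works over generated parser events,
not a bare token iterator.

```rust
// Sketch of a one-token-lookahead reader that transparently skips trivia,
// so callers' peek/advance never see WS or COMMENT.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok {
    Ident,
    Ws,
    Comment,
    Eof,
}

struct Reader<I: Iterator<Item = Tok>> {
    iter: I,
    lookahead: Option<Tok>,
}

impl<I: Iterator<Item = Tok>> Reader<I> {
    fn new(iter: I) -> Self {
        let mut r = Reader { iter, lookahead: None };
        r.fill();
        r
    }

    // Pull the next non-trivia token into the lookahead slot.
    fn fill(&mut self) {
        self.lookahead = self
            .iter
            .by_ref()
            .find(|t| !matches!(t, Tok::Ws | Tok::Comment));
    }

    // Exhausted input reads as a synthetic EOF token.
    fn peek(&self) -> Tok {
        self.lookahead.unwrap_or(Tok::Eof)
    }

    fn advance(&mut self) -> Tok {
        let t = self.peek();
        self.fill();
        t
    }
}

fn main() {
    let toks = [Tok::Ws, Tok::Ident, Tok::Comment, Tok::Ident, Tok::Ws];
    let mut r = Reader::new(toks.into_iter());
    assert_eq!(r.advance(), Tok::Ident); // leading WS dropped
    assert_eq!(r.advance(), Tok::Ident); // COMMENT dropped in between
    assert_eq!(r.peek(), Tok::Eof);      // trailing WS dropped
    println!("trivia skipped");
}
```

Centralising the skip in ``fill`` is what lets every ``expect_*`` helper be
written as if the token stream contained no trivia at all.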