4 changes: 3 additions & 1 deletion .cursor/rules/specify-rules.mdc
@@ -7,6 +7,8 @@ Auto-generated from all feature plans. Last updated: 2025-11-10
- N/A (parser generation, no data storage) (001-refactor-terminology)
- JavaScript (Node.js), tree-sitter grammar DSL + tree-sitter-cli (project version), tree-sitter (bindings) (003-extended-annotation)
- N/A (parser grammar definition) (003-extended-annotation)
- Tree-sitter query language (scheme-like .scm); grammar.js (JavaScript, Node); parser generated to C via tree-sitter-cli + tree-sitter, tree-sitter-cli (npm); editors consume queries (Zed, Neovim, Helix, Emacs) (001-editor-improvements)
- N/A (query and doc files only) (001-editor-improvements)

- JavaScript (Node.js), tree-sitter grammar DSL + tree-sitter-cli ^0.25.10, tree-sitter ^0.25.0 (001-line-comments)

@@ -26,10 +28,10 @@ npm test && npm run lint
JavaScript (Node.js), tree-sitter grammar DSL: Follow standard conventions

## Recent Changes
- 001-editor-improvements: Added Tree-sitter query language (scheme-like .scm); grammar.js (JavaScript, Node); parser generated to C via tree-sitter-cli + tree-sitter, tree-sitter-cli (npm); editors consume queries (Zed, Neovim, Helix, Emacs)
- 003-extended-annotation: Added JavaScript (Node.js), tree-sitter grammar DSL + tree-sitter-cli (project version), tree-sitter (bindings)
- 001-refactor-terminology: Added JavaScript (Node.js) for grammar definition, generated C code for parser + tree-sitter (CLI and runtime), tree-sitter CLI for code generation

- 001-line-comments: Added JavaScript (Node.js), tree-sitter grammar DSL + tree-sitter-cli ^0.25.10, tree-sitter ^0.25.0

<!-- MANUAL ADDITIONS START -->
<!-- MANUAL ADDITIONS END -->
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -167,7 +167,7 @@ jobs:
           python-version: "3.11"
       - name: Build wheels
         run: |
-          python -m pip install cibuildwheel==2.16.2
+          python -m pip install cibuildwheel==3.2.1
           python -m cibuildwheel --output-dir dist
         env:
           CIBW_ARCHS: native
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -6,7 +6,7 @@ members = [
resolver = "2"

[workspace.package]
-version = "0.3.3"
+version = "0.3.4"
license = "MIT"
repository = "https://github.com/gram-data/tree-sitter-gram"
edition = "2021"
2 changes: 1 addition & 1 deletion Makefile

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion docs/gram-ebnf.md
@@ -314,7 +314,7 @@ Supported escape sequences in single, double, and backtick strings: `\\`, `\'`,

### 9.5 Tagged String

-A tagged string attaches a type tag (a `symbol`) to a string value, expressed in two forms:
+A tagged string attaches a type tag (a `symbol`) to a string value, expressed in two forms. See [Tagged strings and injections](tagged-strings-and-injections.md) for well-known tags, language injection, and the `::` schema convention.

```ebnf
tagged_string = symbol, "`", /([^`\\\n])*/, "`"
2 changes: 2 additions & 0 deletions docs/gram-reference.md
@@ -505,6 +505,8 @@ arguments.
| `{k: "v"}` | `Value::Map` | scalars only |
| `bareword` | `Value::Symbol` | unquoted symbol in value position |

+See [Tagged strings and injections](tagged-strings-and-injections.md) for well-known tags, language injection, and the `::` schema convention.
+
String escape sequences (single, double, backtick forms): `\\`, `\'`, `\"`,
`` \` ``, `\b`, `\f`, `\n`, `\r`, `\t`.

76 changes: 76 additions & 0 deletions docs/tagged-strings-and-injections.md
@@ -0,0 +1,76 @@
# Tagged Strings and Language Injection

This document describes how tagged strings work in Gram notation, how syntax highlighting (language injection) is applied to their content, and how the `::` convention supports schema definitions without extending the grammar for each type system.

---

## 1. Tagged string syntax

A **tagged string** attaches a type or format tag to string content:

- **Backtick form:** `` tag`content` ``
- **Fenced form:** ` ```tag` followed by a newline, then content, then ` ``` `

The tag is a **symbol**; the content is arbitrary text (backtick form: single line with escapes; fenced form: multiline). The grammar does not restrict which tags may appear. Downstream libraries and editors interpret tags by convention.
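
As a rough illustration of the backtick form, the sketch below splits a tagged string into its tag and content. The `parseTaggedString` helper and the symbol pattern are assumptions made for this example: the real `symbol` rule in the grammar may accept more characters, and the fenced form is not handled here.

```javascript
// Assumed, simplified symbol pattern; the actual Gram `symbol` rule may differ.
// Content: any run of characters other than backtick, backslash, or newline,
// or a backslash followed by one (non-newline) character.
const TAGGED_BACKTICK = /^([A-Za-z_][A-Za-z0-9_]*)`((?:[^`\\\n]|\\.)*)`$/;

// Hypothetical helper: returns { tag, content } for a single-line
// backtick-form tagged string, or null if the text does not match.
function parseTaggedString(text) {
  const match = TAGGED_BACKTICK.exec(text);
  if (match === null) return null;
  return { tag: match[1], content: match[2] };
}

console.log(parseTaggedString("sql`SELECT 1`"));
console.log(parseTaggedString("no tag here"));
```

A real consumer would normally read the `tag` and `content` fields from the tree-sitter parse tree instead of re-matching text; the regular expression only makes the shape of the syntax concrete.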

---

## 2. Language injection (syntax highlighting)

Tree-sitter injection queries (`queries/injections.scm`) use the **tag’s text as the injection language**. You do **not** need to add every possible tag to the grammar or query file.

- **Dynamic injection:** For most tags, the content is highlighted with the language whose name matches the tag (e.g. `sql`, `json`, `html`).
- **Overrides:** A small set of tags are mapped to parser names that differ from the tag (e.g. `md` → `markdown`, `ts` → `typescript`) in `injections.scm`. Adding more overrides is optional; editors and downstream tools can also map tag names to parsers themselves.
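
The override-then-fallback behavior described above can be sketched in a few lines. This is an editor-side illustration under stated assumptions, not code from this repository: `resolveInjectionLanguage` and the `extraOverrides` parameter are hypothetical names, and only the `md` and `ts` mappings come from `injections.scm`.

```javascript
// Overrides taken from the description of injections.scm above.
const TAG_OVERRIDES = new Map([
  ["md", "markdown"],
  ["ts", "typescript"],
]);

// Hypothetical helper: pick the parser name for a tag.
// Order: caller-supplied overrides, built-in overrides, then the tag itself.
function resolveInjectionLanguage(tag, extraOverrides = new Map()) {
  if (extraOverrides.has(tag)) return extraOverrides.get(tag);
  if (TAG_OVERRIDES.has(tag)) return TAG_OVERRIDES.get(tag);
  return tag; // dynamic injection: the tag text is the language name
}

console.log(resolveInjectionLanguage("md"));
console.log(resolveInjectionLanguage("sql"));
console.log(resolveInjectionLanguage("yml", new Map([["yml", "yaml"]])));
```

Keeping the fallback as the identity function is what lets new tags work without touching the grammar or the query file.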

### Well-known tags

| Tag | Typical use | Parser / notes |
|----------|---------------------------|-----------------------------------------------------|
| `md` | Markdown | Mapped to `markdown` in injections.scm |
| `ts` | TypeScript (types/code) | Mapped to `typescript` in injections.scm |
| `date` | ISO 8601 date | Often no parser; content is `YYYY-MM-DD` |
| `datetime` | ISO 8601 date-time | Often no parser; content is ISO 8601 |
| `time` | ISO 8601 time | Often no parser; content is time part |
| `sql` | SQL | Tag used as language name (parser often `sql`) |
| `json` | JSON | Tag used as language name |
| `html` | HTML | Tag used as language name |

Editors and consumers can extend this list by mapping additional tag names to language parsers (e.g. `yaml`, `graphql`) without changing the Gram grammar or its queries.

---

## 3. Schema definitions and the `::` convention

In records, the grammar allows two separators between a property key and its value:

- **`:`** — normal property: value is data.
- **`::`** — often used for **type or schema** definitions: value describes the *kind* or *shape* of the property rather than a literal value.

Example:

```gram
{ name:: ts`string`, count:: ts`number`, bio:: md`# Markdown allowed` }
```

Here, `name`, `count`, and `bio` are property names whose **value types** are described by tagged strings: TypeScript type expressions (`string`, `number`) and Markdown. Downstream can interpret `::` as “schema slot” and use the tagged content for validation, codegen, or documentation without the grammar ever defining TypeScript, SQL, or other schema languages.
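
To make the "schema slot" reading concrete, here is a minimal sketch that separates `:` properties from `::` properties. The in-memory shape (`key`, `separator`, `value`) and the `partitionProperties` helper are assumptions for illustration; a real Gram library may expose parsed records differently.

```javascript
// Hypothetical in-memory shape for record properties; not the output of any
// real Gram library.
const exampleRecord = [
  { key: "name", separator: "::", value: { tag: "ts", content: "string" } },
  { key: "count", separator: "::", value: { tag: "ts", content: "number" } },
  { key: "title", separator: ":", value: "Hello" },
];

// Split properties into plain data (`:`) and schema slots (`::`).
function partitionProperties(properties) {
  const data = [];
  const schema = [];
  for (const property of properties) {
    (property.separator === "::" ? schema : data).push(property);
  }
  return { data, schema };
}

const { data, schema } = partitionProperties(exampleRecord);
console.log(schema.map((p) => `${p.key}: ${p.value.tag} ${p.value.content}`));
console.log(data.map((p) => p.key));
```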

### Encouraging and supporting this

- **Grammar:** No change is required. `record_property` already allows both `:` and `::`; the value can be any `_value`, including tagged strings.
- **Convention:** Document and use `::` for “type/schema” and reserve tagged strings (e.g. `ts`, `sql`) for the type description. That keeps schema concerns out of the core grammar and lets each ecosystem choose its type languages.
- **Editors:** Injection applies to tagged string content regardless of context, so `` name:: ts`string` `` gets TypeScript highlighting inside the backticks. Editors that support tree-sitter injections will get this from the existing `injections.scm`.
- **Downstream:** Libraries can treat `key:: tagged_string` as “property `key` has type/schema given by tag and content,” and dispatch to the right validator or generator (e.g. `ts` → TypeScript type checker, `sql` → schema validator).
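
The downstream dispatch described in the last bullet might look roughly like the following. Everything here is hypothetical: `createSchemaDispatcher`, the property shape, and the handlers are stand-ins, not real TypeScript or SQL validators.

```javascript
// Hypothetical registry: a Map from tag name to handler(content, key).
function createSchemaDispatcher(handlers) {
  return function dispatch(property) {
    const { key, value } = property;
    const handler = handlers.get(value.tag);
    if (handler === undefined) {
      // Unknown tag: report it so the caller can decide what to do.
      return { key, tag: value.tag, handled: false };
    }
    return { key, tag: value.tag, handled: true, result: handler(value.content, key) };
  };
}

const dispatch = createSchemaDispatcher(
  new Map([
    ["ts", (content, key) => `type of ${key} is ${content}`],
    ["md", (content) => content.trim()],
  ])
);

console.log(dispatch({ key: "name", value: { tag: "ts", content: "string" } }));
console.log(dispatch({ key: "when", value: { tag: "date", content: "2025-11-10" } }));
```

Returning an explicit `handled: false` result, instead of throwing, matches the convention-based design: unknown tags are legal in the grammar, so a consumer can treat them as plain strings.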

---

## 4. Summary

- **Tags:** Arbitrary; no need to enumerate every tag in the grammar. Injection uses the tag symbol; a few well-known tags are mapped in `injections.scm`; the rest use the tag text as the language name or are mapped by the editor.
- **Schema:** Use `::` for type/schema properties and tagged strings (`ts`, `sql`, etc.) for the type description. The grammar stays generic; schema support is by convention and downstream tooling.

---

## See also

- [Gram Notation Reference](gram-reference.md) — value types and `Value::TaggedString` in the data model
- [Gram EBNF](gram-ebnf.md) — formal `tagged_string` syntax (§9.5)
2 changes: 1 addition & 1 deletion editors/zed/.gitignore
@@ -1,2 +1,2 @@
-# this is created by scripts/prepare-zed-extension.sh
+# Populated by Zed when loading the extension (grammar clone). Do not commit.
grammars/
43 changes: 30 additions & 13 deletions editors/zed/README.md
@@ -115,27 +115,44 @@ To work on this extension:

```
editors/zed/
-├── extension.toml          # Extension metadata
-├── grammars/
-│   └── tree-sitter-gram/   # Grammar files
-│       ├── grammar.js      # Tree-sitter grammar definition
-│       └── src/            # Generated parser source
+├── extension.toml          # Extension metadata; points at grammar repo (repository + rev)
 ├── languages/
 │   └── gram/
-│       ├── config.toml     # Language configuration
-│       └── queries/        # Syntax highlighting queries
-│           └── highlights.scm
-└── example.gram            # Example file for testing
+│       ├── config.toml     # Zed language config (brackets, suffixes, etc.)
+│       ├── highlights.scm  # → ../../../../queries/ (symlink; edit queries/ in repo root)
+│       ├── indents.scm
+│       ├── locals.scm
+│       └── injections.scm
+├── test.gram               # Example file for testing
+└── .gitignore              # Ignores grammars/ (populated by Zed from extension.toml)
```

+Query files (`.scm`) in `languages/gram/` are symlinks to the canonical `queries/` directory at the repo root. Edit `queries/*.scm` there; do not edit the copies under `editors/zed` directly. Running `scripts/prepare-zed-extension.sh` copies `queries/*.scm` into this directory (e.g. for distribution) and updates extension version/rev.
+
+The `grammars/` directory is created by Zed when it loads the extension (it clones the grammar from the URL in `extension.toml`). It is gitignored. If you see a nested `editors/zed/grammars/gram/...` path, that is the cloned repo inside Zed’s cache; you can ignore or delete `editors/zed/grammars/` locally.
+
+### Keeping the grammar revision in sync
+
+The extension pins the tree-sitter-gram grammar with `repository` and `rev` in `extension.toml`. Zed uses that to fetch and build the parser. To keep it aligned with the latest version:
+
+| Goal | Command | What it does |
+|------|---------|---------------|
+| **Local testing** | `npm run zed:dev` | Sets `repository = "file://<repo-root>"` and `rev = HEAD`. Zed uses your local clone at the current commit, so you can test grammar/query changes without pushing. |
+| **Prepare for publish** | `npm run zed:publish` | Sets `repository` to the public GitHub URL (from `package.json`) and `rev = HEAD`. Run this before committing a release so the published extension points at the correct commit on GitHub. |
+
+After either command, `extension.toml` is updated in place. For local dev you typically don’t commit that change (so the repo keeps a rev that matches the last release). For a release, run `zed:publish`, then commit and push so the extension and the tagged release stay aligned.
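
The in-place update that both commands perform can be pictured with a small sketch. This is not the project's tooling (the real logic is not shown in this change); `pinGrammar` is a hypothetical helper, and the local path and commit hash below are placeholders.

```javascript
// Hypothetical sketch: rewrite the pinned grammar source in extension.toml text.
// The caller supplies the repository URL and the resolved commit hash.
function pinGrammar(tomlText, repository, rev) {
  return tomlText
    // Function replacers avoid special handling of "$" in the inserted values.
    .replace(/^repository = ".*"$/m, () => `repository = "${repository}"`)
    .replace(/^rev = ".*"$/m, () => `rev = "${rev}"`);
}

const before = [
  "[grammars.gram]",
  'repository = "https://github.com/gram-data/tree-sitter-gram"',
  'rev = "78fba591ce4e3ca86ae77c871cfc9e87205c8e2b"',
].join("\n");

// Local testing: point at a local clone (placeholder path and hash).
console.log(pinGrammar(before, "file:///path/to/tree-sitter-gram", "0000000"));
```

The sketch assumes `\n` line endings and exactly one `repository` and one `rev` line; a TOML-aware editor would be more robust for anything beyond this simple case.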

+**If Zed shows an old version (e.g. 0.1.11) after installing the dev extension:** Zed may be using a cached clone of the grammar (at an old rev) or an older copy of the extension. Try: (1) Uninstall the Gram extension from Zed’s Extensions panel. (2) Delete the grammar cache: remove `editors/zed/grammars/` if it exists (Zed recreates it when needed). (3) Run `npm run zed:dev` again so `extension.toml` has the current version and rev. (4) In Zed, run “Install Dev Extension” and select `editors/zed` again. Restart Zed and recheck the extension version.

## Contributing

Contributions are welcome! Please see the main [repository](https://github.com/gram-data/tree-sitter-gram) for contribution guidelines.

-To improve syntax highlighting:
-1. Edit `languages/gram/queries/highlights.scm`
-2. Test with various Gram files
-3. Submit a pull request
+To improve syntax highlighting and editor behavior:
+
+1. Edit the canonical query files under `queries/` at the repo root (e.g. `queries/highlights.scm`).
+2. The extension uses those via symlinks in `languages/gram/`; run `scripts/prepare-zed-extension.sh` if you need to copy them for distribution.
+3. Test with various Gram files, then submit a pull request.

## License

6 changes: 3 additions & 3 deletions editors/zed/extension.toml
@@ -1,11 +1,11 @@
id = "gram"
name = "Gram Language Support"
-version = "0.3.3"
+version = "0.3.4"
schema_version = 1
authors = ["Gram Data Contributors"]
-description = "Support for Gram notation - a subject-oriented notation for structured data"
+description = "Support for Gram notation - composable data patterns"

# path = "grammars/tree-sitter-gram"
[grammars.gram]
repository = "https://github.com/gram-data/tree-sitter-gram"
-rev = "78fba591ce4e3ca86ae77c871cfc9e87205c8e2b"
+rev = "7aee4c203a5c6ea48660ae6d8af849ed90317bed"
41 changes: 32 additions & 9 deletions editors/zed/languages/gram/highlights.scm
@@ -12,7 +12,38 @@
; Boolean literals
(boolean_literal) @boolean

-; Symbols and identifiers
+; Comment (FR-003)
+(comment) @comment
+
+; Tagged-string tag distinct from content (FR-002)
+(tagged_string tag: (symbol) @attribute)
+
+; Reference identifier: pattern_reference (FR-001)
+(pattern_reference identifier: (_) @variable)
+
+; Definition-like identifiers (FR-001): @type
+; subject/node subject is _subject (use wildcard _ as it may be hidden in some runtimes)
+(subject_pattern subject: (_ identifier: (_) @type))
+(subject_pattern subject: (_ labels: (labels (symbol) @type)))
+(node_pattern subject: (_ identifier: (_) @type))
+(node_pattern subject: (_ labels: (labels (symbol) @type)))
+(relationship_pattern left: (node_pattern subject: (_ identifier: (_) @type)))
+(relationship_pattern left: (node_pattern subject: (_ labels: (labels (symbol) @type))))
+(relationship_pattern right: (node_pattern subject: (_ identifier: (_) @type)))
+(relationship_pattern right: (node_pattern subject: (_ labels: (labels (symbol) @type))))
+; Arrow kind: subject is inside optional brackets on the arrow
+(relationship_pattern kind: (right_arrow subject: (_ identifier: (_) @type)))
+(relationship_pattern kind: (right_arrow subject: (_ labels: (labels (symbol) @type))))
+(relationship_pattern kind: (left_arrow subject: (_ identifier: (_) @type)))
+(relationship_pattern kind: (left_arrow subject: (_ labels: (labels (symbol) @type))))
+(relationship_pattern kind: (undirected_arrow subject: (_ identifier: (_) @type)))
+(relationship_pattern kind: (undirected_arrow subject: (_ labels: (labels (symbol) @type))))
+(relationship_pattern kind: (bidirectional_arrow subject: (_ identifier: (_) @type)))
+(relationship_pattern kind: (bidirectional_arrow subject: (_ labels: (labels (symbol) @type))))
+(identified_annotation identifier: (_) @type)
+(identified_annotation labels: (labels (symbol) @type))
+
+; Symbols and identifiers (generic; definition/reference/tag captured above)
(symbol) @variable

; Keywords and operators
@@ -48,14 +79,6 @@

; Annotation keys (property-style) and headers (identified/label-style)
(property_annotation key: (symbol) @attribute)
-(identified_annotation identifier: (_) @attribute)
-(identified_annotation labels: (_) @attribute)
-
-; Subject Pattern notation (special highlighting)
-(subject_pattern) @type
-
-; Node with labels
-(node_pattern (labels (symbol) @type))

; Relationship arrows (special highlighting for graph syntax)
(relationship_pattern) @keyword
14 changes: 14 additions & 0 deletions editors/zed/languages/gram/indents.scm
@@ -0,0 +1,14 @@
; Indentation: 2 spaces per level for brackets and multi-line structures
; FR-007 — specs/004-editor-improvements/contracts/indents.md

; Record {}
"{" @indent
"}" @indent.end

; Subject pattern []
"[" @indent
"]" @indent.end

; Node pattern ()
"(" @indent
")" @indent.end
29 changes: 29 additions & 0 deletions editors/zed/languages/gram/injections.scm
@@ -0,0 +1,29 @@
; Language injection for tagged strings: tag`content` and ```tag\ncontent\n```
;
; The tag symbol is used as the injection language so that downstream tools and editors
; can support arbitrary tags without changing the grammar. Well-known tags (md, ts,
; date, datetime, time, sql, json, html, etc.) are documented in docs/tagged-strings-and-injections.md.
;
; Overrides below map tags that do not match common parser names. The final
; rule uses the tag's text as the language name for all other tags (e.g. "sql",
; "json", "html" often match parser names).

; md -> markdown
; Predicates are wrapped in the same outer group as the pattern they constrain.
((tagged_string
  tag: (symbol) @_tag
  content: (string_content) @injection.content)
  (#eq? @_tag "md")
  (#set! injection.language "markdown"))

; ts -> typescript
((tagged_string
  tag: (symbol) @_tag
  content: (string_content) @injection.content)
  (#eq? @_tag "ts")
  (#set! injection.language "typescript"))

; Dynamic: use tag text as language name for all other tags (sql, json, html, etc.)
; Editors may map additional tags (e.g. date, datetime, time) to parsers or leave as plain.
(tagged_string
  tag: (symbol) @injection.language
  content: (string_content) @injection.content)
28 changes: 28 additions & 0 deletions editors/zed/languages/gram/locals.scm
@@ -0,0 +1,28 @@
; Locals: go to definition and highlight references (file scope)
; FR-005, FR-006 — specs/004-editor-improvements/contracts/locals.md

; File scope: all definitions and references in one scope
(gram_pattern) @local.scope

; Definitions: identifiers that define a pattern or annotation
(subject_pattern subject: (_ identifier: (_) @local.definition))
(subject_pattern subject: (_ labels: (labels (symbol) @local.definition)))
(node_pattern subject: (_ identifier: (_) @local.definition))
(node_pattern subject: (_ labels: (labels (symbol) @local.definition)))
(relationship_pattern left: (node_pattern subject: (_ identifier: (_) @local.definition)))
(relationship_pattern left: (node_pattern subject: (_ labels: (labels (symbol) @local.definition))))
(relationship_pattern right: (node_pattern subject: (_ identifier: (_) @local.definition)))
(relationship_pattern right: (node_pattern subject: (_ labels: (labels (symbol) @local.definition))))
(relationship_pattern kind: (right_arrow subject: (_ identifier: (_) @local.definition)))
(relationship_pattern kind: (right_arrow subject: (_ labels: (labels (symbol) @local.definition))))
(relationship_pattern kind: (left_arrow subject: (_ identifier: (_) @local.definition)))
(relationship_pattern kind: (left_arrow subject: (_ labels: (labels (symbol) @local.definition))))
(relationship_pattern kind: (undirected_arrow subject: (_ identifier: (_) @local.definition)))
(relationship_pattern kind: (undirected_arrow subject: (_ labels: (labels (symbol) @local.definition))))
(relationship_pattern kind: (bidirectional_arrow subject: (_ identifier: (_) @local.definition)))
(relationship_pattern kind: (bidirectional_arrow subject: (_ labels: (labels (symbol) @local.definition))))
(identified_annotation identifier: (_) @local.definition)
(identified_annotation labels: (labels (symbol) @local.definition))

; References: pattern_reference identifier
(pattern_reference identifier: (_) @local.reference)
1 change: 0 additions & 1 deletion editors/zed/zed

This file was deleted.

4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
 {
   "name": "@gram-data/tree-sitter-gram",
-  "version": "0.3.3",
+  "version": "0.3.4",
"description": "subject-oriented notation for structured data",
"homepage": "https://gram-data.github.io",
"repository": {