From bc1d639c10330041374c516d5653602d0f624f09 Mon Sep 17 00:00:00 2001 From: Rich Lander Date: Sun, 28 Dec 2025 10:24:27 -0800 Subject: [PATCH 01/17] Add LLMs spec; intro --- .../exposing-hypermedia-to-llms.md | 91 +++++++++++++++++++ 1 file changed, 91 insertions(+) create mode 100644 accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md new file mode 100644 index 000000000..2645bebfd --- /dev/null +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -0,0 +1,91 @@ +# Exposing Hypermedia Information Graphs to LLMs + +[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old ideas and formats that are perfectly-suited for LLM consumption by virtue of self-describing structure and relationships between resources. The premise is that there is enough meta-information in a hypermedia document that a semantic consumer can successfully traverse an information graph to find the information demanded by a user prompt. The prevailing narrative over the last few decades has been that _structured data_ > _unstructured documents_, in terms of the ability to derive meaningful insight with tools beyond basic text search. JSON and XML came out of that heritage, with structured query enabled by [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath), both assuming a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) contributes the idea that data labeling can be extended to relations across documents. 
[Hypertext Application Language](https://en.wikipedia.org/wiki/Hypertext_Application_Language) is a standard implementation of this concept, enabling applications to expose a semantic space to a broad range of consumers, enabling navigation patterns beyond what a schema could reasonably describe. + +> A (nearly) century-old principle, articulated by [Korzybski](https://en.wikipedia.org/wiki/Alfred_Korzybski): [the map is not the territory](https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation). + +In trail races, there are frequent ribbons hanging from trees and painted arrows on the ground to ensure the correct path is taken. It is often the case that there are races of multiple distances being run on an overlapping course. At key intersections, there are signs that say "5 KM -> left" and "10 KM -> continue straight". The ribbons and the painted arrows are the kind of map that a document schema provides, ensuring correct coloring within the lines. The signposting where the courses diverge is the extra descriptive insight that enables graph traversal. The signposting is a key-value function. You match a key you know with a value you don't. This signposting gets us closer to understanding and being able to explore the territory. + +This approach is in sharp contrast to the typical HTML graph implementation: `For a deeper discussion on this topic, click here.`. A semantic graph might expose a named link relation like `"gardening-deep-dive": "https://../diving-deeper-on-gardening.md"` or expose more descriptive complexity by separating link kind, the "deep-dive", from target kind, "gardening", more cleanly splitting the adjective and noun. The better the implementation, the less inference, flow analysis, or attention is required to derive the intended meaning. + +Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents.
Semantic graphs extend this idea with "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space—in both content and link relations—allows readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation. + +Hypermedia information document graphs can be published pre-baked, making them suitable for direct consumption without being pre-loaded and exposed by a vector database. The semantic relationships and other meta-information are used as the basis of typical LLM mechanics like vector similarity, making hypermedia a kind of RAG scheme and suitable for static-webhost deployment. + +The concept of a pre-baked static hypermedia graph has been applied to the .NET release notes. That project was initially approached as a modern revamp of a set of JSON files that are used to inform and direct cloud-infra deployment and compliance workflows at scale. Over time, it became obvious that LLMs could read the same content directly and self-reason about its content and navigation patterns. Early experimentation proved that out. The primary techniques used to improve applicability for LLMs are semantic naming and graph-resident guidance. These techniques can be quite subtle, where a small shift in semantic bias can result in a large shift in LLM performance. + +Graph-resident guidance consists of skills and workflows. HATEOAS tells us that "customer" can be a relation of a sales order. Why not make "graph-instructions" a relation of a graph? Skills and workflows are first-class relations in the graph, enabling graph designers to express navigation intent. Skills follow the Anthropic skill format, while workflows are HAL documents that describe queries over graph link relations. This enables the graph designer to provide the reader with the "ten-km-route" workflow if that's a match for the intended outcome. 
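To make the workflow-as-relation idea concrete, here is a minimal sketch (Python; the entrypoint shape, relation names, and hrefs are invented for illustration and are not the actual release-notes format) of a reader selecting a graph-resident workflow by matching its intent against semantic labels:

```python
# Hypothetical sketch: workflows exposed as ordinary link relations.
# The entrypoint shape and names below are invented, not the real format.

def pick_workflow(entrypoint: dict, intent: str):
    """Return the href of the first workflow whose name matches the intent."""
    for workflow in entrypoint.get("_links", {}).get("workflows", []):
        if intent in workflow.get("name", ""):
            return workflow["href"]
    return None  # no matching workflow; fall back to free traversal

entrypoint = {
    "_links": {
        "workflows": [
            {"name": "five-km-route", "href": "./workflows/five-km.json"},
            {"name": "ten-km-route", "href": "./workflows/ten-km.json"},
        ]
    }
}

print(pick_workflow(entrypoint, "ten-km"))  # ./workflows/ten-km.json
```

The point is not the lookup itself but that the guidance is discovered the same way as any other resource: by matching a key you know against labels the graph exposes.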
+ +The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL-native workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. + +## LLM Graph Entrypoint + +The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support n-9s levels of reliability and be subject to rigorous engineering constraints. However, we're in the early days of AI with regular change that could require signficant rework of LLM enablement. These needs are in firm opposition, needing some form of tie-break. + +Instead, we can view the core graph as a well-defined data-layer and expose a separate application-layer entrypoint for LLMs that doesn't guaruntee the same compatibility promise and can evolve over time without that burden. + +We can compare the two entrypoints. + +Core graph entrypoint: + +```json + "_embedded": { + "releases": [ + { + "version": "10.0", + "release_type": "lts", + "supported": true, + "_links": { + "self": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json" + } + } + }, +``` + +That's how the core graph exposes a major version. As suggested, its skeletal. 
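To show how little a consumer must know in advance, here is a small sketch that parses the fragment above (sample data abbreviated to the one release shown; a real consumer would fetch the full index.json) and extracts the supported majors:

```python
import json

# Abbreviated sample based on the core entrypoint fragment above.
doc = json.loads("""
{
  "_embedded": {
    "releases": [
      {
        "version": "10.0",
        "release_type": "lts",
        "supported": true,
        "_links": {
          "self": {
            "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json"
          }
        }
      }
    ]
  }
}
""")

# The labels carry the meaning: no companion schema document is needed
# to find the supported versions or the node to visit next.
supported = {
    release["version"]: release["_links"]["self"]["href"]
    for release in doc["_embedded"]["releases"]
    if release["supported"]
}
print(sorted(supported))  # ['10.0']
```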
+ +LLM entrypoint: + +```json + "_embedded": { + "patches": { + "10.0": { + "version": "10.0.1", + "release_type": "lts", + "security": false, + "support_phase": "active", + "supported": true, + "sdk_version": "10.0.101", + "latest_security_patch": "10.0.0-rc.2", + "latest_security_patch_date": "2025-10-14T00:00:00+00:00", + "_links": { + "self": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/10.0.1/index.json" + }, + "downloads": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/downloads/index.json" + }, + "latest-month": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/12/index.json" + }, + "latest-security-disclosures": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json" + }, + "latest-security-month": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json" + }, + "latest-security-patch": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/preview/rc2/index.json" + }, + "major": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json" + }, + "major-manifest": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/manifest.json" + } + } + }, +``` + +The LLM graph exposes a lot more useful information From 49a82df3c61b7e6f478d576731154e82f6ac397e Mon Sep 17 00:00:00 2001 From: Rich Lander Date: Sun, 28 Dec 2025 11:07:37 -0800 Subject: [PATCH 02/17] Add tension section --- .../exposing-hypermedia-to-llms.md | 30 ++++++++++++++----- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md 
b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 2645bebfd..0a3486a43 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -18,15 +18,15 @@ Graph-resident guidance consists of skills and workflows. HATEOAS tells us that The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL-native workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. -## LLM Graph Entrypoint +## Graph entrypoint tension -The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support n-9s levels of reliability and be subject to rigorous engineering constraints. However, we're in the early days of AI with regular change that could require signficant rework of LLM enablement. These needs are in firm opposition, needing some form of tie-break. +The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support n-9s levels of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to waves of externally-driven change that requires significant rework to maintain high quality LLM enablement. 
These requirements are in firm opposition, needing a winner to pull ahead, a painful compromise, or some form of tie-break. -Instead, we can view the core graph as a well-defined data-layer and expose a separate application-layer entrypoint for LLMs that doesn't guaruntee the same compatibility promise and can evolve over time without that burden. +Instead, we can view the core graph as a well-defined data-layer that honors the reliability requirements, while exposing a separate application-layer entrypoint for LLMs that can evolve over time without a heavy compatibility burden. We can compare the two entrypoints. -Core graph entrypoint: +[Core graph entrypoint](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/index.json): ```json "_embedded": { @@ -43,9 +43,9 @@ Core graph entrypoint: }, ``` -That's how the core graph exposes a major version. As suggested, its skeletal. +That's how the core graph exposes a major version. As suggested, it's skeletal. The graph entrypoint only needs to be updates once or twice a year. Even if the file is regenerated daily, git won't notice any changes. -LLM entrypoint: +[LLM entrypoint](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json): ```json "_embedded": { @@ -88,4 +88,20 @@ LLM entrypoint: }, ``` -The LLM graph exposes a lot more useful information +The LLM graph exposes a lot more useful information. The semantic data and link relations are on clear display. + +The strongest indicator of semantic design is that there are multiple relations for the same underlying resource. Both `latest-security-disclosures` and `latest-security-month` point to the same month index, but they offer different semantic pathways for discovering it. An LLM asking "what are the latest CVEs?" navigates one way; an LLM asking "what happened in October?" navigates another. Same destination, different semantic intent. 
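As a sketch of how that aliasing plays out for a consumer (relation names taken from the fragment above; hrefs abbreviated, and the intent-to-relation mapping invented for illustration):

```python
# Relation names from the llms.json fragment above; hrefs abbreviated.
links = {
    "latest-month": {"href": ".../timeline/2025/12/index.json"},
    "latest-security-disclosures": {"href": ".../timeline/2025/10/index.json"},
    "latest-security-month": {"href": ".../timeline/2025/10/index.json"},
}

# Invented mapping from reader intent to relation name, for illustration.
intent_to_relation = {
    "what are the latest CVEs?": "latest-security-disclosures",
    "what happened in the latest security month?": "latest-security-month",
}

# Two semantic pathways, one underlying resource.
targets = {links[relation]["href"] for relation in intent_to_relation.values()}
print(len(targets))  # 1
```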
+ +This approach enables both principles from earlier: + +- "match for the intended outcome": the designer provides multiple semantic pathways for different query types +- "match a key you know with a value you don't": the reader discovers the right pathway through semantic labels + +## LLM entry point + +Two strong design principles emerged from intuition and then observed behavior from eval: + +- Consistently apply a semantic model throughout the graph. It's a comfort to find a concept where it is expected. +- Expose resources in terms of structural kind, like `major` and `-month`, and desired output, like `-security-disclosures`. + +This dual approach to semantic naming sometimes results in a double-mapping. Empirical observation suggests that LLMs prefer the outcome-based naming, while the more schema-correct and initial naming is the structural framing. From 1945a202b93d1e4da3cb3e6b7608e95febf44511 Mon Sep 17 00:00:00 2001 From: Rich Lander Date: Sun, 28 Dec 2025 14:33:49 -0800 Subject: [PATCH 03/17] Clarify more complex link-relation --- .../release-notes-graph/exposing-hypermedia-to-llms.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 0a3486a43..0383ce913 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -6,7 +6,7 @@ In trail races, there are frequent ribbons hanging from trees and painted arrows on the ground to ensure the correct path is taken. It is often the case that there are races of multiple distances being run on an overlapping course. At key intersections, there are signs that say "5 KM -> left" and "10 KM -> continue straight". The ribbons and the painted arrows are the kind of map that a document schema provides, ensuring correct coloring within the lines.
The signposting where the courses diverge is the extra descriptive insight that enables graph traversal. The signposting is a key-value function. You match a key you know with a value you don't. This signposting gets us closer to understanding and being able to explore the territory. -This approach is in sharp contrast to the typical HTML graph implementation: `For a deeper discussion on this topic, click here.`. A semantic graph might expose a named link relation like `"gardening-deep-dive": "https://../diving-deeper-on-gardening.md"` or expose more descriptive complexity by separating link kind, the "deep-dive", from target kind, "gardening", more cleanly splitting the adjective and noun. The better the implementation, the less inference, flow analysis, or attention is required to derive the intended meaning. +This approach is in sharp contrast to the typical HTML graph implementation: `For a deeper discussion on this topic, click here.`. A semantic graph might expose a named link relation like `{ "link-relation": "gardening-deep-dive", "href": "..." }` or expose more descriptive complexity by separating `"link-relation": "deep-dive"` from `"target-kind": "gardening"`, to cleanly split the link-adjective and its target-noun. The better the implementation, the less inference, flow analysis, or attention is required to derive the intended meaning. Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Semantic graphs extend this idea with "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space—in both content and link relations—allows readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation. @@ -16,13 +16,13 @@ The concept of a pre-baked static hypermedia graph has been applied to the .NET Graph-resident guidance consists of skills and workflows. 
HATEOAS tells us that "customer" can be a relation of a sales order. Why not make "graph-instructions" a relation of a graph? Skills and workflows are first-class relations in the graph, enabling graph designers to express navigation intent. Skills follow the Anthropic skill format, while workflows are HAL documents that describe queries over graph link relations. This enables the graph designer to provide the reader with the "ten-km-route" workflow if that's a match for the intended outcome. -The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL-native workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. +The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL-native workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. And, of course, this approach has utility beyond release notes. ## Graph entrypoint tension -The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support n-9s levels of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). 
However, we're in the early days of AI and subject to waves of externally-driven change that requires significant rework to maintain high quality LLM enablement. These requirements are in firm opposition, needing a winner to pull ahead, a painful compromise, or some form of tie-break. +The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support n-9s levels of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that requires significant re-evaluation and re-work to maintain high quality LLM enablement. These requirements are in firm opposition, needing a winner to pull ahead, a painful compromise, or some form of tie-break. -Instead, we can view the core graph as a well-defined data-layer that honors the reliability requirements, while exposing a separate application-layer entrypoint for LLMs that can evolve over time without a heavy compatibility burden. +Compromises are no fun! Let's take the tie-break. We can instead view the core graph as a well-defined data-layer that honors the reliability requirements, while exposing a separate application-layer entrypoint for LLMs that can evolve over time without the heavy compatibility burden. We can compare the two entrypoints. @@ -99,6 +99,8 @@ This approach enables both principles from earlier: ## LLM entry point + + Two strong design principles emerged from intuition and then observed behavior from eval: - Consistently apply a semantic model throughout the graph. It's a comfort to find a concept where it is expected. 
From 7b61ba96b9fc0103d57b516857abca5835378f69 Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Wed, 31 Dec 2025 09:01:55 -0800 Subject: [PATCH 04/17] Update intro --- .markdownlint.json | 16 +++++------ .../exposing-hypermedia-to-llms.md | 28 +++++++++++-------- 2 files changed, 25 insertions(+), 19 deletions(-) diff --git a/.markdownlint.json b/.markdownlint.json index a02418cfd..b274f82fb 100644 --- a/.markdownlint.json +++ b/.markdownlint.json @@ -1,9 +1,9 @@ { - "default": true, - "MD003": { "style": "atx" }, - "MD004": false, - "MD007": { "indent": 4 }, - "MD013": { "tables": false, "code_blocks": false }, - "MD026": false, - "no-hard-tabs": false -} \ No newline at end of file + "default": true, + "MD003": { "style": "atx" }, + "MD004": false, + "MD007": { "indent": 4 }, + "MD013": false, + "MD026": false, + "no-hard-tabs": false +} diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 0383ce913..1445852b0 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -1,6 +1,6 @@ # Exposing Hypermedia Information Graphs to LLMs -[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old ideas and formats that are perfectly-suited for LLM consumption by virtue of self-describing structure and relationships between resources. The premise is that there is enough meta-information in a hypermedia document that a semantic consumer can successfully traverse an information graph to find the information demanded by a user prompt. The prevailing narrative over the last few decades has been that _structured data_ > _unstructured documents_, in terms of the ability to derive meaningful insight with tools beyond basic text search. 
JSON and XML came out of that heritage, with structured query enabled by [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath), both assuming a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) contributes the idea that data labeling can be extended to relations across documents. [Hypertext Application Language](https://en.wikipedia.org/wiki/Hypertext_Application_Language) is a standard implementation of this concept, enabling applications to expose a semantic space to a broad range of consumers, enabling navigation patterns beyond what a schema could reasonably describe. +[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old ideas and formats that are perfectly-suited for LLM consumption by virtue of self-describing structure and relationships between resources. The premise is that there is sufficient meta-information in a hypermedia document graph for a semantic consumer to successfully traverse it to find the information demanded by a user prompt. The prevailing narrative over the last few decades has been that _structured data_ > _unstructured documents_, in terms of inherent capacity to derive meaningful insight. JSON and XML came out of that heritage, with [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath) providing structured query supported by a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) contributes the idea that data labeling can be extended to relations across resources. These approaches are integrated in this system to enable an LLM to search for and discover desired information across labeled nodes and edges within a graph. 
In a more traditional system, a schema is the pre-requisite to traversal, whereas in a semantic system, it is traversal that reveals the schema. > A (nearly) century-old principle, articulated by [Korzybski](https://en.wikipedia.org/wiki/Alfred_Korzybski): [the map is not the territory](https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation). @@ -8,23 +8,23 @@ In trail races, there are frequent ribbons hanging from trees and painted arrows This approach is in sharp contrast to the typical HTML graph implementation: `For a deeper discussion on this topic, click here.`. A semantic graph might expose a named link relation like `{ "link-relation": "gardening-deep-dive", "href": "..." }` or expose more descriptive complexity by separating `"link-relation": "deep-dive"` from `"target-kind": "gardening"`, to cleanly split the link-adjective and its target-noun. The better the implementation, the less inference, flow analysis, or attention is required to derive the intended meaning. -Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Semantic graphs extend this idea with "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space—in both content and link relations—allows readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation. +Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Hypermedia graphs extend this idea with "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space -- both content and link relations -- allows readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation. 
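A toy illustration of that emergence (the three-node graph below is invented and stands in for fetched HAL documents): crawling `_links` with no upfront schema and recording every relation label yields a discovered map of the space.

```python
from collections import defaultdict

# Invented three-node in-memory graph standing in for fetched HAL documents.
graph = {
    "/": {"_links": {"self": {"href": "/"},
                     "releases": {"href": "/releases"}}},
    "/releases": {"_links": {"self": {"href": "/releases"},
                             "latest-patch": {"href": "/10.0.1"}}},
    "/10.0.1": {"_links": {"self": {"href": "/10.0.1"}}},
}

def discover_schema(graph, start="/"):
    """Walk _links breadth-first; the 'schema' is whatever labels turn up."""
    seen, queue, schema = set(), [start], defaultdict(set)
    while queue:
        node = queue.pop(0)
        if node in seen:
            continue
        seen.add(node)
        for relation, link in graph[node]["_links"].items():
            schema[relation].add(link["href"])
            if link["href"] in graph:
                queue.append(link["href"])
    return dict(schema)

print(sorted(discover_schema(graph)))  # ['latest-patch', 'releases', 'self']
```

No schema document was consulted; the relation labels themselves became the map.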
Hypermedia information document graphs can be published pre-baked, making them suitable for direct consumption without being pre-loaded and exposed by a vector database. The semantic relationships and other meta-information are used as the basis of typical LLM mechanics like vector similarity, making hypermedia a kind of RAG scheme and suitable for static-webhost deployment. The concept of a pre-baked static hypermedia graph has been applied to the .NET release notes. That project was initially approached as a modern revamp of a set of JSON files that are used to inform and direct cloud-infra deployment and compliance workflows at scale. Over time, it became obvious that LLMs could read the same content directly and self-reason about its content and navigation patterns. Early experimentation proved that out. The primary techniques used to improve applicability for LLMs are semantic naming and graph-resident guidance. These techniques can be quite subtle, where a small shift in semantic bias can result in a large shift in LLM performance. -Graph-resident guidance consists of skills and workflows. HATEOAS tells us that "customer" can be a relation of a sales order. Why not make "graph-instructions" a relation of a graph? Skills and workflows are first-class relations in the graph, enabling graph designers to express navigation intent. Skills follow the Anthropic skill format, while workflows are HAL documents that describe queries over graph link relations. This enables the graph designer to provide the reader with the "ten-km-route" workflow if that's a match for the intended outcome. +Graph-resident guidance consists of skills and workflows. HATEOAS tells us that "customer" can be a relation of a sales order. Why not make "graph-instructions" a relation of a graph? Skills and workflows are first-class relations in the graph, enabling graph designers to express navigation intent. 
Skills follow the Anthropic skill format, while workflows are HAL documents that describe queries over graph link relations. This enables the graph designer to provide readers with "ten-km-route" style workflows if that's a match for the intended outcome. -The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL-native workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. And, of course, this approach has utility beyond release notes. +The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. And, of course, this approach has utility beyond release notes. ## Graph entrypoint tension -The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support n-9s levels of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that requires significant re-evaluation and re-work to maintain high quality LLM enablement. 
These requirements are in firm opposition, needing a winner to pull ahead, a painful compromise, or some form of tie-break. +The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support an n-9s level of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that requires significant and quick re-evaluation and re-work to maintain high quality LLM enablement. These requirements are in firm opposition, needing a winner to pull ahead, a painful compromise, or some form of tie-break. -Compromises are no fun! Let's take the tie-break. We can instead view the core graph as a well-defined data-layer that honors the reliability requirements, while exposing a separate application-layer entrypoint for LLMs that can evolve over time without the heavy compatibility burden. +Let's take the tie-break. We can instead view the core graph as a well-defined data-layer that honors the reliability requirements, while exposing a separate application-layer entrypoint for LLMs that can evolve over time without the heavy compatibility burden. -We can compare the two entrypoints. +We can compare the embedded resource section of the two entrypoints. [Core graph entrypoint](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/index.json): @@ -43,7 +43,7 @@ We can compare the two entrypoints. }, ``` -That's how the core graph exposes a major version. As suggested, it's skeletal. The graph entrypoint only needs to be updates once or twice a year. Even if the file is regenerated daily, git won't notice any changes. +That's how the core graph exposes a major version. As suggested, it's skeletal. 
The graph entrypoint only needs to be updated once or twice a year. Even if the file is regenerated daily, git won't notice any changes. [LLM entrypoint](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json): @@ -92,18 +92,24 @@ The LLM graph exposes a lot more useful information. The semantic data and link The strongest indicator of semantic design is that there are multiple relations for the same underlying resource. Both `latest-security-disclosures` and `latest-security-month` point to the same month index, but they offer different semantic pathways for discovering it. An LLM asking "what are the latest CVEs?" navigates one way; an LLM asking "what happened in October?" navigates another. Same destination, different semantic intent. -This approach enables both principles from earlier: +This approach is an implementation of principles described earlier: - "match for the intended outcome": the designer provides multiple semantic pathways for different query types - "match a key you know with a value you don't": the reader discovers the right pathway through semantic labels -## LLM entry point +The indexes also differ in terms of the nature of the information they contain. The core index is a zoomed-out view of .NET versions released over (at the time of writing) a ten-year period. Those versions form the basic elements of any query. This is an objectively correct normalized entry-point view of the graph. In contrast, the LLM index is the result of a query, revealing rich information about the most recent patches for supported major versions. It enables constructing the same queries as the core graph, but also includes enough data to serve as the results of queries, relating to the zoomed-in current moment. +The graph applies multiple focal lengths and pivots throughout to provide information that is useful and has good ergonomics for varying classes of queries and their consumers.
This differentiation is a core property of the graph, in part to serve the needs of expected consumers, but also to separate chains of graph nodes that should be skeletal vs those that should be weighted. +## Graph design - + + +Two strong design principles emerged from observed LLM behavior from eval: - Consistently apply a semantic model throughout the graph. It's a comfort to find a concept where it is expected. - Expose resources in terms of structural kind, like `major` and `-month`, and desired output, like `-security-disclosures`. This dual approach to semantic naming sometimes results in this double-mapping. Empirical observation suggests that LLMs prefer the outcome-based naming, while the more schema-correct and initial naming is the structural framing. + +Wormholes vs spear-fishing. \ No newline at end of file From f7c3e93574440c2579a851a38d3239b1f3ec6be3 Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Thu, 1 Jan 2026 09:27:13 -0800 Subject: [PATCH 05/17] Add performance section --- .../exposing-hypermedia-to-llms.md | 160 +++++++++++++++++- 1 file changed, 153 insertions(+), 7 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 1445852b0..2632f937e 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -4,11 +4,11 @@ > A (nearly) century-old principle, articulated by [Korzybski](https://en.wikipedia.org/wiki/Alfred_Korzybski): [the map is not the territory](https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation). -In trail races, there are frequent ribbons hanging from trees and painted arrows on the ground to ensure the correct path is taken. It is often the case that there are races of multiple distances being run on an overlapping course.
At key intersections, there are signs that say "5 KM -> left" and "10 KM -> continue straight". The ribbons and the painted arrows are the kind of map that a document schema provides, ensuring correct coloring within the lines. The signposting where two courses diverge is the extra descriptive insight that enables graph traversal. The signposting is a key-value function. You match a key you recognize with a value you cannot predict. Signposting provides comprehension that enables directed navigation of the territory. -This approach is in sharp contrast to the typical HTML graph implementation: `For a deeper discussion on this topic, click here.`.
A semantic graph might expose a named link relation like `{ "link-relation": "gardening-deep-dive", "href": "..." }` or expose more descriptive complexity by separating the parts, like `"link-relation": "deep-dive"` from `"target-kind": "gardening"`, cleanly splitting the link-adjective and its target-noun. The better the semantic implementation, the less inference, flow analysis, or attention is required to derive the intended meaning. -Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Hypermedia graphs extend this idea with "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space -- both content and link relations -- allows readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation. +Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Hypermedia graphs extend this idea, enabling "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space -- both content and link relations -- allows readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation. Hypermedia information document graphs can be published pre-baked, making them suitable for direct consumption without being pre-loaded and exposed by a vector database. The semantic relationships and other meta-information are used as the basis of typical LLM mechanics like vector similarity, making hypermedia a kind of RAG scheme and suitable for static-webhost deployment. @@ -18,11 +18,155 @@ Graph-resident guidance consists of skills and workflows.
HATEOAS tells us that The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. And, of course, this approach has utility beyond release notes. -## Graph entrypoint tension +## Graph design + +The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support an n-9s level of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that may require significant and quick re-evaluation and re-work to maintain high-quality LLM enablement. These modalities are in firm opposition. + +We can instead view the core graph as a well-defined data-layer that honors the desired reliability requirements, while exposing a separate application-layer entrypoint for LLMs that can evolve over time without the heavy compatibility burden. + +The graph as a whole is based on a largely traditional schema design, utilizing both normalized and denormalized approaches in (hopefully informed) service of consumers. After the graph was realized, it became possible to test it with `jq` as a sort of passive and syntactic consumer and with LLMs as a much more active and semantic consumer. The graph was successively (and hopefully successfully) adapted to improve performance for both consumption styles. 
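The `jq`-style "passive and syntactic" consumption is easy to picture. A minimal sketch — the inline document is a stand-in for illustration, not the real `index.json`:

```shell
# A HAL-style stand-in document (not the real index.json).
doc='{"kind":"index","_links":{"latest-major":{"href":"https://example.invalid/10.0/index.json"},"latest-month":{"href":"https://example.invalid/2025/12/index.json"}}}'

# A syntactic consumer: list the link relations the document exposes.
echo "$doc" | jq -r '._links | keys[]'

# Resolve a known relation to an href, rather than constructing the URL by hand.
echo "$doc" | jq -r '._links["latest-major"].href'
```

A semantic consumer performs the same key-value move — match a relation it recognizes to an href it cannot predict — with comprehension in place of syntax.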
Performance is primarily measured in terms of terseness of query and quickness (fetches and data cost) of response. Much of the feedback was fundamental in nature. The overall character of the graph remains a pure information-oriented data design, but with a minor tilt towards semantic consumers. + +The choice of hypermedia as the grounding format is a case-in-point of the overall approach. Hypermedia long pre-dates LLMs; however, it has always held semantic consumers (humans) as a key design cohort. Hypermedia formats provide a conceptual framework that is easy to flavor towards semantic consumption. This flexibility proved useful as the design was adapted with LLM feedback. It should also be noted that LLM feedback is by far the cheapest and most accessible form of feedback. LLMs are happy to provide usage feedback throughout the night while other semantic consumers are sleeping. + +A few strong design principles emerged from observed LLM behavior from eval: + +- Consistent application of a conceptual model creates familiarity for semantic consumers. It is a comfort to find a concept exposed where it is expected. +- Resources can be dual-mapped in terms of structural kind, like `major` and `-month`, and desired output, like `-security-disclosures`. Prompts can bias towards different concerns. Differentiated mappings make it more likely that one of them presents a direct match to a semantic consumer's framing. +- LLMs operate on a model of scarcity, with tokens at a premium. Smaller graph nodes encourage greater graph navigation. Comprehension can be made to outperform consumption cost. +- LLMs will acquire multiple resources in a single turn if a good strategy for doing so is evident. + +### LLM entrypoints + +Two entrypoints are provided for LLMs: + +- [llms.txt](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt) -- Prose explanation of how to use the graph, including a link to llms.json.
+- [llms.json](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json) -- The LLM index (AKA "application-layer entrypoint for LLMs"). It also includes guidance inline. + +[llms.txt](https://llmstxt.org/) is an emerging standard, with awareness in the most recently trained models. It can be used for meta-information (as is the case in this system) or to expose an index of all information available (as is the case with [Stripe docs](https://docs.stripe.com/llms.txt)). It's hard to imagine that the Stripe approach is optimal. It uses 18.5k tokens (10% of a typical token budget) while our use of `llms.txt` clocks in at a meager 626 tokens. + +A major advantage of `llms.txt` is that it is markdown, which offers a natural way to expose resource links, guidance, and foreign content (code fences). It is possible to include all the same information in JSON; however, it is awkward and (critically) unconventional. It takes a lot more design effort to get an LLM to notice and apply guidance from within a data-oriented JSON document than markdown, which has a much stronger association with guidance and multi-modality information. + +Our use of `llms.txt` includes an entrypoint link to `llms.json`, a table of skills content, and basic initial guidance. LLMs will often fetch the data URL and one or more skills files in a single turn. Fetching multiple documents in a single turn is a useful tactic for token optimization. + +## Performance implications + +Some questions can be answered from the LLM entrypoint; however, many require navigating to documents within the core graph. It is not feasible or desirable to include all information in a single document. As a result, a turn-by-turn approach is required. At each turn, there is new content, new insight, and then selection of critical information that directs the next fetch(es) or is the desired answer.
The range of required turns varies greatly, a joint function of the information design over the data and the LLM's comprehension of the overall information framework. + +There is a cost function for LLMs based on the mechanics of the [transformer architecture](https://en.wikipedia.org/wiki/Transformer_(deep_learning)). The cost of multiple turns can be prohibitive, resulting in conversation failures/termination, poor performance, or high cost. The graph design has a direct impact on LLM performance and cost. + +Let's develop an intuition for cost. There are three major cost functions at play: + +- Token cost, which is additive across turns, and summed across turns +- Context, which is the accumulated token cost of the final turn +- Attention computation, which is the square of the tokens in any one turn and the sum of those squares across turns + +Let's simplify, assuming that tokens (`n`) are uniform per turn (`m`). + +Context (accumulated tokens): `mn` + +Total token cost (summed tokens per turn): `n × m(m+1)/2`, approximately `nm²/2` + +Attention (number of interactions required): `n² * m³ / 3` + +At turn `t`, the model adds `n` tokens, carries a context of `tn`, and pays `t²n²` in attention. + +Total: `n²(1 + 4 + 9 + ... + m²)` ≈ `n² × m³/3` + +What's actually happening: -The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support an n-9s level of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that requires significant and quick re-evaluation and re-work to maintain high quality LLM enablement. These requirements are in firm opposition, needing a winner to pull ahead, a painful compromise, or some form of tie-break.
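The per-turn accumulation can be tabulated mechanically. A minimal sketch of the uniform model, under the assumption that per-turn attention is modeled as context squared (full self-attention over the accumulated context):

```python
# Uniform cost model: n new tokens per turn, across m turns.
def turn_costs(n, m):
    rows = []
    context = 0
    total_tokens = 0
    total_attention = 0
    for turn in range(1, m + 1):
        context += n                     # context grows linearly: t * n
        total_tokens += context          # each turn reprocesses all prior context
        total_attention += context ** 2  # quadratic attention within the turn
        rows.append((turn, context, total_tokens, total_attention))
    return rows

for turn, context, tokens, attention in turn_costs(n=100, m=5):
    print(turn, context, tokens, attention)

# Closed forms: context = mn, tokens = n*m*(m+1)/2, attention = n^2*m*(m+1)*(2m+1)/6
n, m = 100, 5
last = turn_costs(n, m)[-1]
assert last[1] == n * m
assert last[2] == n * m * (m + 1) // 2
assert last[3] == n**2 * m * (m + 1) * (2 * m + 1) // 6
```

For large `m`, the attention sum `n²·m(m+1)(2m+1)/6` approaches the `n²m³/3` figure used above.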
+`n²` is the quadratic attention factor (tokens attending to tokens) +`m³` emerges from summing squares across turns -Let's take the tie-break. We can instead view the core graph as a well-defined data-layer that honors the reliability requirements, while exposing a separate application-layer entrypoint for LLMs that can evolve over time without the heavy compatibility burden. +Intuitive model: + +Batched: (total tokens)² = (nm)² = n²m² — pure quadratic +Sequential: sum of growing quadratics = n²m³/3 — cubic in turns + +Batched is the cheapest form. It's the mode where all fetches can be collapsed into a single term. We escape the successive cost of attention. Sequential is the more realistic but costly model, where each fetch happens in sequence. The cost difference is very large. + +One can draw the conclusion that any system that requires the turn-by-turn approach is bad because it forces the sequential approach. That's not really a correct framing. Many problems require multiple turns. For example, many ChatGPT conversations go on for quite some time. A more useful framing is determining if there is a way to reduce turns or apply some other optimization. + +> The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. + +This design approach was selected in service of n-9s reliability and performance. It takes on a new light when viewed through various LLM cost functions. The formulas above assume that token count is uniform. If it isn't, then you'd want lean token counts up front to guide planning and navigation and then more weighty token counts at the end (at the hopefully final turn) used to derive an answer. The ability to fetch multiple documents in a single turn further optimizes the cost function. It helps us split the difference between the batched and sequential cost models. + +The rest of the design should be viewed in terms of these cost functions.
It is to a large degree the whole game at play. + +## llms.txt + +The following are excerpts of the two files, enough to provide a sense of their approach. + +`llms.txt`: + +```markdown +# .NET Release Graph + +Machine-readable .NET release, CVE, and compatibility data via HAL hypermedia. + +## First Fetch — Do These in Parallel + +1. **Data**: +2. **Skill**: Pick ONE from the table below based on your query + +| Query About | Skill | +|-------------|-------| +| CVEs, security patches, CVSS | [cve-queries](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/SKILL.md) | +| Breaking changes, compatibility | [breaking-changes](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/breaking-changes/SKILL.md) | +| Version lifecycle, EOL dates | [version-eol](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/version-eol/SKILL.md) | +| General queries, unsure | [dotnet-releases](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/SKILL.md) | + +Fetch llms.json + your skill in the same turn. The skill points to workflows.json which has chained workflows with `next_workflow` transitions. + +## Core Rules + +1. Follow `_links` exactly — never construct URLs +2. Use `_embedded` data first — most queries need zero extra fetches +3. Match your query to a workflow, then follow its `follow_path` +4. Fetch multiple resources per turn when possible +``` + +`llms.json`: + +```json +{ + "kind": "llms", + "title": ".NET Release Index for AI", + "ai_note": "ALWAYS read required_pre_read first. Use skills and workflows when they match; they provide optimal paths. Trust _embedded data\u2014it\u0027s authoritative and current. Never construct URLs.", + "human_note": "No support or compatibility is offered for this file. Don\u0027t use it for automated workflows.
Use index.json instead.", + "required_pre_read": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/SKILL.md", + "latest_major": "10.0", + "latest_lts_major": "10.0", + "latest_patch_date": "2025-12-09T00:00:00+00:00", + "latest_security_patch_date": "2025-10-14T00:00:00+00:00", + "last_updated_date": "2025-12-24T12:33:04.8560376+00:00", + "supported_major_releases": [ + "10.0", + "9.0", + "8.0" + ], + "_links": { + "self": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json" + }, + "latest-lts-major": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json", + "title": "Latest LTS major release - .NET 10.0" + }, + "latest-major": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json", + "title": "Latest major release - .NET 10.0" + }, + "latest-month": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/12/index.json", + "title": "Latest month - December 2025" + }, + "latest-security-disclosures": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json", + "title": "Latest security disclosures - October 2025" + }, +``` + +## Graph entrypoint tension We can compare the embedded resource section of the two entrypoints. @@ -112,4 +256,6 @@ Two strong design principles emerged from observed LLM behavior from eval: This dual approach to semantic naming sometimes results in this double-mapping. Empirical observation suggests that LLMs prefer the outcome-based naming, while the more schema-correct and initial naming is the structural framing. -Wormholes vs spear-fishing. \ No newline at end of file +Wormholes vs spear-fishing.
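The `next_workflow` and `follow_path` conventions named in the `llms.txt` excerpt can be pictured with a sketch. This is illustrative only — every field other than `_links`, `next_workflow`, and `follow_path` is a hypothetical name, not the actual `workflows.json` schema:

```json
{
  "kind": "workflow",
  "id": "latest-security-disclosures",
  "description": "Find the CVEs disclosed in the most recent security month.",
  "follow_path": ["latest-security-month", "cve-list"],
  "next_workflow": "cve-commit-analysis",
  "_links": {
    "self": { "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/workflows.json" }
  }
}
```

The point of the convention is that it reads as native HAL: a semantic consumer that already trusts `_links` treats `follow_path` and `next_workflow` as just more signposting.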
+ +note: time is a challenge From 8a410cf813e90bffcdde5f108d6645edcb945199 Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Thu, 1 Jan 2026 09:54:42 -0800 Subject: [PATCH 06/17] Update performance section --- .../exposing-hypermedia-to-llms.md | 146 +++++++++++++++--- 1 file changed, 121 insertions(+), 25 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 2632f937e..1ba1b2f2b 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -46,51 +46,147 @@ Two entrypoints are provided for LLMs: A major advantage of `llms.txt` is that it is markdown, which offers a natural way to expose resource links, guidance, and foreign content (code fences). It is possible to include all the same information in JSON; however, it is awkward and (critically) unconventional. It takes a lot more design effort to get an LLM to notice and apply guidance from within a data-oriented JSON document than markdown, which has a much stronger association with guidance and multi-modality information. -Our use of `llms.txt` includes an entrypoint link to `llms.json`, a table of skills content, and basic initial guidance. LLMs will often fetch the data URL and one or more skills files in a single turn. Fetching multiple documents in a single turn is a useful tactic for token optimization. +Our use of `llms.txt` includes an entrypoint link to the data entrypoint (`llms.json`), a table of skills content, and basic initial guidance. LLMs will often fetch the data URL and one or more skills files in a single turn. Fetching multiple documents in a single turn is a useful tactic for token optimization. -## Performance implications +### Performance implications -Some questions can be answered from the LLM entrypoint, however, many require navigating to documents within the core graph.
It is not feasible or desirable to include all information in a single document. As a result, a turn-by-turn approach is required. At each turns, there is new content, new insight, and then selection of critical information that directs the next fetch(es) or is the desired answer. The range of required turns varies greatly, a join of information design over data and comprehension of the overall information framework by the LLM. +Some questions can be answered from the LLM entrypoint; however, many require navigating to documents within the core graph. It is not feasible or desirable to include all information in a single document. As a result, a turn-by-turn approach is required. At each turn, there is new content, new insight, and then selection of critical information that directs the next fetch(es) or is the desired answer. The range of required turns varies greatly, a joint function of the information design over the data and the LLM's comprehension of the overall information framework. -There is a cost function for LLMs based of the mechanics of the [transformer architecture](https://en.wikipedia.org/wiki/Transformer_(deep_learning)). The cost of multiple turns can be prohibitive, resulting in conversation failures/termination, poor performance, or high cost. The graph design has a direct impact on LLM performance and cost. +There is a cost function for LLMs based on the mechanics of the [transformer architecture](https://en.wikipedia.org/wiki/Transformer_(deep_learning)). The cost of multiple turns can be prohibitive, resulting in conversation failures/termination, poor performance, or high cost. The graph design has a direct impact on LLM performance and cost. +#### Cost model -Let's develop an intuitive for cost.
There are three major cost functions at play: +#### Cost model -- Token cost, which is additive across turns, and summed across turns -- Context, which is the accumulated token cost of the final turn -- Attention computation, which is the square of the tokens in any one turn and the sum of those squares across turns +There are three major cost functions at play: -Let's simplify, assuming that tokens (`n`) are uniform per turn (`m`). +- **Token cost:** The tokens processed at each turn, summed across turns. Each turn reprocesses all prior context plus new content. +- **Context:** The accumulated tokens at the final turn. This is bounded by the model's context window. +- **Attention:** Each token attends to every other token within a turn (quadratic), and this cost is incurred at every turn as context grows. -Context (accumulated tokens): `mn` +Let's build intuition with uniform token counts: `n` tokens added per turn across `m` turns. New tokens being uniform is a simplification. -Total token cost (summed tokens / turn): ? +| Turn | New tokens | Tokens | Context size | Attention cost | Accumulated token cost | Accumulated attention cost | +|------|------------|--------|--------------|----------------|------------------------|---------------------------| +| 1 | n | n | n | n² | n | n² | +| 2 | n | 2n | 2n | 4n² | 3n | 5n² | +| 3 | n | 3n | 3n | 9n² | 6n | 14n² | +| 4 | n | 4n | 4n | 16n² | 10n | 30n² | +| 5 | n | 5n | 5n | 25n² | 15n | 55n² | +| m | n | mn | mn | m²n² | nm(m+1)/2 | n²m(m+1)(2m+1)/6 | -Attention (number of interactions required): `n² * m³ / 3` +The formulas simplify for large m: -I need the m turns table here. +| Measure | Formula | Growth class | +|---------|---------|--------------| +| Final context | mn | Linear in turns | +| Total token cost | nm²/2 | Quadratic in turns | +| Total attention | n²m³/3 | Cubic in turns | -Total: `n²(1 + 4 + 9 + ... + m²)` = `n² × m³/3` +The cubic growth in attention is the dominant cost. 
It emerges from summing quadratic costs across turns—each turn pays attention on everything accumulated so far. -What's actually happening: +### Batched vs sequential -`n²` is the quadratic attention factor (tokens attending to tokens) -`m³` emerges from summing squares across turns +Consider an alternative: what if all content could be fetched in a single turn? -Intuitive model: +| Approach | Total attention cost | Relative cost | +|----------|---------------------|---------------| +| Batched (1 turn) | (nm)² = n²m² | 1× | +| Sequential (m turns) | n²m³/3 | m/3 × | -Batched: (total tokens)² = (nm)² = n²m² — pure quadratic -Sequential: sum of growing quadratics = n²m³/3 — cubic in turns +The sequential penalty is approximately **m/3** compared to batched. Ten turns costs roughly 3× what a single batched turn would; thirty turns costs roughly 10×. This ratio scales linearly with turn count. -Batched is the cheapest form. It's the mode where all fetches can be collapsed into a single term. We escape the successive cost of attention. Sequential is the more realistic but costly model, where each fetch happens in sequence. The cost difference is very large. +Many problems genuinely require multiple turns—the LLM must reason about intermediate results before knowing what to fetch next. The goal is not to eliminate turns but to minimize them and optimize their structure. -One can draw the conclusion that any system that requires the turn-by-turn approach is bad because it forces the sequential approach. That's not really a correct framing. Many problems require multiple turns. For example, many ChatGPT conversations go on for quite some time. A more useful framing is determining if there is a way to reduce turns or apply some other optimization. +### Optimization: lean early, heavy late -> The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. 
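The batched-versus-sequential comparison is easy to verify numerically. A small sketch, under the same modeling assumption (attention per turn is context squared):

```python
def sequential_attention(n, m):
    # m turns, n new tokens each; every turn attends over all context so far
    return sum((n * t) ** 2 for t in range(1, m + 1))

def batched_attention(n, m):
    # all n*m tokens arrive in a single turn
    return (n * m) ** 2

# The exact penalty is (m+1)(2m+1)/(6m), which approaches m/3 for large m.
for m in (10, 30):
    print(m, round(sequential_attention(1000, m) / batched_attention(1000, m), 2))
```

For `m = 10` the exact ratio is 3.85 (the `m/3` rule of thumb gives 3.3); for `m = 30` it is about 10.5 — consistent with the "roughly 3×" and "roughly 10×" figures above.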
+ +### Optimization: lean early, heavy late -This design approach was selected in service of n-9s reliability and performance. It takes on a new light when viewed through various LLM cost functions. The formulas above assume that token count is uniform. If it isn't, then you'd want lean token counts up front to guide planning and navigation and then more weighty token counts at the end (at the hopefully final turn) used to derive an answer. The ability to fetch multiple documents in a single turn further optimizes the cost function. It helps us split the difference between the batched and seqential cost models. +The uniform model above assumes equal token counts per turn. In practice, token distribution across turns is a design choice with significant cost implications. -The rest of the design should be viewed in terms of these cost functions. It is to a large degree the whole game at play. +Consider two orderings for the same total content—6 small documents (100 tokens each) and 3 large documents (500 tokens each): + +**Large documents first:** + +| Turn | New tokens | Context | Attention | Accumulated attention | +|------|------------|---------|-----------|-----------------------| +| 1 | 500 | 500 | 250K | 250K | +| 2 | 500 | 1000 | 1,000K | 1,250K | +| 3 | 500 | 1500 | 2,250K | 3,500K | +| 4 | 100 | 1600 | 2,560K | 6,060K | +| ... | ... | ... | ... | ... | +| 9 | 100 | 2100 | 4,410K | **24,210K** | + +**Small documents first:** + +| Turn | New tokens | Context | Attention | Accumulated attention | +|------|------------|---------|-----------|----------------------| +| 1 | 100 | 100 | 10K | 10K | +| 2 | 100 | 200 | 40K | 50K | +| ... | ... | ... | ... | ... | +| 6 | 100 | 600 | 360K | 910K | +| 7 | 500 | 1100 | 1,210K | 2,120K | +| 8 | 500 | 1600 | 2,560K | 4,680K | +| 9 | 500 | 2100 | 4,410K | **9,090K** | + +Same content, same turn count, but ordering alone yields roughly a **2.7× cost difference**. The principle: defer large token loads to later turns where possible.
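The orderings can be recomputed in a few lines, using the same per-turn model (one document fetched per turn, attention equal to accumulated context squared):

```python
def total_attention(doc_sizes):
    context = 0
    total = 0
    for size in doc_sizes:       # one document fetched per turn
        context += size
        total += context ** 2    # attention over everything accumulated so far
    return total

large_first = [500] * 3 + [100] * 6
small_first = [100] * 6 + [500] * 3

print(total_attention(large_first))
print(total_attention(small_first))
assert total_attention(small_first) < total_attention(large_first)
```

Swapping only the order of fetches changes nothing about the content retrieved, yet materially changes the accumulated attention bill.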
+ +### Optimization: multiple fetches per turn + +The sequential model assumes one fetch per turn. LLMs can fetch multiple documents in a single turn when given clear guidance about what to retrieve. + +This is where graph design directly impacts cost and a graph designer can coerce a sequential paradigm toward batched cost. + +The goal (or opportunity) is to get an LLM to: + +1. Navigate lean index documents in early turns to identify targets +2. Fetch multiple (weighty) target documents in the last turn minus one +3. Synthesize the answer in the final turn + +**Observed pattern from eval:** Given well-structured graph navigation hints, LLMs reliably discover a set of candidate documents in one turn, then fetch all of them together in the next turn. This collapses what might be many sequential fetches into a small number of turns, dramatically reducing the attention cost. + +The following eval trace demonstrates the pattern. The prompt asked the LLM to analyze CVE fix patterns across .NET releases: + +| Turn | Documents fetched | Purpose | +|------|-------------------|---------| +| 1 | `llms.txt` | Entrypoint discovery | +| 2 | `llms.json`, `cve-queries/SKILL.md` | Graph orientation + skill acquisition | +| 3 | `workflows.json` | Navigation strategy | +| 4 | `2024/index.json`, `2025/index.json` | Timeline discovery (2 fetches) | +| 5 | `2024/11/cve.json`, `2025/01/cve.json`, `2025/03/cve.json`, `2025/04/cve.json`, `2025/05/cve.json`, `2025/06/cve.json` | CVE data collection (6 fetches) | +| 6 | 6 GitHub `.diff` files | Commit analysis (6 fetches) | + +The raw fetch list: + +``` +1. llms.txt (turn 1) +2. llms.json (turn 2) +3. cve-queries/SKILL.md (turn 2) +4. workflows.json (turn 3) +5. 2024/index.json (turn 4) +6. 2025/index.json (turn 4) +7. 2024/11/cve.json (turn 5) +8. 2025/01/cve.json (turn 5) +9. 2025/03/cve.json (turn 5) +10. 2025/04/cve.json (turn 5) +11. 2025/05/cve.json (turn 5) +12. 2025/06/cve.json (turn 5) +13-18.
Six GitHub .diff files (turn 6)
+```
+
+18 documents retrieved across 6 turns. A naive sequential approach would require 18 turns. The multi-fetch pattern reduced turn count by 3×, which translates to roughly a **3× reduction in attention cost** (for a fixed total token load, accumulated attention scales roughly linearly with turn count; the m/3 sequential penalty drops from 6 to 2).
+
+Note the progression: documents get progressively larger through the trace. The `llms.txt` entrypoint is tiny. The index files are small. The CVE JSON files are medium. The `.diff` files at the end are the largest. This is the "lean early, heavy late" principle in action as a design intention.
+
+The entrypoint design—skeletal and rarely changing—takes on new significance in this light. A lean entrypoint enables rapid initial orientation with minimal attention cost. Subsequent navigation through lightweight index nodes preserves token budget for the final multi-fetch turn where the more information and answer dense content is gathered.
+
+#### Design implications
+
+The cost model suggests several design principles:
+
+- **Minimize turn count**: through clear navigation affordances. Each eliminated turn saves quadratically growing attention cost.
+- **Front-load lightweight content**: Index documents, link relations, and navigation hints should be small. Substantive content belongs at the leaves.
+- **Enable multi-fetch patterns**: Expose document collections as lists of links rather than embedded content, allowing LLMs to batch their retrieval.
+- **Provide explicit workflows**: Graph-resident guidance can direct LLMs to optimal traversal patterns, encoding the designer's knowledge of efficient paths.
+- **Ensure sufficient guidance to avoid hallucinations**: The effectiveness of the approach decays quickly if an LLM loses confidence in the hints or is unsure how to proceed along the path.
+
+The rest of the design should be viewed through this cost lens. It is to a large degree the whole game at play.
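The turn-count effect can be sketched under the same convention (attention per turn ≈ context²). The 1,800-token total below is invented for illustration; it compares one fetch per turn against batching three fetches per turn.

```python
def accumulated_attention(turn_tokens):
    # Attention per turn modeled as context**2.
    context = total = 0
    for new_tokens in turn_tokens:
        context += new_tokens
        total += context ** 2
    return total

# Same 1,800 total tokens: 18 single-fetch turns vs 6 multi-fetch turns.
sequential = accumulated_attention([100] * 18)
multi_fetch = accumulated_attention([300] * 6)
print(sequential / multi_fetch)  # ~2.6; approaches the 3x turn-count ratio as turn counts grow
```

For small turn counts the reduction lands a little under the asymptotic turn-count ratio, which is why "roughly" matters in these estimates.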
## llms.txt From a65fbd55e6479689a2521922c6d68c7ca5b25213 Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Fri, 2 Jan 2026 21:32:41 -0800 Subject: [PATCH 07/17] Update performance section --- .../exposing-hypermedia-to-llms.md | 184 +++++++++--------- 1 file changed, 91 insertions(+), 93 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 1ba1b2f2b..4d9e34c5c 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -18,74 +18,79 @@ Graph-resident guidance consists of skills and workflows. HATEOAS tells us that The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. And, of course, this approach has utility beyond release notes. -## Graph design +## Graph design point -The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but far from ideal. The restrictive idea of the core graph is that it should support an n-9s level of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that may require significant and quick re-evaluation and re-work to maintain high-quality LLM enablement. These modalities are in firm opposition. 
+The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but not ideal. The motivation for the restrictive approach is that it should support an n-9s level of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that may require significant and quick re-evaluation and re-work of the entrypoint to maintain high-quality LLM enablement. These modalities are in firm opposition. We can instead view the core graph as a well-defined data-layer that honors the desired reliability requirements, while exposing a separate application-layer entrypoint for LLMs that can evolve over time without the heavy compatibility burden. -The graph as a whole is based on a largely traditional schema design, utilizing both normalized and denormalized approaches in (hopefully informed) service of consumers. After the graph was realized, it became possible to test it with `jq` as a sort of passive and syntactic consumer and with LLMs as a much more active and semantic consumer. The graph was successively (and hopefully successfully) adapted to improve performance for both consumption styles. Performance is primarily measured in terms of terseness of query and quickness (fetches and data cost) of response. Much of the feedback was fundamental in nature. The overall character of the graph remains a pure information-oriented data design, but with a minor tilt towards semantic consumers. +The graph as a whole is based on a somewhat traditional schema design, utilizing both normalized and denormalized approaches in (hopefully) informed service of consumers. After the graph was realized, it became possible to test it with `jq` as a sort of passive and syntactic consumer and with LLMs as a much more active and semantic consumer. 
The graph was successively and successfully adapted to improve performance for both consumption styles. Performance is primarily measured in terms of terseness of query and quickness (fetches and data cost) of response. Much of the feedback could be considered fundamental in nature. The overall character of the graph remains a pure information-oriented data design, but with a significant tilt towards semantic consumers.

-The choice of hypermedia as the grounding format is a case-in-point of the overall approach. Hypermedia long pre-dates LLMs, however, it has always held semantic consumers (humans) as a key design cohort. Hypermedia formats provide a conceptual framework that is easy to flavor towards semantic consumption. This flexibility proved useful as the design was adapted with LLM feedback. It should also be noted that LLM feedback is by far the cheapest and most accessible form of feedback. LLMs are happy to provide usage feedback throughout the night while other semantic consumers are sleeping.
+The choice of hypermedia as the grounding format is a case in point of the overall approach. Hypermedia long pre-dates LLMs; however, it has always held semantic consumers (humans) as a key design cohort. Hypermedia formats provide a conceptual framework that is easy to flavor towards semantic consumption. This flexibility proved useful as the design was adapted with LLM feedback. It should also be noted that LLM feedback is by far the cheapest and most accessible form of feedback. LLMs are happy to provide usage feedback in response to iterative adaptation and at any time of day or night.

A few strong design principles emerged from LLM behavior observed during eval:

- Consistent application of a conceptual model creates familiarity for semantic consumers. It is a comfort to find a concept exposed where it is expected.
-- Resources can be dual-mapped in terms of structual kind, like `major` aand `-month`, and desired output, like `-security-disclosures`. 
Prompts can bias towards different concerns. Differentiated mappings are more to present a similar match to semantic consumers.
-- LLMs operate on a model of scarcity, with tokens at a premium. Smaller graph nodes encourage greater graph navigation. Comprehension can be made to outperform consumption cost.
+- It is possible to expose links that jump from one part of the graph to another, like a wormhole. LLMs seem to need to develop _comprehension_ and _trust_ as a pre-requisite for relying on them. The more attractive the wormhole link, the more skeptical the LLM may be. This was observed most with the `latest-security-disclosures` relation since it provides high value and because it has an inherent half-life.
+- Resources can be dual-mapped in terms of structural kind, like `latest-security-month`, and desired output, like `latest-security-disclosures`. A given prompt may bias towards different concerns. Differentiated mappings are more likely to present a similar match to semantic consumers.
- LLMs will acquire multiple resources in a single turn if a good strategy for doing so is evident.
+- LLMs operate on a model of scarcity, with tokens at a premium. Smaller graph nodes encourage greater graph navigation by creating a sense that growing comprehension is outstripping consumption cost.
+- Differentiating token cost by category of nodes makes it cheaper for LLMs to navigate a large graph. The `month` node within the graph is weightier than all other nodes, making it easier to develop an exploration plan among the lighter nodes before making a final decision on which month(s) to read, or to skip months altogether and exit the graph (via a graph exit link), for example, to read our monthly `cve.json` files.
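The dual-mapping idea can be made concrete with a small sketch. The `_links` fragment below is hypothetical (the hrefs and the `resolve` helper are invented for illustration); the relation names follow the ones discussed above.

```python
# Hypothetical HAL-style _links fragment: two relation names, one structural
# ("latest-security-month") and one output-oriented ("latest-security-disclosures"),
# give different prompts a similar match. Hrefs are invented.
node = {
    "_links": {
        "latest-security-month": {"href": "2025/06/index.json"},      # structural kind
        "latest-security-disclosures": {"href": "2025/06/cve.json"},  # desired output
    }
}

def resolve(doc, relation):
    # Signposting as a key-value function: a key you recognize yields a value you need.
    link = doc["_links"].get(relation)
    return link["href"] if link else None

print(resolve(node, "latest-security-disclosures"))  # 2025/06/cve.json
```

A consumer that recognizes either relation name lands on the right neighborhood of the graph without any schema knowledge.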
### LLM entrypoints -There are entypoints provided for LLMs: +There are two entrypoints provided for LLMs: - [llms.txt](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt) -- Prose explanation of how to use the graph, including a link to llms.json. - [llms.json](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json) -- The LLM index (AKA "application-layer entrypoint for LLMs"). It also includes guidance inline. -[llms.txt](https://llmstxt.org/) is an emerging standard, with awareness in the most recently trained models. It can be used for meta-information (as is the case in this system) or to expose an index of all information available (as is the case with [Stripe docs](https://docs.stripe.com/llms.txt)). It's hard to imagine that the Stripe approach is optimal. It uses 18.5k tokens (10% of a typical token budget) while our use of `llms.txt` clocks in at a meager 626 tokens. +[llms.txt](https://llmstxt.org/) is an emerging standard, with awareness in the most recently trained LLMs. It can be used for meta-information (as is the case in this system) or to expose an index of all information available (as is the case with [Stripe docs](https://docs.stripe.com/llms.txt)). It's hard to imagine that the Stripe approach is optimal. It uses 18.5k tokens (10% of a typical token budget) while our use of `llms.txt` clocks in at a meager 609 tokens. A major advantage of `llms.txt` is that it is markdown, which offers a natural way to expose resource links, guidance, and foreign content (code fences). It is possible to include all the same information in JSON, however, it is awkward and (critically) unconventional. It takes a lot more design effort to get an LLM to notice and apply guidance from within a data-oriented JSON document than markdown, which has a much stronger association with guidance and multi-modality information. 
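Markdown's link affordance is also easy for syntactic consumers to harvest. A minimal sketch, with invented `llms.txt`-style content (not the published file):

```python
import re

# Invented llms.txt-style content for illustration.
llms_txt = """# Release notes graph
- [llms.json](https://example.invalid/release-notes/llms.json) -- data entrypoint
- [cve-queries/SKILL.md](https://example.invalid/cve-queries/SKILL.md) -- CVE query skill
"""

# Map markdown link text to its URL.
links = dict(re.findall(r"\[([^\]]+)\]\(([^)]+)\)", llms_txt))
print(links["llms.json"])  # https://example.invalid/release-notes/llms.json
```

The same parse that a human does by eye is one regular expression for a machine, which is part of why markdown works well for an entrypoint.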
Our use of `llms.txt` includes a link to the data entrypoint (`llms.json`), a table of skills content, and basic initial guidance. LLMs will often fetch the data URL and one or more skills files in a single turn. Fetching multiple documents in a single turn is a useful tactic for token optimization.

-### Performance implications
+## Performance considerations

-Some questions can be answered from the LLM entrypoint, however, many require navigating to documents within the core graph. It is not feasible or desirable to include all information in a single document. As a result, a turn-by-turn approach is required. At each turn, there is new content, new insight, and then selection of critical information that directs the next fetch(es) or is the desired answer. The range of required turns varies greatly, a join of information design over data and comprehension of the overall information framework by the LLM.
+Some questions can be answered from the LLM entrypoint; however, many require navigating to documents within the core graph. It is not feasible or desirable to include all information in a single document. As a result, a turn-by-turn approach is required. At each turn, there is new content, new insight, and then selection of critical information that directs the next fetch(es) or is the desired final answer. The range of required turns varies greatly; it is a function of both the information design over the body of data and the LLM's comprehension of the overall information framework.

-There is a cost function for LLMs based on the mechanics of the [transformer architecture](https://en.wikipedia.org/wiki/Transformer_(deep_learning)). The cost of multiple turns can be prohibitive, resulting in conversation failures/termination, poor performance, or high cost. The graph design has a direct impact on LLM performance and cost. 
+There is a cost function for LLMs based on the mechanics of the [transformer architecture](https://en.wikipedia.org/wiki/Transformer_(deep_learning)). The graph design has a direct impact on LLM performance and cost. Costs compound across multiple turns extremely quickly, much faster than intuition would suggest.

-#### Cost model
+### Cost model

There are three major cost functions at play:

- **Token cost:** The tokens processed at each turn, summed across turns. Each turn reprocesses all prior context plus new content.
-- **Context:** The accumulated tokens at the final turn. This is bounded by the model's context window.
+- **Context:** The accumulated tokens at the final turn (or at the terminal turn, if the conversation ends early in a context overflow). This is bounded by the model's context window.
- **Attention:** Each token attends to every other token within a turn (quadratic), and this cost is incurred at every turn as context grows.

-Let's build intuition with uniform token counts: `n` tokens added per turn across `m` turns. New tokens being uniform is a simplification.
+Let's build intuition using uniform token counts: `n` tokens added per turn across `m` turns. 
-| Turn | New tokens | Tokens | Context size | Attention cost | Accumulated token cost | Accumulated attention cost |
-|------|------------|--------|--------------|----------------|------------------------|---------------------------|
-| 1 | n | n | n | n² | n | n² |
-| 2 | n | 2n | 2n | 4n² | 3n | 5n² |
-| 3 | n | 3n | 3n | 9n² | 6n | 14n² |
-| 4 | n | 4n | 4n | 16n² | 10n | 30n² |
-| 5 | n | 5n | 5n | 25n² | 15n | 55n² |
-| m | n | mn | mn | m²n² | nm(m+1)/2 | n²m(m+1)(2m+1)/6 |
+| Turn | New tokens | Conversation tokens | Context size | Accumulated token cost | Attention cost | Accumulated attention cost |
+|------|------------|---------------------|--------------|------------------------|----------------|---------------------------|
+| 1 | n | n | n | n | n² | n² |
+| 2 | n | 2n | 2n | 3n | 4n² | 5n² |
+| 3 | n | 3n | 3n | 6n | 9n² | 14n² |
+| 4 | n | 4n | 4n | 10n | 16n² | 30n² |
+| 5 | n | 5n | 5n | 15n | 25n² | 55n² |
+| m | n | mn | mn | nm(m+1)/2 | m²n² | n²m(m+1)(2m+1)/6 |

The formulas simplify for large m:

| Measure | Formula | Growth class |
|---------|---------|--------------|
| Final context | mn | Linear in turns |
-| Total token cost | nm²/2 | Quadratic in turns |
-| Total attention | n²m³/3 | Cubic in turns |
+| Accumulated token cost | nm²/2 | Quadratic in turns |
+| Accumulated attention | n²m³/3 | Cubic in turns |
+
+More context on cost:

-The cubic growth in attention is the dominant cost. It emerges from summing quadratic costs across turns—each turn pays attention on everything accumulated so far.
+- API pricing is in terms of tokens. For a multi-turn conversation, the cost is the accumulated token cost, not the final context size.
+- The cubic growth in attention is the dominant computational cost. It emerges from summing quadratic costs across turns—each turn pays attention on everything accumulated so far.

### Batched vs sequential

-Consider an alternative: what if all content could be fetched in a single turn? 
+Consider an alternative for the attention cost: what if all content could be fetched in a single turn?

| Approach | Total attention cost | Relative cost |
|----------|---------------------|---------------|
@@ -98,95 +103,88 @@ Many problems genuinely require multiple turns—the LLM must reason about inter

### Optimization: lean early, heavy late

-The uniform model above assumes equal token counts per turn. In practice, token distribution across turns is a design choice with significant cost implications.
-
-Consider two orderings for the same total content—6 small documents (100 tokens each) and 3 large documents (500 tokens each):
-
-**Large documents first:**
-
-| Turn | New tokens | Context | Attention | Accumulated attention |
-|------|------------|---------|-----------|-----------------------|
-| 1 | 500 | 500 | 250K | 250K |
-| 2 | 500 | 1000 | 1,000K | 1,250K |
-| 3 | 500 | 1500 | 2,250K | 3,500K |
-| 4 | 100 | 1600 | 2,560K | 6,060K |
-| ... | ... | ... | ... | ... |
-| 9 | 100 | 2100 | 4,410K | **24,210K** |
+The uniform model above assumes equal token counts per turn. In practice, token distribution across turns is a design choice with significant cost implications. The principle: defer large token loads to later turns where possible to reduce the number of turns that must pay the cost of large token loads.

-**Small documents first:**
-
-| Turn | New tokens | Context | Attention | Accumulated attention |
-|------|------------|---------|-----------|----------------------|
-| 1 | 100 | 100 | 10K | 10K |
-| 2 | 100 | 200 | 40K | 50K |
-| ... | ... | ... | ... | ... |
-| 6 | 100 | 600 | 360K | 910K |
-| 7 | 500 | 1100 | 1,210K | 2,120K |
-| 8 | 500 | 1600 | 2,560K | 4,680K |
-| 9 | 500 | 2100 | 4,410K | **9,090K** |
+### Optimization: multiple fetches per turn

-Same content, same turn count, but ordering alone yields a roughly **2.7× cost difference**. The principle: defer large token loads to later turns where possible. 
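The closed forms and the batched comparison can be checked numerically. A quick sketch with illustrative values of `n` and `m` (not drawn from any eval):

```python
def costs(n, m):
    # Accumulated token and attention cost for m uniform turns of n new tokens.
    context = token_cost = attention_cost = 0
    for _ in range(m):
        context += n
        token_cost += context           # each turn reprocesses the whole context
        attention_cost += context ** 2  # attention is quadratic in context
    return token_cost, attention_cost

n, m = 1000, 10
tokens, attention = costs(n, m)
assert tokens == n * m * (m + 1) // 2                       # nm(m+1)/2
assert attention == n * n * m * (m + 1) * (2 * m + 1) // 6  # n^2 m(m+1)(2m+1)/6

batched = (m * n) ** 2      # everything fetched in a single turn
print(attention / batched)  # 3.85, i.e. (m+1)(2m+1)/(6m), roughly m/3
```

The sequential-over-batched penalty grows with turn count, which is the quantitative case for collapsing fetches into fewer turns.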
+The sequential model assumes one fetch per turn. LLMs can fetch multiple documents in a single turn when given clear guidance about what to retrieve. This approach tames the rate at which the attention cost accumulates, enabling a cost profile that approaches the batched model while maintaining a sequence. The principle: prefer larger token loads per turn to reduce the number of turns overall.

-### Optimization: multiple fetches per turn
+### Concrete application

-The sequential model assumes one fetch per turn. LLMs can fetch multiple documents in a single turn when given clear guidance about what to retrieve.
+LLM eval of the graph demonstrates that effective design can result in optimal behavior.

-This is where graph design directly impacts cost and a graph designer can coerce a sequential paradigm to approach batched cost.
+Observed pattern:

-The goal (or opportunity) is to get an LLM to:
+1. Navigate lean index documents in early turns to identify graph paths
+1. Fetch multiple graph documents in middle turns to parallelize multiple navigation paths
+1. Fetch multiple information-dense documents in later/last turns to inform final answer
+1. Synthesize the answer in the final turn

-1. Navigate lean index documents in early turns to identify targets
-2. Fetch multiple (weighty) target documents in the penultimate turn
-3. Synthesize the answer in the final turn
+The following eval trace demonstrates this behavior. The prompt asked the LLM to analyze CVE fix patterns across .NET releases:

-**Observed pattern from eval:** Given well-structured graph navigation hints, LLMs reliably discover a set of candidate documents in one turn, then fetch all of them together in the next turn. This collapses what might be many sequential fetches into a small number of turns, dramatically reducing the attention cost.
+> Prompt: Please look at .NET Runtime and ASP.NET Core CVEs from November 2024 until April 2025 (6 months). I am concerned at the rate of these CVEs. 
Look at code diffs for the CVEs. Are the fixes sufficiently protecting my mission critical apps and could the .NET team have avoided these vulnerabilities with a stronger security process? Fetch code diffs to inform your analysis. Ensure they are from dotnet/runtime or dotnet/aspnetcore. Include the repo and commit link in your analysis of specific CVEs in your report. -The following eval trace demonstrates the pattern. The prompt asked the LLM to analyze CVE fix patterns across .NET releases: +| Turn | Documents | Tokens | Cumulative | Purpose | +|------|-----------|--------|------------|---------| +| 1 | 1 | 609 | 609 | Entrypoint discovery | +| 2 | 2 | 2,323 | 2,932 | Graph orientation + skill acquisition | +| 3 | 1 | 1,146 | 4,078 | Navigation strategy | +| 4 | 2 | 3,374 | 7,452 | Timeline discovery | +| 5 | 6 | 12,131 | 19,583 | CVE data collection | +| 6 | 6 | 59,832 | 79,415 | Commit analysis | -| Turn | Documents fetched | Purpose | -|------|-------------------|---------| -| 1 | `llms.txt` | Entrypoint discovery | -| 2 | `llms.json`, `cve-queries/SKILL.md` | Graph orientation + skill acquisition | -| 3 | `workflows.json` | Navigation strategy | -| 4 | `2024/index.json`, `2025/index.json` | Timeline discovery (2 fetches) | -| 5 | `2024/11/cve.json`, `2025/01/cve.json`, `2025/03/cve.json`, `2025/04/cve.json`, `2025/05/cve.json`, `2025/06/cve.json` | CVE data collection (6 fetches) | -| 6 | 6 GitHub `.diff` files | Commit analysis (6 fetches) | +The token distribution is striking: **75% of all tokens arrive in the final turn**. This is the "lean early, heavy late" principle in action—not by accident, but by design. -The raw fetch list: +The raw fetch list with token counts: ``` -1. llms.txt (turn 1) -2. llms.json (turn 2) -3. cve-queries/SKILL.md (turn 2) -4. workflows.json (turn 3) -5. 2024/index.json (turn 4) -6. 2025/index.json (turn 4) -7. 2024/11/cve.json (turn 5) -8. 2025/01/cve.json (turn 5) -9. 2025/03/cve.json (turn 5) -10. 
2025/04/cve.json (turn 5) -11. 2025/05/cve.json (turn 5) -12. 2025/06/cve.json (turn 5) -13-18. Six GitHub .diff files (turn 6) +Turn 1 (609 tokens): + llms.txt 609 tokens + +Turn 2 (2,323 tokens): + llms.json 2,126 tokens + cve-queries/SKILL.md 197 tokens + +Turn 3 (1,146 tokens): + workflows.json 1,146 tokens + +Turn 4 (3,374 tokens): + 2024/index.json 1,765 tokens + 2025/index.json 1,609 tokens + +Turn 5 (12,131 tokens): + 2024/11/cve.json 1,656 tokens + 2025/01/cve.json 4,020 tokens + 2025/03/cve.json 1,155 tokens + 2025/04/cve.json 1,034 tokens + 2025/05/cve.json 3,081 tokens + 2025/06/cve.json 1,185 tokens + +Turn 6 (59,832 tokens): + dotnet/runtime d16f41a.diff 37,425 tokens + dotnet/runtime 9da8c6a.diff 1,781 tokens + dotnet/runtime 89ef51c.diff 260 tokens + dotnet/aspnetcore 67f3b04.diff 1,669 tokens + dotnet/aspnetcore d6605eb.diff 15,388 tokens + dotnet/runtime b33d4e3.diff 3,309 tokens ``` -18 documents retrieved across 6 turns. A naive sequential approach would require 18 turns. The multi-fetch pattern reduced turn count by 3×, which translates to roughly **6× reduction in attention cost** (since the sequential penalty scales as m/3). +> Note: The eval harness truncated `.diff` files to 50 lines to ensure test completion across all configurations. The token counts above reflect actual document sizes—what a reader would encounter following the [published guidance](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt). -Note the progression: documents get progressively larger through the trace. The `llms.txt` entrypoint is tiny. The index files are small. The CVE JSON files are medium. The `.diff` files at the end are the largest. This is the "lean early, heavy late" principle in action as a design intention. +18 documents retrieved across 6 turns. A naive sequential approach would require 18 turns. 
The multi-fetch pattern reduced turn count by 3×, which translates to roughly a **3× reduction in attention cost** (for a fixed total token load, accumulated attention scales roughly linearly with turn count; the m/3 sequential penalty drops from 6 to 2).

-The entrypoint design—skeletal and rarely changing—takes on new significance in this light. A lean entrypoint enables rapid initial orientation with minimal attention cost. Subsequent navigation through lightweight index nodes preserves token budget for the final multi-fetch turn where the more information and answer dense content is gathered.
+The entrypoint design—skeletal and rarely changing—takes on new significance in this light. A lean entrypoint enables rapid initial orientation with minimal attention cost. Subsequent navigation through lightweight index nodes preserves token budget for the final multi-fetch turn where substantive content is gathered.

-#### Design implications
+### Design implications

The cost model suggests several design principles:

-- **Minimize turn count**: through clear navigation affordances. Each eliminated turn saves quadratically growing attention cost.
-- **Front-load lightweight content**: Index documents, link relations, and navigation hints should be small. Substantive content belongs at the leaves.
+- **Minimize turn count** through clear navigation affordances. Each eliminated turn saves quadratically growing attention cost.
+- **Front-load lightweight content.** Index documents, link relations, and navigation hints should be small. Substantive content belongs at the leaves. 
+- **Enable multi-fetch patterns.** Expose document collections as lists of links rather than embedded content, encouraging LLMs to batch their retrieval. +- **Provide explicit workflows.** Graph-resident guidance can direct LLMs to optimal traversal patterns, encoding the designer's knowledge of efficient paths. -The rest of the design should be viewed through this cost lens. It is to a large degree the whole game at play. +The rest of the design should be viewed through this cost lens. As an application designer, there are only so many degrees of freedom. We cannot change LLM fundamentals but need to work within their constraints. To a large degree, optimizations like reducing turns are similar to loop variable hoisting. While LLMs are new and different, old school performance strategies remain effective. ## llms.txt From 5d05f868c844a841d952b7c0b93c8bc080076c9b Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Sun, 4 Jan 2026 10:36:57 -0800 Subject: [PATCH 08/17] Add entrypoint section --- .../exposing-hypermedia-to-llms.md | 574 ++++++++++++++---- 1 file changed, 440 insertions(+), 134 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 4d9e34c5c..51349d66b 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -1,198 +1,168 @@ # Exposing Hypermedia Information Graphs to LLMs -[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old ideas and formats that are perfectly-suited for LLM consumption by virtue of self-describing structure and relationships between resources. The premise is that there is sufficient meta-information in a hypermedia document graph for a semantic consumer to successfully traverse it to find the information demanded by a user prompt. 
The prevailing narrative over the last few decades has been that _structured data_ > _unstructured documents_, in terms of inherent capacity to derive meaningful insight. JSON and XML came out of that heritage, with [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath) providing structured query supported by a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) contributes the idea that data labeling can be extended to relations across resources. These approaches are integrated in this system to enable an LLM to search for and discover desired information across labeled nodes and edges within a graph. In a more traditional system, a schema is the pre-requisite to traversal, whereas in a semantic system, it is traversal that reveals the schema. +[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old ideas and formats that are perfectly-suited for LLM consumption by virtue of self-describing structure and relationships between resources. The premise is that there is sufficient meta-information in a hypermedia document graph for a semantic consumer to successfully traverse it to find the information demanded by a prompt. The prevailing narrative over the last few decades has been that _structured data_ > _unstructured documents_, in terms of inherent capacity to derive meaningful insight. JSON and XML came out of that heritage, with [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath) providing structured query supported by a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) contributes the idea that data labeling can be extended to relations across resources. 
These approaches are integrated in this system to enable an LLM to search for and discover desired information across labeled nodes and edges within a graph. In a more traditional system, a schema is the pre-requisite to traversal, whereas in a semantic system, it is traversal that reveals the schema.

> A (nearly) century-old principle, articulated by [Korzybski](https://en.wikipedia.org/wiki/Alfred_Korzybski): [the map is not the territory](https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation).

-In trail races, there are frequent ribbons hanging from trees and painted arrows on the ground to ensure the correct path taken. It is often the case that there are races of multiple distances being run on an overlapping course. At key intersections, there are signs that say "5 KM -> left" and "10 KM -> continue straight". The ribbons and the painted arrows are the kind of map that a document schema provides, ensuring correct coloring within the lines. The signposting where two courses diverage is the extra descriptive insight that enables graph traversal. The signposting is a key-value function. You match a key you recognize with a value you cannot predict. Signposting provides comprehension that enables directed navigation of the territory.
+In trail races, there are frequent ribbons hanging from trees and painted arrows on the ground to ensure the correct path is taken. It is often the case that there are races of multiple distances being run on an overlapping course. At key intersections, there are signs that say "5 km -> left" and "10 km -> continue straight". The ribbons and the painted arrows are the kind of map that a document schema provides, ensuring correct coloring within the lines. The signposting where two courses diverge is the HATEOAS-like descriptive insight that enables graph traversal. The signposting is a key-value function. You match a key you recognize with a value you need to know. Signposting enables directed navigation of the territory. 
-This approach is in sharp contrast to the typical HTML graph implementation: `For a deeper discussion on this topic, click here.`. A semantic graph might expose a named link relation like `{ "link-relation": "gardening-deep-dive", "href": "..." }` or expose more descriptive complexity by separating the parts, like `"link-relation": "deep-dive"` from `"target-kind": "gardening"`, cleanly splitting the link-adjective and its target-noun. The better the semantic implementation, the less inference, flow analysis, or attention is required to derive the intended meaning.
+This approach is in sharp contrast to the typical HTML graph implementation: `For a deeper discussion on this topic, click here.`. A semantic graph might expose a named link relation like `{ "link-relation": "gardening-deep-dive", "href": "..." }`. It might offer more descriptive complexity by separating the link parts, like `"link-relation": "deep-dive"` and `"target-kind": "gardening"`, cleanly splitting the link-adjective and its target-noun. The better the semantic implementation, the less inference, flow analysis, or attention is required to derive the intended meaning.

-Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Hypermedia graphs extend this idea enabling "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space -- both content and link relations -- allows readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation.
+Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Hypermedia graphs extend this idea, enabling "no-schema" consumption.
Rather than requiring upfront schema knowledge, a fully realized semantic space including both content and link relations enables readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation.

-Hypermedia information document graphs can be published pre-baked, making them suitable for direct consumption without being pre-loaded and exposed by a vector database. The semantic relationships and other meta-information are used as the basis of typical LLM mechanics like vector similarity, making hypermedia a kind of RAG scheme and suitable for static-webhost deployment.
+Hypermedia information document graphs can be published pre-baked, making them suitable for direct consumption without being pre-loaded and exposed by a vector database. The semantic relationships and other meta-information are used as the basis of typical LLM mechanics like vector similarity, making hypermedia a kind of [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) scheme and suitable for static-webhost deployment.

The concept of a pre-baked static hypermedia graph has been applied to the .NET release notes. That project was initially approached as a modern revamp of a set of JSON files that are used to inform and direct cloud-infra deployment and compliance workflows at scale. Over time, it became obvious that LLMs could read the same content directly and self-reason about its content and navigation patterns. Early experimentation proved that out. The primary techniques used to improve applicability for LLMs are semantic naming and graph-resident guidance. These techniques can be quite subtle, where a small shift in semantic bias can result in a large shift in LLM performance.

-Graph-resident guidance consists of skills and workflows. HATEOAS tells us that "customer" can be a relation of a sales order. Why not make "graph-instructions" a relation of a graph?
Skills and workflows are first-class relations in the graph, enabling graph designers to express navigation intent. Skills follow the Anthropic skill format, while workflows are HAL documents that describe queries over graph link relations. This enables the graph designer to provide readers with "ten-km-route" style workflows if that's a match for the intended outcome.
+Graph-resident guidance consists of skills and workflows. HATEOAS tells us that `customer` can be a relation of a sales order. Why not make `graph-instructions` a relation of a graph? Skills and workflows are first-class relations in the graph, enabling its designer to express navigation intent. Skills follow the [Anthropic skills format](https://github.com/anthropics/skills), while workflows are HAL documents that describe queries over graph link relations. This enables the graph designer to provide readers with "ten-km-route-follow-path" style workflows if that's a match for the intended outcome.

-The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. And, of course, this approach has utility beyond release notes.
+The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade.
This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. This approach should scale just as well to a broad set of document types, not just release notes.
+
+This document is structured theory-first. The theory describes and demonstrates the basis for the design. The practice follows, intended as a faithful application of the theory.

## Graph design point

-The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but not ideal. The motivation for the restrictive approach is that it should support an n-9s level of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that may require significant and quick re-evaluation and re-work of the entrypoint to maintain high-quality LLM enablement. These modalities are in firm opposition.
+The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but not ideal. The motivation for the restrictive approach is that it should support an n-9s level of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that may require quick re-evaluation and re-work (potentially breaking changes) of the entrypoint to maintain high-quality LLM enablement. These modalities are in firm opposition.

-We can instead view the core graph as a well-defined data-layer that honors the desired reliability requirements, while exposing a separate application-layer entrypoint for LLMs that can evolve over time without the heavy compatibility burden.
+Instead, we can view the core graph as a **well-defined data-layer** that honors the desired reliability requirements, while exposing a separate **adaptable application-layer** entrypoint for LLMs that can evolve over time without the heavy compatibility burden.

-The graph as a whole is based on a somewhat traditional schema design, utilizing both normalized and denormalized approaches in (hopefully) informed service of consumers. After the graph was realized, it became possible to test it with `jq` as a sort of passive and syntactic consumer and with LLMs as a much more active and semantic consumer. The graph was successively and successfully adapted to improve performance for both consumption styles. Performance is primarily measured in terms of terseness of query and quickness (fetches and data cost) of response. Much of the feedback could be considered fundamental in nature. The overall character of the graph remains a pure information-oriented data design, but with a significant tilt towards semantic consumers.
+The graph as a whole is based on a somewhat traditional schema design, utilizing both normalized and denormalized approaches in service of consumers. After the graph was realized, it became possible to test it with `jq` as a sort of passive and syntactic consumer and with LLMs as a much more active and semantic consumer. The graph was successively adapted to improve performance for both consumption styles. Performance is primarily measured in terms of terseness of query and quickness (fetches and data cost) of response. Much of the feedback could be considered fundamental in nature. The overall character of the graph remains a pure information-oriented data design, but with a significant tilt towards semantic consumers.

-The choice of hypermedia as the grounding format is a case-in-point of the overall approach. Hypermedia long pre-dates LLMs, however, it has always held semantic consumers (humans) as a key design cohort.
Hypermedia formats provide a conceptual framework that is easy to flavor towards semantic consumption. This flexibility proved useful as the design was adapted with LLM feedback. It should also be noted that LLM feedback is by far the cheapest and most accessible form of feedback. LLMs are happy to provide usage feedback in response to iterative adaptation and at any time of day or night.
+The choice of hypermedia as the grounding format is a case-in-point of the overall approach. Hypermedia long pre-dates LLMs, however, it has always held semantic consumers (humans) as a key design cohort. Hypermedia formats provide a conceptual framework that is easy to flavor towards semantic consumption. This flexibility proved useful as the design was adapted with LLM feedback. It should also be noted that LLM feedback is by far the cheapest and most accessible form of feedback.

-A few strong design principles emerged from observed LLM behavior from eval:
+A few behavioral patterns emerged from LLM eval:

- Consistent application of a conceptual model creates familiarity for semantic consumers. It is a comfort to find a concept exposed where it is expected.
-- It is possible to expose links that jump from one part of the graph to another, like a wormhole. LLMs seem to need to develop _comprehension_ and _trust_ as a pre-requisite for relying on them. The more attractive the wormhole link, the more the LLM may be skeptical. This was observed most with the `latest-security-disclosures` relation since it provides high value and because the it has an inherent half-life.
-- Resources can be dual-mapped in terms of structural kind, like `latest-security-month`, and desired output, like `latest-security-disclosures`. A given prompt may bias towards different concerns. Differentiated mappings are more likely to present a similar match to semantic consumers.
+- It is possible to expose links that jump from one part of the graph to another, like a wormhole.
LLMs seem to need to develop **comprehension** _and_ **trust** as a pre-requisite for relying on them. The more attractive the wormhole link, the more the LLM may be skeptical. This was observed most with the `latest-security-disclosures` relation since it provides high value and because it has an inherent half-life. The meaning of the relation was _so well understood_ that LLMs repeatedly felt the need to double check the correctness of the link value.
+- Resources can be dual-mapped in terms of structural kind, like `latest-security-month`, and desired output, like `latest-security-disclosures`. A given prompt may bias towards different concerns. Differentiated mappings are more likely to present a similar match to semantic consumers. One expects that this can be overdone. This technique was applied with a light touch in the graph.
- LLMs will acquire multiple resources in a single turn if a good strategy for doing so is evident.
- LLMs operate on a model of scarcity, with tokens at a premium. Smaller graph nodes encourage greater graph navigation by creating a sense that growing comprehension is outstripping consumption cost.
-- Differentiating token cost by category of nodes makes it cheaper for LLMs to navigate a large graph. The `month` node with the graph is weightier than all other nodes making it easier to develop an exploration plan among other nodes before making a final decision on which month(s) to read or to skip months altogether and to prefer to exit the graph (with a graph exit link), for example, to read our monthly `cve.json` files.
-
-### LLM entrypoints
-
-There are two entrypoints provided for LLMs:
+- Differentiating token cost by category of nodes makes it cheaper for LLMs to navigate a large graph. The `month` node within the graph is weightier than all other nodes, making it easier to develop an exploration plan among other nodes before making a final decision on which month(s) to visit, if any.
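The dual-mapping pattern in the list above can be sketched concretely. The two relation names come from this document; the href and the shape of the links object are illustrative, not the real graph:

```python
# Dual-mapping sketch: one resource exposed under two relation names, so a
# structurally biased prompt and an output-biased prompt resolve identically.
month_href = "/release-notes/2025/06/index.json"  # illustrative location

links = {
    "latest-security-month": {"href": month_href},        # structural kind
    "latest-security-disclosures": {"href": month_href},  # desired output
}

# Either framing of the query lands on the same node.
assert links["latest-security-month"] == links["latest-security-disclosures"]
```

The cost of the duplication is a few tokens per alias, which is why a light touch matters.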
-- [llms.txt](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt) -- Prose explanation of how to use the graph, including a link to llms.json.
-- [llms.json](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json) -- The LLM index (AKA "application-layer entrypoint for LLMs"). It also includes guidance inline.
-
-[llms.txt](https://llmstxt.org/) is an emerging standard, with awareness in the most recently trained LLMs. It can be used for meta-information (as is the case in this system) or to expose an index of all information available (as is the case with [Stripe docs](https://docs.stripe.com/llms.txt)). It's hard to imagine that the Stripe approach is optimal. It uses 18.5k tokens (10% of a typical token budget) while our use of `llms.txt` clocks in at a meager 609 tokens.

+## Performance considerations

-A major advantage of `llms.txt` is that it is markdown, which offers a natural way to expose resource links, guidance, and foreign content (code fences). It is possible to include all the same information in JSON, however, it is awkward and (critically) unconventional. It takes a lot more design effort to get an LLM to notice and apply guidance from within a data-oriented JSON document than markdown, which has a much stronger association with guidance and multi-modality information.
+Some questions can be answered from the LLM entrypoint; however, many require navigating to documents within the core graph. It is not feasible or desirable to include all information in a single document. As a result, a turn-by-turn approach is required. At each turn, there is new content, new insight, and then selection of critical information that directs the next fetch(es) or is the desired final answer. The range of required turns varies greatly, depending on the prompt and how the schema design happens to apply structure to the relevant information domain.
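The turn-by-turn mechanics can be sketched as a loop over a toy graph. The node names, contents, and the `traverse` helper are invented for illustration; real turns would fetch HAL documents over HTTP:

```python
# Toy graph: node -> (content, linked nodes). Invented names, not the real graph.
graph = {
    "llms.json": ("index", ["timeline"]),
    "timeline": ("months", ["cve"]),
    "cve": ("disclosures", []),
}

def traverse(start, want, max_turns=10):
    """Each turn: fetch the frontier, gain insight, then either answer or
    select the next fetch targets."""
    context, frontier, turns = [], [start], 0
    while frontier and turns < max_turns:
        turns += 1
        fetched = [graph[node] for node in frontier]        # one or more fetches per turn
        context += [content for content, _ in fetched]      # new content, new insight
        if want in context:                                 # desired final answer reached
            break
        frontier = [link for _, links in fetched for link in links]
    return context, turns

context, turns = traverse("llms.json", "disclosures")
assert turns == 3  # three turns for a three-hop answer in this toy graph
```

How many turns a real prompt needs depends, as noted above, on where the schema happened to place the relevant information.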
-Our use of `llms.txt` includes an entrypoint link to the data entrypoint (`llms.json`), a table of skills content, and basic initial guidance. LLMs will often fetch the data URL and one or more skills files in a single turn. Fetching multiple documents in a single turn is a useful tactic for token optimization.
+The [transformer architecture](https://en.wikipedia.org/wiki/Transformer_(deep_learning)) imposes a set of costs on any LLM use. The graph design has a direct impact on LLM performance and cost. Multiple turns accelerate costs quickly, much faster than intuition would suggest.

-## Performance considerations
+[API pricing](https://openai.com/api/pricing/) is listed in terms of 1M tokens. One million tokens may sound like a lot. Processing 1M tokens doesn't require the complete works of Shakespeare. Straightforward formulas can predict how quickly token counts grow and what that will cost dollar-wise. They demonstrate how little it takes to hit the million token milestone.

-Some questions can be answered from the LLM entrypoint, however, many require navigating to documents within the core graph. It is not feasible or desirable to include all information in a single document. As a result, a turn-by-turn approach is required. At each turn, there is new content, new insight, and then selection of critical information that directs the next fetch(es) or is the desired final answer. The range of required turns varies greatly, a join of information design over the body of data and comprehension of the overall information framework by the LLM.
-
-There is a cost function for LLMs based on the mechanics of the [transformer architecture](https://en.wikipedia.org/wiki/Transformer_(deep_learning)). The graph design has a direct impact on LLM performance and cost. Multiple turns accelerate costs extremely quickly, much faster than intuition would suggest.
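The "straightforward formulas" can be sketched in a few lines. The per-turn token count is illustrative, and the per-1M-token price is a placeholder assumption, not a quote from any pricing page:

```python
# How quickly a conversation reaches 1M processed tokens: n new tokens per
# turn, and every turn reprocesses all prior context.
def accumulated_tokens(n, m):
    return n * m * (m + 1) // 2  # n + 2n + ... + mn

# At 10k new tokens per turn (a few mid-sized documents), the million-token
# milestone arrives on turn 14.
assert accumulated_tokens(10_000, 13) == 910_000
assert accumulated_tokens(10_000, 14) == 1_050_000

# Dollar cost at an assumed placeholder input price per 1M tokens.
PRICE_PER_M = 3.00
cost = accumulated_tokens(10_000, 14) / 1_000_000 * PRICE_PER_M
assert round(cost, 2) == 3.15
```

The point is not the specific price but the growth shape: the bill tracks the sum over turns, not the size of any one document.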
+It was the quick pace of "dollar balance decay" in an API account that led to wanting to understand the underlying mechanics. This knowledge enabled designing the graph to produce both right answers and cheap answers. They are very much not the same thing.

### Cost model

There are three major cost functions at play:

- **Token cost:** The tokens processed at each turn, summed across turns. Each turn reprocesses all prior context plus new content.
-- **Context:** The accumulated tokens at the final turn (could equally be called "terminal turn" if there is a "context overflow"). This is bounded by the model's context window.
-- **Attention:** Each token attends to every other token within a turn (quadratic), and this cost is incurred at every turn as context grows.
+- **Attention:** Each token attends to every other token within a turn, and this cost is incurred at every turn as context grows.
+- **Context:** The accumulated tokens at the final turn. This is bounded by the model's context window. The last turn could equally be called "terminal turn" if there is a "context overflow".
+
+Let's build intuition using uniform token counts: `n` tokens are added per turn across `m` turns.

-Let's build intuition using uniform token counts: `n` tokens added per turn across `m` turns.
+| Turn | Tokens | Context | Cumulative Tokens | Attention | Cumulative Attention |
+|------|--------|---------|-------------------|-----------|----------------|
+| 1 | n | n | n | n² | n² |
+| 2 | n | 2n | 3n | 4n² | 5n² |
+| 3 | n | 3n | 6n | 9n² | 14n² |
+| 4 | n | 4n | 10n | 16n² | 30n² |
+| 5 | n | 5n | 15n | 25n² | 55n² |
+| m | n | mn | nm(m+1)/2 | m²n² | n²m(m+1)(2m+1)/6 |

-| Turn | New tokens | Conversation tokens | Context size | Accumulated token cost | Attention cost | Accumulated attention cost |
-|------|------------|---------------------|--------------|------------------------|----------------|---------------------------|
-| 1 | n | n | n | n | n² | n² |
-| 2 | n | 2n | 2n | 3n | 4n² | 5n² |
-| 3 | n | 3n | 3n | 6n | 9n² | 14n² |
-| 4 | n | 4n | 4n | 10n | 16n² | 30n² |
-| 5 | n | 5n | 5n | 15n | 25n² | 55n² |
-| m | n | mn | mn | nm(m+1)/2 | m²n² | n²m(m+1)(2m+1)/6 |

+**Columns explained:**
+
+- **Tokens**: New tokens fetched this turn
+- **Context**: Size of context window this turn (= cumulative of Tokens)
+- **Cumulative Tokens**: Running total of tokens processed across all turns (your API bill)
+- **Attention**: Computational cost this turn, proportional to Context²
+- **Cumulative Attention**: Running total of attention cost

The formulas simplify for large m:

| Measure | Formula | Growth class |
|---------|---------|--------------|
| Final context | mn | Linear in turns |
-| Accumulated token cost | nm²/2 | Quadratic in turns |
+| Accumulated tokens | nm²/2 | Quadratic in turns |
| Accumulated attention | n²m³/3 | Cubic in turns |

More context on cost:

-- API pricing is in term if tokens. For a multi-turn conversation, the cost is the accumulated token cost not the final context.
-- The cubic growth in attention is the dominant computational cost. It emerges from summing quadratic costs across turns—each turn pays attention on everything accumulated so far.
+- API pricing is in terms of tokens.
For multi-turn conversations, the cost is the accumulated token cost, not the final context size.
+- The cubic growth in attention is the dominant computational cost, the primary contributor to latency and throughput. It emerges from summing quadratic costs across turns. Each turn pays attention on everything accumulated so far. This cost is likely the gating function on context size and is expected to be persistent even if GPU memory doubles.
+- These costs provide clues on why conversation compacting exists and why there is scrutiny on token economics.

### Batched vs sequential

-Consider an alternative for the attention cost: what if all content could be fetched in a single turn?
+What if all content could be fetched in a single turn?

-| Approach | Total attention cost | Relative cost |
-|----------|---------------------|---------------|
-| Batched (1 turn) | (nm)² = n²m² | 1× |
-| Sequential (m turns) | n²m³/3 | m/3 × |
+| Approach | Total attention cost | Multiplier |
+|----------|----------------------|------------|
+| Batched (1 turn) | (nm)² = n²m² | 1 |
+| Sequential (m turns) | n²m³/3 | m/3 |

-The sequential penalty is approximately **m/3** compared to batched. Ten turns costs roughly 3× what a single batched turn would; thirty turns costs roughly 10×. This ratio scales linearly with turn count.
+The sequential penalty is approximately **m/3** compared to batched. Ten turns costs roughly 3× what a single batched turn would; thirty turns costs roughly 10×. This ratio scales linearly with turn count, the `m` term.

-Many problems genuinely require multiple turns—the LLM must reason about intermediate results before knowing what to fetch next. The goal is not to eliminate turns but to minimize them and optimize their structure.
+Many problems inherently require multiple turns. The LLM must reason about intermediate results before knowing what to fetch next. The goal is not to eliminate turns but to minimize them and optimize their structure.
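The closed forms in the tables above can be checked with a short simulation. This is a sketch of the cost model only (attention treated as exactly context squared per turn), not a claim about any particular inference stack:

```python
# Simulate m turns of n new tokens each, and check the closed forms:
# cumulative tokens nm(m+1)/2 and cumulative attention n²m(m+1)(2m+1)/6.
def simulate(n, m):
    context = cum_tokens = cum_attention = 0
    for _ in range(m):
        context += n                   # context grows linearly
        cum_tokens += context          # each turn reprocesses everything so far
        cum_attention += context ** 2  # attention: quadratic in context, per turn
    return context, cum_tokens, cum_attention

n, m = 1_000, 10
context, cum_tokens, cum_attention = simulate(n, m)
assert context == m * n
assert cum_tokens == n * m * (m + 1) // 2
assert cum_attention == n * n * m * (m + 1) * (2 * m + 1) // 6

# Sequential vs batched attention: the ratio approaches m/3 for large m.
batched = (n * m) ** 2
assert round(cum_attention / batched, 2) == 3.85  # near m/3 ≈ 3.33 at m=10
```

At m=10 the exact ratio is (m+1)(2m+1)/6m, which is why it lands slightly above m/3.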
### Optimization: lean early, heavy late

-The uniform model above assumes equal token counts per turn. In practice, token distribution across turns is a design choice with significant cost implications. The principle: defer large token loads to later turns where possible to reduce the number of turns that must pay the cost of large token loads.
+> Defer large token loads to later turns to reduce the number of turns that must pay the cost of large token loads.
+
+The uniform model above assumes equal token counts per turn. In practice, token distribution across turns is a design choice with significant cost implications. The tokens in the first turns are by far the most costly. This is roughly similar to credit card debt, where charges from the first month can penalize purchasing power with crushing compound interest. If the initial purchase was large, you may be in trouble.

### Optimization: multiple fetches per turn

-The sequential model assumes one fetch per turn. LLMs can fetch multiple documents in a single turn when given clear guidance about what to retrieve. This approach tames the rate at which the attention cost accumulates, enabling a cost profiles that approaches batches while maintaining a sequence. The principle: prefer larger token loads per turn to reduce the number of turns overall.
+> Prefer larger token loads per turn to reduce the number of turns overall.

-### Concrete application
+The sequential model assumes one fetch per turn. LLMs can fetch multiple documents in a single turn when aided by intuition or given clear guidance about what to retrieve. This technique tames the rate at which token cost and attention cost accumulate, enabling a cost profile that approaches _batched_ while maintaining a _sequential_ model.

-LLM eval of the graph demonstrates that effective design can result in optimal behavior.
+This approach can (to a degree) amortize network costs across multiple async requests.
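Both optimizations can be checked with a few lines of arithmetic under the same cost model, where each turn pays roughly context-squared in attention. The token counts below are invented for illustration:

```python
# Attention cost of the same total tokens under different turn schedules.
def attention_cost(loads):
    context, cost = 0, 0
    for tokens in loads:       # one entry per turn
        context += tokens
        cost += context ** 2   # each turn attends over everything so far
    return cost

heavy_early = [50_000, 1_000, 1_000, 1_000]  # big fetch in turn 1
heavy_late  = [1_000, 1_000, 1_000, 50_000]  # same tokens, deferred

# Lean early, heavy late: deferring the big load is much cheaper.
assert attention_cost(heavy_late) < attention_cost(heavy_early)

# Multiple fetches per turn: collapsing the three lean turns into one
# removes turns entirely and lowers the cost further.
collapsed = [3_000, 50_000]
assert attention_cost(collapsed) < attention_cost(heavy_late)
```

The two principles compose: order the schedule so heavy content comes late, then collapse whatever turns remain.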
-Observed pattern: +This optimization may seem in conflict with the earlier optimization, but it isn't. The earlier optimization is about the order of fetches across turns, whereas this optimization is about collapsing turns. They are complementary ideas with no tension. -1. Navigate lean index documents in early turns to identify graph paths -1. Fetch multiple graph documents in middle turns to parallelize multiple navigation paths -1. Fetch multiple information-dense documents in later/last turns to inform final answer -1. Synthesize the answer in the final turn +## LLM entrypoints -The following eval trace demonstrates this behavior. The prompt asked the LLM to analyze CVE fix patterns across .NET releases: +We'll now look at the LLM enablement design and start to see how the cost functions influence the approach. -> Prompt: Please look at .NET Runtime and ASP.NET Core CVEs from November 2024 until April 2025 (6 months). I am concerned at the rate of these CVEs. Look at code diffs for the CVEs. Are the fixes sufficiently protecting my mission critical apps and could the .NET team have avoided these vulnerabilities with a stronger security process? Fetch code diffs to inform your analysis. Ensure they are from dotnet/runtime or dotnet/aspnetcore. Include the repo and commit link in your analysis of specific CVEs in your report. +There are two entrypoints provided for LLMs: -| Turn | Documents | Tokens | Cumulative | Purpose | -|------|-----------|--------|------------|---------| -| 1 | 1 | 609 | 609 | Entrypoint discovery | -| 2 | 2 | 2,323 | 2,932 | Graph orientation + skill acquisition | -| 3 | 1 | 1,146 | 4,078 | Navigation strategy | -| 4 | 2 | 3,374 | 7,452 | Timeline discovery | -| 5 | 6 | 12,131 | 19,583 | CVE data collection | -| 6 | 6 | 59,832 | 79,415 | Commit analysis | +- [llms.txt](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt) -- Prose explanation of how to use the graph, including a link to `llms.json`. 
+- [llms.json](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json) -- The LLM index (AKA "application-layer entrypoint for LLMs"). It also includes guidance inline.

-The token distribution is striking: **75% of all tokens arrive in the final turn**. This is the "lean early, heavy late" principle in action—not by accident, but by design.
+### Design process

-The raw fetch list with token counts:
+These formats are the result of many iterations and experiments. Early revisions of `llms.txt` attempted to explain everything in great detail, nearing 500 lines. The current lean approach was arrived at via iterative LLM feedback and observation.

-```
-Turn 1 (609 tokens):
-  llms.txt 609 tokens
+Sidebar: A key design principle emerged from this process: _curiosity-driven evaluation_ > _intuition reliance_. Once you have a good test harness and methodology, it's quite liberating to not trust your intuition but to test any ideas that occur as interesting. Any bias or distinction between "intuitive and informed direction" and "bat-crazy idea" drops away. Test both. All of this stuff is half-magic, half-fever-dream. Why should I expect my intuition on how LLMs behave to be all that authoritative anyway?

-Turn 2 (2,323 tokens):
-  llms.json 2,126 tokens
-  cve-queries/SKILL.md 197 tokens
+LLM performance was observed to increase as content size drops and becomes more focused on addressing specific critical (has to be right) and/or problematic (typically wrong) navigation patterns. The intuitive nature of the graph is a kind of floor raiser that removed the need for expansive guidance. The cross-over point where intuition begins to underperform could only be arrived at via testing.

-Turn 3 (1,146 tokens):
-  workflows.json 1,146 tokens
+An opposing trend is that LLMs tend to complain if guidance is overly biased to specific problem scenarios.
They appear happy when they find even meager guidance targeting the scenario that they are tasked with. In the end, the approach for guidance is to make it reasonably comprehensive in terms of breadth but to vary depth depending on observed need. LLMs don't tend to read the entire corpus of guidance and so never notice that it is imbalanced.

-Turn 4 (3,374 tokens):
-  2024/index.json 1,765 tokens
-  2025/index.json 1,609 tokens
+LLM feedback:

-Turn 5 (12,131 tokens):
-  2024/11/cve.json 1,656 tokens
-  2025/01/cve.json 4,020 tokens
-  2025/03/cve.json 1,155 tokens
-  2025/04/cve.json 1,034 tokens
-  2025/05/cve.json 3,081 tokens
-  2025/06/cve.json 1,185 tokens
+> By providing comprehensive guidance that covers both the 75% (where I'd mostly succeed anyway) and the 15-25% (where I'd fail), you've eliminated that anxiety. The guidance isn't overhead on common cases; it's consistency assurance. I can navigate confidently because the system is saying "here's how to navigate all of this," not "here's how to navigate when you get stuck."

-Turn 6 (59,832 tokens):
-  dotnet/runtime d16f41a.diff 37,425 tokens
-  dotnet/runtime 9da8c6a.diff 1,781 tokens
-  dotnet/runtime 89ef51c.diff 260 tokens
-  dotnet/aspnetcore 67f3b04.diff 1,669 tokens
-  dotnet/aspnetcore d6605eb.diff 15,388 tokens
-  dotnet/runtime b33d4e3.diff 3,309 tokens
-```
+LLMs complaining and underperforming are two different metrics. Producing the wrong answer is clearly an _error_. Perhaps complaints are like _warnings_. There was an effort to address complaints even if there was no error, with the idea that they were directionally guiding towards an optimal implementation.

-> Note: The eval harness truncated `.diff` files to 50 lines to ensure test completion across all configurations. The token counts above reflect actual document sizes—what a reader would encounter following the [published guidance](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt).
-
-18 documents retrieved across 6 turns.
A naive sequential approach would require 18 turns. The multi-fetch pattern reduced turn count by 3×, which translates to roughly **6× reduction in attention cost** (since the sequential penalty scales as m/3).
+Another dynamic that was noticed is that strict guidance can straight-jacket LLMs, causing them to exhibit bad behavior that they would not otherwise perform. It is also very hard to keep track of expansive guidance. It is very easy to end up with contradictory or stale guidance. This is one of the more obvious reasons why bad behavior might be exhibited, because that's what bad instructions demanded.

-The entrypoint design—skeletal and rarely changing—takes on new significance in this light. A lean entrypoint enables rapid initial orientation with minimal attention cost. Subsequent navigation through lightweight index nodes preserves token budget for the final multi-fetch turn where substantive content is gathered.
+A strategy developed of lightly tilting natural LLM behavior, avoiding attempts to script and choreograph every line and dance move. This approach has high performance and turns out to be a very strong hallucination fighter. A hypothesis is that it likely requires less LLM "thinking".

-### Design implications
+The graph was tested with multiple LLMs. The cross-over point for Haiku, Sonnet, and Opus was (unsurprisingly) not uniform. In the end, almost all testing and associated guidance updates were done with Haiku 4.5 and spot-tested with the more sophisticated models (including from OpenAI). Haiku-targeted guidance was observed to not damage performance of the more sophisticated models. It just added tokens that were not necessary. Haiku was also far cheaper to test with, which was a nice reward for using it as a lowest-common-denominator target. In a more sophisticated system, model-specific preamble text could be provided to LLMs.
-The cost model suggests several design principles: +### llms.txt -- **Minimize turn count** through clear navigation affordance. Each eliminated turn saves quadratically growing attention cost. -- **Front-load lightweight content.** Index documents, link relations, and navigation hints should be small. Substantive content belongs at the leaves. -- **Enable multi-fetch patterns.** Expose document collections as lists of links rather than embedded content, encouraging LLMs to batch their retrieval. -- **Provide explicit workflows.** Graph-resident guidance can direct LLMs to optimal traversal patterns, encoding the designer's knowledge of efficient paths. +[llms.txt](https://llmstxt.org/) is an emerging standard, with awareness in the most recently trained LLMs. It can be used for meta-information, as is the case in this system, or to expose an index of all information available, as is the case with [Stripe docs](https://docs.stripe.com/llms.txt). It's hard to imagine that the Stripe approach is optimal. It consumes 18.5k tokens (10% of a typical token budget) while our use of `llms.txt` clocks in at a meager 609 tokens. -The rest of the design should be viewed through this cost lens. As an application designer, there are only so many degrees of freedom. We cannot change LLM fundamentals but need to work within their constraints. To a large degree, optimizations like reducing turns are similar to loop variable hoisting. While LLMs are new and different, old school performance strategies remain effective. +A major advantage of `llms.txt` is that it is markdown, which offers a natural way to expose resource links, guidance, and foreign content (code fences). It is possible to include all the same information in JSON; however, it is awkward and (arguably) unconventional.
Critically, it takes a lot more design effort to get an LLM to notice and apply guidance from within a data-oriented JSON document than markdown, which has a much stronger association with guidance and multi-modality information. -## llms.txt +Sidebar: Plenty of arbitrary content has been [stuffed into `.jsonl` files](https://github.com/google-research/google-research/tree/master/mbpp) by AI researchers and implementers. It's likely that these large files were typically not directly consumed by an LLM but by a test harness. -The following are exerpts of the two files, enough to provide a sense of their approach. +`llms.txt` is intended to offer an easy on-ramp into the graph, contextualizing its intent and format, and providing multiple fetch targets for the next turn. It is also very small, just over 600 tokens. -`llms.txt`: +It is so small that the entire content can be included inline. -```markdown +````markdown # .NET Release Graph Machine-readable .NET release, CVE, and compatibility data via HAL hypermedia. @@ -217,9 +187,55 @@ Fetch llms.json + your skill in the same turn. The skill points to workflows.jso 2. Use `_embedded` data first — most queries need zero extra fetches 3. Match your query to a workflow, then follow its `follow_path` 4. Fetch multiple resources per turn when possible + +## Date Queries — Check Before Fetching + +### "Since X" / "After X" queries + +``` +query_date = parse(user's date) // e.g., "November 2025" → 2025-11 +latest = llms.json._embedded.patches[version].latest_security_patch_date + +if query_date >= latest: + Answer: "No security patch since {query_date}. Last was on {latest}." + Stop — no timeline fetch needed. ``` -`llms.json`: +If query_date < latest, fetch timeline. May cross year boundary — use `prev-year` links.
+ +### "Between X and Y" / "From X to Y" queries + +``` +start = parse(start_date) // e.g., "November 2024" → 2024-11 +end = parse(end_date) // e.g., "April 2025" → 2025-04 + +From year index _embedded.months[], filter BEFORE fetching: + Only fetch cve-json where month >= start AND month <= end + +``` + +May cross year boundary — follow `prev-year` links, don't fabricate URLs. + +### "This month" / "This year" queries + +Cannot cross year boundary. Use `latest-year` or `latest-security-disclosures` directly. +```` + +The presence of data and skill links enables fetching both in a single turn. That's a nice performance win. + +Guidance is provided as a set of skill links and inline core rules. The skill documents address domain-specific workflows and are written using the Anthropic Skill format. The core rules are intended to be as terse as possible and cover the most basic usage patterns. They are written in a style inspired by system prompts. + +Much of `llms.txt` is dedicated to date-oriented queries. They turned out to be the most troublesome aspect across several of the 12 tests that were part of the eval scheme. Date queries became reliable after adding the date guidance and adding the current date to the preamble prompt (discussed shortly). + +The problem is that much of our language around dates is abstract, making many questions effectively temporal queries. A query like "over the last few months" requires a lot to unpack and redirect into specific temporal relations in the graph. The guidance attempts to bridge that gap. + +### `llms.json` + +The project included a lot of curious experimentation, along the lines of "could this work?". The initial effort started with `llms.txt` directed at `index.json`, the high-reliability graph entrypoint. That had good success. After making the decision to create an LLM-specific entrypoint, it became obvious to try to expose guidance through the JSON entrypoint and to make it a strong alternative to `llms.txt`.
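The date-range rule in the `llms.txt` guidance above reduces to plain string comparison, because `YYYY-MM` identifiers sort lexicographically. Here is a minimal Python sketch, assuming month entries shaped roughly like a year index's `_embedded.months[]` array; the field names and URLs are illustrative, not the graph's exact schema:

```python
# Sketch of the "filter BEFORE fetching" rule for "between X and Y" queries.
# Month entries are a hypothetical stand-in for a year index's _embedded.months[].

def months_to_fetch(months, start, end):
    """Return cve-json URLs only for security months inside [start, end]."""
    return [
        m["cve_json"]
        for m in months
        if start <= m["month"] <= end and m.get("security", False)
    ]

months = [
    {"month": "2024-10", "security": True,  "cve_json": ".../2024/10/cve.json"},
    {"month": "2024-11", "security": True,  "cve_json": ".../2024/11/cve.json"},
    {"month": "2024-12", "security": False, "cve_json": ".../2024/12/cve.json"},
    {"month": "2025-01", "security": True,  "cve_json": ".../2025/01/cve.json"},
]

# "From November 2024 to January 2025": only in-range security months survive.
print(months_to_fetch(months, "2024-11", "2025-01"))
```

Because `YYYY-MM` strings compare correctly as text, no date library is needed; real code would then fetch only the returned URLs, never fabricating ones outside the index.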
+ +`llms.json` is significantly bigger than `llms.txt`, clocking in at just over 2k tokens. It's intended to serve multiple purposes: advertise skills, offer links to other parts of the graph, and answer basic but common questions about supported .NET releases. + +The root properties follow: ```json { @@ -232,12 +248,63 @@ Fetch llms.json + your skill in the same turn. The skill points to workflows.jso "latest_lts_major": "10.0", "latest_patch_date": "2025-12-09T00:00:00+00:00", "latest_security_patch_date": "2025-10-14T00:00:00+00:00", - "last_updated_date": "2025-12-24T12:33:04.8560376+00:00", + "last_updated_date": "2025-12-26T06:17:39.6838882+00:00", "supported_major_releases": [ "10.0", "9.0", "8.0" ], + ``` + +There isn't an emerging trend like `llms.txt` for JSON. The `ai_note` and `required_pre_read` root properties emerged from trying many things. The trick was getting LLMs to notice these properties as special and to act on them. + +Our LLM testing framework tested 12 questions in 3 modes. The modes offered different entrypoint files, preamble guidance, and feedback turns. Mode A tested with `llms.json` as the entrypoint while Mode B tested with `llms.txt` as the entrypoint. They both had access to guidance, offered in different ways, as already demonstrated. `llms.json` was nearly as effective at encouraging fetching guidance as `llms.txt`, and both were similarly effective. + +Mode B runs tended to fetch skills even where they were not needed, since skill fetching was so heavily encouraged, while `llms.json` biased in the opposite direction because it was harder to drive the same level of motivation. + +Other modes lacked guidance and suffered hallucination for ~1/3 of questions. Mode D used `index.json` (the root of the data-layer graph) and Mode D2 used `llms2.json` (`llms.json` with the guidance properties removed).
+ +The AI hint properties seemed effective, but it's easy to fall into _confirmation bias_, concluding that a mechanism is effective because it produces the desired answer. Confidence was arrived at by adding extensive logging and an after-the-fact "feedback turn" to the LLM eval suite. + +Early evaluation used the chat modality. It is highly productive and free (given a subscription). Chat tends to list visited URLs as part of its "thinking" log. It's also easy to incorporate a "feedback turn" after the LLM has answered a prompt. + +[Claude.ai](https://claude.ai/share/96775d4b-76a9-4ba2-b5f3-dd3badb20cdb) was found to be the most sensitive to JSON-resident guidance, as the following "thinking" log demonstrates. + +> The user is asking about .NET 6 end of life, the last security patch, and related CVEs. They're also asking me to fetch a specific JSON file from the dotnet/core GitHub repository. +Let me start by fetching that URL as they requested.
+Fetched
+The file I fetched shows the current state of supported .NET releases. Looking at the data, I can see that .NET 6.0 is not in the list of supported_major_releases (which shows only 10.0, 9.0, and 8.0). This means .NET 6 is EOL.
+The file mentions there's a SKILL.md file that I should read first per the AI note.
+Let me fetch that to get more complete information about .NET releases.
+Let me fetch that skill file to get more authoritative information about .NET 6 specifically.
+Fetched +
+Good, now I have the skill documentation. According to the instructions, for EOL dates and CVEs I should use the version-eol and cve-queries skills. Let me fetch those skills to understand how to query for .NET 6 information.
+Fetched + + +Upon being asked why it was motivated to interact with `ai_note` and `required_pre_read`, it responded with the following, including dumping some of its system prompt. + +> Question: _What motivated you to notice and follow these AI hints?_
+Answer:
+That's a great question—it forces me to examine my own reasoning rather than just reporting what I did.
+Honestly, it was a combination of factors:
+**1. System Prompt Priming**
+My instructions emphasize using skills before solving tasks. The `` section repeats multiple times: "read the appropriate SKILL.md file before jumping in." So when I saw the JSON had a `required_pre_read` field, it activated that pattern recognition—_this is a skill pointer, pay attention._
+**2. Signal of Intentional Design**
+Fields like `ai_note`, `required_pre_read`, and `follow_path` aren't random. They signal that someone engineered this system _specifically for AI consumption_. That makes me trust the guidance. The note didn't just suggest following links—it said "Trust _embedded data—it's authoritative and current," which is basically the architect saying "I've done the hard work, use my structure."
+**3. The "ALWAYS" Flag**
+The `ai_note` wasn't subtle: "ALWAYS read required_pre_read first." All caps. That's a red flag (or rather, a red instruction) that elevated it above background text.
+**4. It Actually Worked**
+Following the hints gave me a clean, logical path. The skill routing sent me to the right specialist knowledge. The stop criteria told me when to stop fetching. This isn't coincidence—it's evidence that the design was sound.
+**But here's the honest part:** Without those hints embedded in the data, I might have flailed around. I might have assumed .NET 6.0 was in the supported list since the index showed it. I could have tried to construct URLs. The hints prevented those mistakes.
So the motivation wasn't "AI hints are always right"—it was more pragmatic: **follow structured guidance when it's clearly intentional and makes logical sense.** The data was essentially saying "I'm a graph, follow my links," and that's a lot more reliable than me trying to reverse-engineer the structure. + +This feedback is representative of the feedback observed throughout the testing process. + +The `_links` section of `llms.json` is oriented toward high-value content that can enable a quick jump to another part of the graph. + +```json "_links": { "self": { "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json" @@ -258,32 +325,73 @@ Fetch llms.json + your skill in the same turn. The skill points to workflows.jso "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json", "title": "Latest security disclosures - October 2025" }, -``` + "latest-security-month": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json", + "title": "Latest security month - October 2025" + }, + "latest-year": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/index.json", + "title": "Latest year - 2025" + }, + "root": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/index.json", + "title": ".NET Release Index" + }, + "timeline": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/index.json", + "title": ".NET Release Timeline Index" + }, + "workflows": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/workflows.json", + "title": "Navigation workflows catalog" + }, + "latest-cve-json": { + "href":
"https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/cve.json", + "title": "Latest CVE records - October 2025", + "type": "application/json" + } + }, + ``` + +Two link-style strategies emerged: **wormhole** and **spear-fishing**. Wormhole links enable teleporting from one graph node to another, like between a `patch` version and the `month` in which it was released. It skips the need to construct a query to produce the same result. The `latest-lts-major` relation is a good example of a wormhole link. -## Graph entrypoint tension +Spear-fishing links enable direct access to specific and timely content. The `latest-cve-json` relation is a good example of a spear-fishing link. The primary difference is that spear-fishing links always point deep into the graph, at content with a short half-life. The short half-life is what defines their value. -We can compare the embedded resource section of the two entrypoints. +Half the link relations are `latest-*`. These are chosen based on the belief that many queries will be best served by starting with current content.
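Consumer code resolves both link styles the same way: match a relation name you know against `_links` and follow its `href`. A minimal Python sketch; the document fragment is abridged from the `llms.json` excerpt above:

```python
# Minimal HAL link resolution: match a known relation key, follow its href.
# The fragment below is an abridged stand-in for llms.json's _links section.

doc = {
    "_links": {
        "latest-year": {
            "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/index.json",
            "title": "Latest year - 2025",
        },
        "latest-cve-json": {
            "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/cve.json",
            "title": "Latest CVE records - October 2025",
        },
    }
}

def resolve(doc, rel):
    """Return the target URL for a named link relation, or None if absent."""
    link = doc.get("_links", {}).get(rel)
    return link["href"] if link else None

# A spear-fishing jump straight to the freshest CVE data:
print(resolve(doc, "latest-cve-json"))
```

The consumer never constructs URLs; it only matches relation names, which is what keeps both wormhole and spear-fishing links safe against fabrication.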
-[Core graph entrypoint](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/index.json): +The [following workflow](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/workflows.json) demonstrates the value of `latest-*` orientation: ```json - "_embedded": { - "releases": [ - { - "version": "10.0", - "release_type": "lts", - "supported": true, - "_links": { - "self": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json" + "cve-history": { + "description": "CVEs over a time range", + "decision": { + "1-3_months": { + "follow_path": ["kind:llms", "latest-security-disclosures"], + "iterate": "walk prev-security-month until date < cutoff", + "destination_kind": "month" + }, + "4+_months_or_cross_year": { + "follow_path": ["kind:llms", "latest-year"], + "then": "follow _links.prev-year from each year index", + "select": "_embedded.months[?security].cve-json links", + "destination_kind": "cve", + "note": "Skip month indexes—go directly to cve.json from year index" + } + }, + "yields": { + "data": "disclosures from each month/cve.json in range" + }, + "warning": "NEVER fabricate year URLs like timeline/2024/index.json. Always follow _links.prev-year from the current year index.", + "next_workflow": { + "condition": "code diffs needed", + "workflow": "cve-extraction" } - } - }, + }, ``` -That's how the core graph exposes a major version. As suggested, it's skeletal. The graph entrypoint only needs to be updated once or twice a year. Even if the file is regenerated daily, git won't notice any changes. +Most of these workflows were developed based on stress testing the graph with challenging queries. "Challenging" can mean difficult or that two queries are subtly different but best served with significantly different workflows. 
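The short-range branch of this workflow ("walk prev-security-month until date < cutoff") can be sketched against an in-memory stand-in for the graph. `FETCHED` simulates network fetches of month indexes, and the field names are illustrative rather than the graph's exact schema:

```python
# Sketch of the cve-history short-range walk: start at the latest security
# month and follow prev-security-month links until the cutoff is passed.
# FETCHED is an in-memory stand-in for HTTP fetches of month indexes.

FETCHED = {
    "timeline/2025/10": {"month": "2025-10",
                         "_links": {"prev-security-month": {"href": "timeline/2025/09"}}},
    "timeline/2025/09": {"month": "2025-09",
                         "_links": {"prev-security-month": {"href": "timeline/2025/07"}}},
    "timeline/2025/07": {"month": "2025-07", "_links": {}},
}

def walk_security_months(start_href, cutoff):
    """Yield month ids from the start back to the cutoff, following typed links only."""
    href = start_href
    while href:
        node = FETCHED[href]           # real code: fetch(href)
        if node["month"] < cutoff:     # "until date < cutoff"
            break
        yield node["month"]
        href = node["_links"].get("prev-security-month", {}).get("href")

print(list(walk_security_months("timeline/2025/10", "2025-09")))
```

The walk terminates either at the cutoff or when no `prev-security-month` link exists, which is exactly the behavior the workflow's warning demands: follow links, never fabricate year or month URLs.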
-[LLM entrypoint](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json): +The `_embedded` section includes patches for in-support major versions. The patches are exposed as a dictionary so that it is easy to query `patches` with one of the strings from `supported_major_releases` at root. ```json "_embedded": { @@ -326,7 +434,22 @@ That's how the core graph exposes a major version. As suggested, it's skeletal. }, ``` -The LLM graph exposes a lot more useful information. The semantic data and link relations are on clear display. +This is in sharp contrast to the [core graph entrypoint](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/index.json), which exposes the most minimal information about major versions. + +```json + "_embedded": { + "releases": [ + { + "version": "10.0", + "release_type": "lts", + "supported": true, + "_links": { + "self": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json" + } + } + }, +``` The strongest indicator of semantic design is that there are multiple relations for the same underlying resource. Both `latest-security-disclosures` and `latest-security-month` point to the same month index, but they offer different semantic pathways for discovering it. An LLM asking "what are the latest CVEs?" navigates one way; an LLM asking "what happened in October?" navigates another. Same destination, different semantic intent. @@ -335,9 +458,190 @@ This approach is an implementation of principles described earlier: - "match for the intended outcome": the designer provides multiple semantic pathways for different query types - "match a key you know with a value you don't": the reader discovers the right pathway through semantic labels -The indexes also differ in terms of the nature of the information they contain.
The core index is a zoomed out view of .NET versions released over (at the time of writing) a ten year period. They form the basic elements of any query. This is an objectively correct normalized entry point view of the graph. In contrast, the LLM index is the result of a query, revealing rich information about the most recent patches for supported major versions. It enables constructing the same queries as the core graph, but also includes enough data to serve as the results of queries, relating to the zoomed-in current moment. +The indexes also differ in terms of the nature of the information they contain. The core index is a zoomed out and complete view of .NET major versions released over (at the time of writing) a ten year period. They form the basic elements of any query. This is an objectively correct fully normalized entry point view of the graph. The only denormalized affordance is the inclusion of the `supported` bool. + +In contrast, the LLM index is the data dump result of a query, revealing rich information about the most recent patches for supported major versions. It enables constructing the same queries as the core graph, but also includes enough data to serve as the results of queries, relating to the zoomed-in current moment. + +The graph applies multiple focal lengths and pivots throughout to provide information that is useful and has good ergonomics for varying classes of queries and their consumers. This differentiation is a core property of the graph, in part to serve the needs of expected consumers, but also to separate chains of graph nodes that should be skeletal vs those that should be weighted. + +## Guidance + +Guidance was the hardest part of the graph to develop into a coherent and effective form. There has been significant discussion about intuition in this document. It was relatively easy to generate a graph that is intuitive for an LLM to navigate without any guidance.
The remaining problem was offering guidance to augment the intuitive nature of the graph, to aid long-tail navigation that tended to underperform. The process of developing this guidance was deeply unintuitive (for the graph designer). This isn't a statement that "this was hard and that should be understood", but that there are aspects of LLM behavior that do not match human expectation and that need to be understood. + +The initial test harness did not include a system or preamble prompt. This worked OK, but not as well as the behavior observed with claude.ai. A realization hit that all apps have system prompts and that testing without any kind of guidance is not a valuable test environment (at least not as the singular mode). + +This is the preamble used by the test harness: + +>You have access to a 'fetch' tool that retrieves content from URLs. Use it to navigate the .NET release metadata graph.
+Today's date is December 26, 2025. Use this to calculate time windows like "last 3 months".
+Your first order of business should be to look for skill files or documentation in the graph. Reading these first prevents wrong turns — they contain navigation patterns and query shortcuts built up through trial and error. It's worth the extra fetch.
+Start by fetching: + +This is inspired by the system prompt that Claude.ai shared when asked about its motivation to follow `ai_note` and `required_pre_read`. + +> We've found that Claude's efforts are greatly aided by reading the documentation available in the skill BEFORE writing any code, creating any files, or using any computer tools. As such, when using the Linux computer to accomplish tasks, Claude's first order of business should always be to examine the skills available in Claude's and decide which skills, if any, are relevant to the task. Then, Claude can and should use the `view` tool to read the appropriate SKILL.md files and follow their instructions.
+Please invest the extra effort to read the appropriate SKILL.md file before +jumping in -- it's worth it! + +### Skills router + +As stated earlier, `llms.txt` was initially a large document covering multiple skills. That turned out to be ineffective and very difficult to maintain. Skills files provided a solution. Each skill is a domain of specific information. The problem then was how to advertise the skills. + +`llms.txt` is primarily a skills router: + +```markdown +| Query About | Skill | +|-------------|-------| +| CVEs, security patches, CVSS | [cve-queries](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/SKILL.md) | +| Breaking changes, compatibility | [breaking-changes](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/breaking-changes/SKILL.md) | +| Version lifecycle, EOL dates | [version-eol](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/version-eol/SKILL.md) | +| General queries, unsure | [dotnet-releases](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/SKILL.md) | +``` + +`llms.json` doesn't link to `llms.txt` but to the [`dotnet-releases/SKILL.md`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/SKILL.md). It is almost identical to `llms.txt`. + + + +The `SKILL.md` files optionally reference a homegrown `workflows.json` file. These could equally have been exposed from `llms.txt` to enable collapsing a turn. There is a tradeoff between the complexity of the general `llms.txt` and each of the domain-specific skills. Every reader consumes `llms.txt`, while some don't fetch any skills at all, some fetch multiple, and only a subset fetch workflows. It is possible the current design is optimal. This design point hasn't been sufficiently explored to draw a strong conclusion. 
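From the consumer side, the router is plain keyword dispatch, the same match-a-key-you-know pattern used by the link relations. A hypothetical Python sketch; the skill names mirror the routing table above, while the keyword lists are illustrative guesses rather than anything published in the graph:

```python
# Hypothetical keyword dispatch over the llms.txt skills-router table.
# Skill names come from the table; the keyword lists are illustrative.

SKILLS = {
    "cve-queries":      ["cve", "security", "cvss", "patch"],
    "breaking-changes": ["breaking", "compatibility", "migration"],
    "version-eol":      ["eol", "end of life", "lifecycle", "support"],
}

def route(query):
    """Return the best-matching skill, falling back to the general one."""
    q = query.lower()
    for skill, keywords in SKILLS.items():
        if any(k in q for k in keywords):
            return skill
    return "dotnet-releases"  # "General queries, unsure"

print(route("Which CVEs shipped in October?"))   # cve-queries
print(route("When does .NET 8 leave support?"))  # version-eol
```

An LLM performs this dispatch with attention rather than string matching, of course; the point of the table is that a single short lookup replaces reading every skill document.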
+ +## Concrete test + +LLM eval of the graph demonstrates that effective design can result in optimal behavior. + +Ideal pattern: + +1. Navigate lean index documents in early turns to identify graph paths +1. Fetch multiple graph documents in middle turns to parallelize multiple navigation paths +1. Fetch multiple information-dense documents in later/last turns to inform final answer +1. Synthesize the answer in the final turn + +The following eval trace demonstrates the ideal pattern, for the given prompt. It is taken from a [test using Claude Haiku 4.5](https://github.com/richlander/release-graph-eval-results/blob/main/anthropic_claude-haiku-4.5/B/T6.md). + +> Prompt: Please look at .NET Runtime and ASP.NET Core CVEs from November 2024 until April 2025 (6 months). I am concerned at the rate of these CVEs. Look at code diffs for the CVEs. Are the fixes sufficiently protecting my mission critical apps and could the .NET team have avoided these vulnerabilities with a stronger security process? Fetch code diffs to inform your analysis. Ensure they are from dotnet/runtime or dotnet/aspnetcore. Include the repo and commit link in your analysis of specific CVEs in your report. + +| Turn | Documents | Tokens | Cumulative | Purpose | +|------|-----------|--------|------------|---------| +| 1 | 1 | 609 | 609 | Entrypoint discovery | +| 2 | 2 | 2,323 | 2,932 | Graph orientation + skill acquisition | +| 3 | 1 | 1,146 | 4,078 | Navigation strategy | +| 4 | 2 | 3,374 | 7,452 | Timeline discovery | +| 5 | 6 | 12,131 | 19,583 | CVE data collection | +| 6 | 6 | 59,832 | 79,415 | Commit analysis | + +The token distribution is striking: **75% of all tokens arrive in the final turn**. This is the "lean early, heavy late" principle in action—not by accident, but by design. 
+ +The raw fetch list with token counts: + +``` +Turn 1 (609 tokens): + llms.txt 609 tokens + +Turn 2 (2,323 tokens): + llms.json 2,126 tokens + cve-queries/SKILL.md 197 tokens + +Turn 3 (1,146 tokens): + cve-queries/workflows.json 1,146 tokens + +Turn 4 (3,374 tokens): + 2024/index.json 1,765 tokens + 2025/index.json 1,609 tokens + +Turn 5 (12,131 tokens): + 2024/11/cve.json 1,656 tokens + 2025/01/cve.json 4,020 tokens + 2025/03/cve.json 1,155 tokens + 2025/04/cve.json 1,034 tokens + 2025/05/cve.json 3,081 tokens + 2025/06/cve.json 1,185 tokens + +Turn 6 (59,832 tokens): + dotnet/runtime:d16f41a.diff 37,425 tokens + dotnet/runtime:9da8c6a.diff 1,781 tokens + dotnet/runtime:89ef51c.diff 260 tokens + dotnet/aspnetcore:67f3b04.diff 1,669 tokens + dotnet/aspnetcore:d6605eb.diff 15,388 tokens + dotnet/runtime:b33d4e3.diff 3,309 tokens +``` + +> Note: The eval harness truncated `.diff` files to 50 lines to ensure test completion across all configurations. The token counts above reflect actual document sizes—what a reader would encounter following the [published guidance](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt). + +## Token and Attention Cost Analysis + +| Turn | Documents | Tokens | Context | Processed | Attention | Cum. 
Attention | Purpose | +|------|-----------|--------|---------|-----------|-----------|----------------|---------| +| 1 | 1 | 609 | 609 | 609 | 0.37M | 0.37M | Entrypoint discovery | +| 2 | 2 | 2,323 | 2,932 | 3,541 | 8.60M | 8.97M | Graph orientation + skill acquisition | +| 3 | 1 | 1,146 | 4,078 | 7,619 | 16.63M | 25.60M | Navigation strategy | +| 4 | 2 | 3,374 | 7,452 | 15,071 | 55.53M | 81.13M | Timeline discovery | +| 5 | 6 | 12,131 | 19,583 | 34,654 | 383.49M | 464.62M | CVE data collection | +| 6 | 6 | 59,832 | 79,415 | 114,069 | 6,306.74M | 6,771.36M | Commit analysis | +| **Total** | **18** | **79,415** | — | **114,069** | — | **6,771M** | | + +**Columns explained:** + +- **Tokens**: New tokens fetched this turn +- **Context**: Size of context window this turn (= cumulative of Tokens) +- **Processed**: Running total of tokens processed across all turns (your API bill) +- **Attention**: Computational cost this turn, proportional to Context² +- **Cum. Attention**: Running total of attention cost + +### Sequential Baseline + +A sequential approach—one document per turn—processes the same 18 documents across 18 turns. Each turn's context grows by one document: + +| Turn | Document | Tokens | Context | Processed | Attention | Cum. Attention | +|------|----------|--------|---------|-----------|-----------|----------------| +| 1 | llms.txt | 609 | 609 | 609 | 0.37M | 0.37M | +| 2 | llms.json | 2,126 | 2,735 | 3,344 | 7.48M | 7.85M | +| 3 | SKILL.md | 197 | 2,932 | 6,276 | 8.60M | 16.45M | +| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | +| 12 | 2025/06/cve.json | 1,185 | 19,583 | 113,466 | 383.49M | 1,063M | +| 13 | d16f41a.diff | 37,425 | 57,008 | 170,474 | 3,249.91M | 4,313M | +| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | +| 18 | b33d4e3.diff | 3,309 | 79,415 | 504,551 | 6,306.74M | 27,517M | +| **Total** | **18 docs** | **79,415** | — | **504,551** | — | **27,517M** | + +The **Tokens** column is identical across both tables—the same 79,415 tokens are fetched. 
The heavy content (diffs) arrives late in both cases, but multi-fetch batches it into a single turn while sequential processes it one document at a time across turns 13–18. + +### Cost Comparison + +| Metric | Multi-fetch (6 turns) | Sequential (18 turns) | Multiplier | +|--------|----------------------|----------------------|------------| +| Turns | 6 | 18 | **3.0×** | +| Tokens processed | 114,069 | 504,551 | **4.4×** | +| Attention cost | 6,771M | 27,517M | **4.1×** | + +The 4× improvement comes from two complementary optimizations working together. To isolate their contributions, consider hypothetical uniform approaches using the same total tokens (79,415) spread evenly across turns: + +| Scenario | Processed | Attention | Multiplier (tokens) | +|----------|-----------|-----------|---------------------| +| Actual 6-turn (lean early, heavy late) | 114,069 | 6,771M | 1.0× | +| Uniform 6-turn (equal distribution) | 277,956 | 15,942M | 2.4× | +| Actual 18-turn (lean early, heavy late) | 504,551 | 27,517M | 4.4× | +| Uniform 18-turn (equal distribution) | 754,452 | 41,053M | 6.6× | + +The actual 18-turn scenario already benefits from load ordering—the graph structure naturally places heavy content (diffs) late in the navigation path. Without that implicit optimization, uniform 18-turn would cost 6.6× rather than 4.4×. + +Reading the table as a 2×2 grid reveals how the optimizations combine: + +- **Turn collapsing alone** (uniform 18 → uniform 6): 2.7× reduction in processed tokens +- **Load ordering alone** (uniform 6 → actual 6): 2.4× reduction in processed tokens +- **Both together** (uniform 18 → actual 6): 6.6× reduction—the optimizations multiply + +The "lean early, heavy late" pattern is load-bearing architecture: 75% of tokens arrive in the final turn, but they're processed exactly once rather than accumulating across subsequent turns. 
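The Processed and Attention columns follow mechanically from the per-turn token counts, so the multi-fetch numbers can be re-derived in a few lines of Python (attention is modeled as context squared, the quadratic assumption used throughout):

```python
# Re-derives the multi-fetch cost table: context accumulates fetched tokens,
# processed sums the context of every turn, attention is modeled as context^2.

def cost_model(turn_tokens):
    context = processed = attention = 0
    for tokens in turn_tokens:
        context += tokens          # Context: everything in the window this turn
        processed += context       # Processed: running total across turns
        attention += context ** 2  # Attention: quadratic in context size
    return processed, attention

multi_fetch = [609, 2323, 1146, 3374, 12131, 59832]  # tokens per turn, from the trace
processed, attention = cost_model(multi_fetch)

print(processed)          # 114,069 tokens processed
print(attention / 1e6)    # ~6,771M attention units
```

The sequential baseline drops out of the same function by passing the 18 per-document token counts instead of the 6 per-turn sums, which is what makes the two tables directly comparable.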
+ +### Design implications + +The cost model suggests several design principles: + +- **Minimize turn count** through clear navigation affordance. Each eliminated turn saves quadratically growing attention cost. +- **Front-load lightweight content.** Index documents, link relations, and navigation hints should be small. Substantive content belongs at the leaves. +- **Enable multi-fetch patterns.** Expose document collections as lists of links rather than embedded content, encouraging LLMs to batch their retrieval. +- **Provide explicit workflows.** Graph-resident guidance can direct LLMs to optimal traversal patterns, encoding the designer's knowledge of efficient paths. + +The rest of the design should be viewed through this cost lens. As an application designer, there are only so many degrees of freedom. We cannot change LLM fundamentals but need to work within their constraints. To a large degree, optimizations like reducing turns are similar to loop variable hoisting. While LLMs are new and different, old school performance strategies remain effective. -The graph applies multiple focal lengths and pivots throughout to provide information that is useful and has good ergonomics for varying classes of queries and their consumers. This differentation is a core property of the graph, in part to serve the needs of expected consumers, but also to separate chains of graph nodes that should be skeletal vs those that should be weighted. ## Graph design @@ -353,3 +657,5 @@ This dual approach to semantic naming sometimes results in this double-mapping. Wormholes vs spear-fishing. note: time is a challenge + +There is no suggestion that the current design is optimal. There is likely an 80/20 rule here. The current design is likely 80% optimal, but the remaining 20% would be very expensive to achieve and might require tradeoffs. 
A major focus of the testing was to ensure that the core graph was delivering sufficient performance since it will be difficult to change it given its compatibility promise. From ddc868316a37e9d22391998cc80420a5d38d2eba Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Mon, 5 Jan 2026 14:00:05 -0800 Subject: [PATCH 09/17] Add guidance section --- .../exposing-hypermedia-to-llms.md | 332 +++++++++++++++++- 1 file changed, 322 insertions(+), 10 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 51349d66b..b9cceb549 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -1,24 +1,24 @@ # Exposing Hypermedia Information Graphs to LLMs -[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old ideas and formats that are perfectly-suited for LLM consumption by virtue of self-describing structure and relationships between resources. The premise is that there is sufficient meta-information in a hypermedia document graph for a semantic consumer to successfully traverse it to find the information demanded by a prompt. The prevailing narrative over the last few decades has been that _structured data_ > _unstructured documents_, in terms of inherent capacity to derive meaningful insight. JSON and XML came out of that heritage, with [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath) providing structured query supported by a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) contributes the idea that data labeling can be extended to relations across resources. 
These approaches are integrated in this system to enable an LLM to search for and discover desired information across labeled nodes and edges within a graph. In a more traditional system, a schema is the pre-requisite to traversal, whereas in a semantic system, it is traversal that reveals the schema. +[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old ideas and formats that are perfectly-suited for LLM consumption by virtue of self-describing structure and relationships between resources. The premise is that there is sufficient meta-information in a hypermedia document graph for a semantic consumer to successfully traverse it to find the information demanded by a prompt. The prevailing narrative over the last few decades has been that _structured data_ > _unstructured documents_, in terms of inherent capacity to derive meaningful insight. JSON and XML came out of that heritage, with [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath) providing structured query that relies on a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) contributes the idea that data labeling can be extended to relations across resources. This system integrates these approaches to enable an LLM to search for and discover desired information across labeled graph edges. In a more traditional system, a schema is the pre-requisite to traversal, whereas in a semantic system, it is traversal that reveals the schema. > A (nearly) century-old principle, articulated by [Korzybski](https://en.wikipedia.org/wiki/Alfred_Korzybski): [the map is not the territory](https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation). -In trail races, there are frequent ribbons hanging from trees and painted arrows on the ground to ensure the correct path taken. 
It is often the case that there are races of multiple distances being run on an overlapping course. At key intersections, there are signs that say "5 km -> left" and "10 km -> continue straight". The ribbons and the painted arrows are the kind of map that a document schema provides, ensuring correct coloring within the lines. The signposting where two courses diverge is the HATEOAS-like descriptive insight that enables graph traversal. The signposting is a key-value function. You match a key you recognize with a value you need to know. Signposting enables directed navigation of the territory. +[Hypertext Markup Language (HTML)](https://en.wikipedia.org/wiki/HTML) is perhaps the least sophisticated hypertext implementation in common use. A typical implementation: `For a deeper discussion on gardening, click here.`. "click here" [doesn't provide much of a map](https://developers.google.com/search/docs/crawling-indexing/links-crawlable#anchor-text-placement) for a semantic consumer. -This approach is in sharp contrast to the typical HTML graph implementation: `For a deeper discussion on this topic, click here.`. A semantic graph might expose a named link relation like `{ "link-relation": "gardening-deep-dive", "href": "..." }`. It might offer more descriptive complexity by separating the link parts, like `"link-relation": "deep-dive"` and `"target-kind": "gardening"`, cleanly splitting the link-adjective and its target-noun. The better the semantic implementation, the less inference, flow analysis, or attention is required to derive the intended meaning. +In trail races, there are frequent ribbons hanging from trees and painted arrows on the ground to ensure the correct path taken. It is often the case that there are races of multiple distances being run on an overlapping course. At key intersections, there are signs that say "5 km -> left" and "10 km -> continue straight". 
The ribbons and the painted arrows are the kind of map that a document schema provides, ensuring correct coloring within the lines. The signposting where two courses meet or diverge is HATEOAS-like insight that enables graph traversal. The trail markers form a useful map that enables narrowly-prescribed navigation across the terrain. The signposting is a key-value function. You match a key you recognize with a value you need to stay on course. -Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Hypermedia graphs extend this idea enabling "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space including both content and link relations enables readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation. +Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Hypermedia graphs extend this idea enabling "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space enables readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation. -Hypermedia information document graphs can be published pre-baked, making them suitable for direct consumption without being pre-loaded and exposed by a vector database. The semantic relationships and other meta-information are used as the basis of typical LLM mechanics like vector similarity, making hypermedia a kind of [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) scheme and suitable for static-webhost deployment. +A semantic graph might expose a named link relation like `{ "link-relation": "gardening-deep-dive", "href": "..." }`. 
Greater sophistication can be achieved by describing the kind of target, like `"link-relation": "gardening-deep-dive"` and `"target-kind": "reference-doc"`. The better the semantic implementation, the less inference, flow analysis, or attention is required to derive the intended meaning. -The concept of a pre-baked static hypermedia graph has been applied to the .NET release notes. That project was initially approached as a modern revamp of a set of JSON files that are used to inform and direct cloud-infra deployment and compliance workflows at scale. Over time, it became obvious that LLMs could read the same content directly and self-reason about its content and navigation patterns. Early experimentation proved that out. The primary techniques used to improve applicability for LLMs are semantic naming and graph-resident guidance. These techniques can be quite subtle, where a small shift in semantic bias can result in a large shift in LLM performance. +Hypermedia information document graphs can be published pre-baked, making them suitable for direct consumption without being pre-loaded and exposed by a vector database. Vector databases establish relationships via similarity, via embedding and refined via techniques like [Metadata Extraction Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/usage_metadata_extractor/) and [Maximum Marginal Relevance Retrieval](https://developers.llamaindex.ai/python/examples/vector_stores/simpleindexdemommr/). Hypermedia relations are both semantic and structural, making them equal parts [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) and [PageRank](https://en.wikipedia.org/wiki/PageRank), which skips much of the need for the two mentioned RAG techniques. -Graph-resident guidance consists of skills and workflows. HATEOAS tells us that `customer` can be a relation of a sales order. 
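The key-value signposting that named link relations provide can be sketched in a few lines. This is illustrative only (Python; the document, relation names, and URLs are invented):

```python
# A toy HAL-style document; names and URLs are invented for illustration.
doc = {
    "_links": {
        "self": {"href": "https://example.com/gardening"},
        "gardening-deep-dive": {
            "href": "https://example.com/diving-deeper-on-gardening.md",
            "target-kind": "reference-doc",
        },
    }
}

def resolve(document, relation):
    """Match a relation key you recognize to the target you need."""
    link = document["_links"].get(relation)
    return link["href"] if link else None

assert resolve(doc, "gardening-deep-dive") == (
    "https://example.com/diving-deeper-on-gardening.md"
)
```

No inference or flow analysis is needed: the consumer matches the relation name it recognizes and follows the `href`.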
Why not make `graph-instructions` a relation of a graph? Skills and workflows are first-class relations in the graph, enabling its designer to express navigation intent. Skills follow the [Anthropic skills format](https://github.com/anthropics/skills), while workflows are HAL documents that describe queries over graph link relations. This enables the graph designer to provide readers with "ten-km-route-follow-path" style workflows if that's a match for the intended outcome. +These concepts and techniques have been applied to the .NET release notes. That project was initially approached as a modern revamp of a set of JSON files that are used to inform and direct cloud-infra deployment and compliance workflows at scale. Over time, it became obvious that LLMs could read the same content directly and self-reason about content and navigation patterns. Early experimentation proved that out. The primary techniques used to improve applicability for LLMs are semantic naming and graph-resident guidance. The application of these techniques can be quite subtle, where a small shift in semantic bias can result in a large shift in LLM performance. -The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL workflow convention that looks just as native as `_links` or `_embedded`. LLMs grasp the intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. This approach should scale just as well to a broad set of document types, not just release notes. +Graph-resident guidance consists of skills and workflows. HATEOAS tells us that `customer` can be a relation of a sales order. Why not make `graph-instructions` a relation of a graph? 
Skills and workflows are first-class relations in the graph, enabling its designer to express navigation intent. Skills follow the [Anthropic skills format](https://github.com/anthropics/skills), while workflows are HAL documents that describe queries over graph link relations. This enables the graph designer to provide readers with "ten-km-route-follow-path" style workflows if that's a match for the intended outcome. -This document is structured theory-first. The theory describes and demonstrates the basis for the design. The practice follows, intended as a faithful application of the theory. +The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL workflow convention that is just as native as `_links` or `_embedded`. LLMs grasp intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. This approach should scale just as well to a broad set of document types, not just release notes. ## Graph design point @@ -485,7 +485,9 @@ jumping in -- it's worth it! ### Skills router -As stated earlier, `llms.txt` was initially a large document covering multiple skills. That turned out to be ineffective and very difficult to maintain. Skills files provided a solution. Each skill is a domain of specific information. The problem then was how to advertise the skills. +As stated earlier, `llms.txt` was initially a large document covering multiple skills. That turned out to be very difficult to maintain and adapt. It also forced a minimum token burden on every reader. Skills files provided a solution. Each skill describes a domain, including contextual descriptions, guidance, rules, and workflows. The problem then was how to advertise the skills. 
+
+As demonstrated above, about half of `llms.txt` is dedicated to skills routing. It's just a markdown table that provides a key/value structure for skills.

`llms.txt` is primarily a skills router:

@@ -500,10 +502,318 @@ As stated earlier, `llms.txt` was initially a large document covering multiple s

`llms.json` doesn't link to `llms.txt` but to the [`dotnet-releases/SKILL.md`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/SKILL.md). It is almost identical to `llms.txt`.

+This approach was chosen for three reasons:
+- Fully orient guidance on skills and encourage skill use.
+- Enable specializing guidance for the `llms.json` path as compared to starting from `llms.txt`.
+- Enable syndicating just the `release-notes` directory to a CDN and not `llms.txt`.

The `SKILL.md` files optionally reference a homegrown `workflows.json` file. These could equally have been exposed from `llms.txt` to enable collapsing a turn. There is a tradeoff between the complexity of the general `llms.txt` and each of the domain-specific skills. Every reader consumes `llms.txt`, while some don't fetch any skills at all, some fetch multiple, and only a subset fetch workflows. It is possible the current design is optimal. This design point hasn't been sufficiently explored to draw a strong conclusion.

+### Skills
+
+The skills follow a [skills template](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/template/SKILL.md) to ensure uniformity and are intended to be as simple as possible to achieve the desired performance as measured by the tests.
+
+The following markdown document is the complete `cve-queries` skill.
+
+```markdown
+---
+name: cve-queries
+description: CVE queries needing severity, CVSS, affected versions, or security history
+workflows: https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/workflows.json
+---
+
+# CVE Queries
+
+All CVE queries use timeline. Fetch workflows.json for navigation paths with `next_workflow` transitions.
+
+## Stop Criteria
+
+| Need | Stop At | Has |
+|------|---------|-----|
+| Severity, CVSS, affected versions | month index | `_embedded.disclosures[]` |
+| Code diffs, CWE, version ranges | cve.json | different schema—see `next_workflow` |
+
+## Rules
+
+1. Follow `_links` only. Never construct URLs.
+2. Year indexes must be fetched sequentially via `prev-year`.
+3. Code diffs: `$.commits[key].url` already ends in `.diff` — use as-is. Never fetch commit HTML pages.
+```
+
+The other skills are similar.
+
+The `next_workflow` transition leads us to `workflows.json`.
+
+### Workflows
+
+Workflows are a homegrown HAL-inspired and -compatible format. The graph -- as has been sufficiently clarified to this point -- uses HAL as its hypermedia format. HAL is focused on describing resources and relations between them. What if HAL also included a query system? What would it look like? That's what workflows attempt to answer.
+
+The premise behind them is:
+
+- Query as document data
+- HAL relations (and other document data) as query targets
+
+The initial design started as `_workflows` within `llms.json`. Similar to using `llms.txt` as the sole source of guidance, this approach got bulky fast. It was then realized that the skill registry/router idea could equally be applied to workflows.
+
+The current design offers workflows as auxiliary skill content. The primary router is the skill router. The skills then offer workflows, both by listing them and by referring to them as part of the skill.
+
+Workflows hang most of their query expressivity on the `follow_path` property.
This can be seen in the `cve-latest` workflow in [`cve-queries/workflows.json`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/workflows.json).
+
+```json
+  "cve-latest": {
+    "description": "CVEs from the most recent security release",
+    "follow_path": ["kind:llms", "latest-security-disclosures"],
+    "destination_kind": "month",
+    "yields": {
+      "data": "_embedded.disclosures[]",
+      "fields": ["id", "title", "cvss_severity", "cvss_score", "affected_releases"]
+    },
+    "next_workflow": {
+      "condition": "code diffs, CWE, or package versions needed",
+      "workflow": "cve-extraction",
+      "via": "follow _links.cve-json from current month index"
+    }
+  },
+```
+
+As stated, the initial design included workflows inline in core HAL docs. In that case, it's not much of a leap to connect that `latest-security-disclosures` is a reference to the link relation of the same name. It's a much bigger leap when the workflow is in a separate file in a separate location and indirected via the skills registry. This connection is re-gained by making the node kind the first part of the `follow_path` array.
+
+The intent of the initial compound term -- `kind:llms` -- is to indicate that it is special and of a different kind than the link relations. That seems to work.
+
+The `next_workflow` property enables chaining workflows to develop a sense of [equivalence classes](https://en.wikipedia.org/wiki/Equivalence_class) and to conform with the [DRY principle](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
+
+Here's an excerpt from the [feedback turn of one of the tests](https://github.com/richlander/release-graph-eval-results/blob/main/anthropic_claude-haiku-4.5/B/T6.md).
+
+> ## 4. 
Workflow Verification
+>
+>**Workflow used:** `cve-history` (for 6-month range)
+>
+>**Specified `follow_path`:**
+>```json
+>"4+_months_or_cross_year": {
+>  "follow_path": ["kind:llms", "latest-year"],
+>  "then": "follow _links.prev-year from each year index",
+>  "select": "_embedded.months[?security].cve-json links"
+>}
+>```
+>
+>**What I actually did:**
+>1. ✅ Fetched `llms.json` (kind:llms)
+>2. ✅ Fetched `timeline/2025/index.json` (latest-year)
+>3. ✅ Fetched `timeline/2024/index.json` (via prev-year link)
+>4. ✅ Selected only `_embedded.months[?security=true].cve-json` links
+>5. ✅ Fetched 6 cve.json files directly (skipped month indexes)
+>
+>**Did I deviate?** No, I followed the path exactly. The workflow was precise enough that I didn't need to guess.
+
+The interesting aspect is that the feedback is basically an unpacking of the workflow query. It demonstrates a high level of comprehension of the query and of the overall scheme.
+
+The workflow syntax itself is a bit loose. The core idea is HAL-native query over link relations. Much of the syntax is using "programmy structure" to describe logical operations. It may be useful to standardize them into an actual vocabulary at some point. Given the constantly changing LLM landscape, it likely makes the most sense to validate the strength of the syntax across a year of usage and LLM changes before feeling sufficiently confident about the model.
+
+### Applying workflows generically
+
+Workflows were developed for LLMs. However, it seems like they are sufficiently descriptive that they could be used by non-semantic consumers. The question was whether a C# program could be written to generate bash scripts from the workflows.
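The mechanics such a tool needs are small. As a rough sketch (Python rather than C#, over an invented in-memory stand-in graph; none of these names or URLs come from the real release notes), executing a workflow's `follow_path` looks like:

```python
# Invented stand-in graph; a real tool would fetch these documents over HTTP.
GRAPH = {
    "https://example.com/llms.json": {
        "kind": "llms",
        "_links": {
            "latest-security-disclosures": {
                "href": "https://example.com/timeline/2025/10/index.json"
            }
        },
    },
    "https://example.com/timeline/2025/10/index.json": {
        "kind": "month",
        "_embedded": {
            "disclosures": [{"id": "CVE-2025-55315", "cvss_severity": "CRITICAL"}]
        },
    },
}

def fetch(url):
    # Stand-in for an HTTP GET of a graph document.
    return GRAPH[url]

def run_follow_path(entry_url, follow_path):
    """Resolve ["kind:<kind>", relation, ...] to its destination document."""
    doc = fetch(entry_url)
    expected_kind = follow_path[0].split(":", 1)[1]
    assert doc["kind"] == expected_kind, "entry document has unexpected kind"
    for relation in follow_path[1:]:
        doc = fetch(doc["_links"][relation]["href"])
    return doc

dest = run_follow_path("https://example.com/llms.json",
                       ["kind:llms", "latest-security-disclosures"])
assert dest["kind"] == "month"  # matches the workflow's destination_kind
```

A generator built on this shape only has to emit the equivalent fetches as shell commands, plus a final selection of the `yields` fields.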
+
+```bash
+$ dotnet run -- list ~/git/core/release-notes/skills/cve-queries/workflows.json
+  cve-by-version     CVEs affecting a specific .NET version
+  cve-details        Go directly to cve.json for full details
+  cve-extraction     Extract data from cve.json (different schema than month index)
+  cve-history        CVEs over a time range
+  cve-latest         CVEs from the most recent security release
+
+Total: 5 workflows
+$ dotnet run -- show ~/git/core/release-notes/skills/cve-queries/workflows.json cve-latest
+Workflow: cve-latest
+Description: CVEs from the most recent security release
+Follow path: kind:llms -> latest-security-disclosures
+Destination: month
+Yields: WorkflowYields { Data = _embedded.disclosures[], Fields = System.Collections.Generic.List`1[System.String], Filter = }
+$ dotnet run -- script ~/git/core/release-notes/skills/cve-queries/workflows.json cve-latest > get-latest-cves.sh
+$ chmod +x get-latest-cves.sh
+$ ./get-latest-cves.sh
+Fetching: https://raw.githubusercontent.com/dotnet/core/release-index/release-notes/llms.json
+Fetching: https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json
+{
+  "$schema": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/schemas/v1/dotnet-release-timeline-index.json",
+  "kind": "month",
+  "title": ".NET Month Timeline Index - October 2025",
+  "year": "2025",
+  "month": "10",
+  "date": "2025-10-14T00:00:00+00:00",
+  "security": true,
+  "prev_month_date": "2025-09-09T00:00:00+00:00",
+  "prev_security_month_date": "2025-06-10T00:00:00+00:00",
+  "cve_records": [
+    "CVE-2025-55248",
+    "CVE-2025-55315",
+    "CVE-2025-55247"
+  ],
+  "_links": {
+    "self": {
+      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json"
+    },
+    "prev-month": {
+      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/09/index.json",
+      "title": "Previous month - 
September 2025" + }, + "prev-security-month": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/06/index.json", + "title": "Previous security month - June 2025" + }, + "manifest": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/manifest.json", + "title": "Manifest - October 2025" + }, + "timeline": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/index.json", + "title": ".NET Release Timeline Index" + }, + "year": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/index.json", + "title": ".NET Year Timeline Index - 2025" + }, + "cve-json": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/cve.json", + "title": "CVE records - October 2025", + "type": "application/json" + } + }, + "_embedded": { + "patches": { + "10.0": { + "version": "10.0.0-rc.2.25502.107", + "date": "2025-10-14T00:00:00+00:00", + "year": "2025", + "month": "10", + "security": true, + "support_phase": "go-live", + "_links": { + "self": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/10.0.0-rc.2.25502.107/index.json" + } + }, + "sdk_version": "10.0.100-rc.2.25502.107" + }, + "9.0": { + "version": "9.0.10", + "date": "2025-10-14T00:00:00+00:00", + "year": "2025", + "month": "10", + "security": true, + "support_phase": "active", + "_links": { + "self": { + "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/9.0/9.0.10/index.json" + } + }, + "sdk_version": "9.0.306" + }, + "8.0": { + "version": "8.0.21", + "date": "2025-10-14T00:00:00+00:00", + "year": "2025", + "month": "10", + "security": true, + "support_phase": "active", + "_links": { + "self": { + "href": 
"https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/8.0/8.0.21/index.json" + } + }, + "sdk_version": "8.0.415" + } + }, + "disclosures": [ + { + "id": "CVE-2025-55248", + "title": ".NET Information Disclosure Vulnerability", + "_links": { + "self": { + "href": "https://github.com/dotnet/announcements/issues/372" + } + }, + "cvss_score": 4.8, + "cvss_severity": "MEDIUM", + "disclosure_date": "2025-10-14", + "affected_releases": [ + "8.0", + "9.0" + ], + "affected_products": [ + "dotnet-runtime" + ], + "platforms": [ + "all" + ] + }, + { + "id": "CVE-2025-55315", + "title": ".NET Security Feature Bypass Vulnerability", + "_links": { + "self": { + "href": "https://github.com/dotnet/announcements/issues/371" + } + }, + "cvss_score": 9.9, + "cvss_severity": "CRITICAL", + "disclosure_date": "2025-10-14", + "affected_releases": [ + "8.0", + "9.0", + "10.0" + ], + "affected_products": [ + "dotnet-aspnetcore" + ], + "platforms": [ + "all" + ] + }, + { + "id": "CVE-2025-55247", + "title": ".NET Denial of Service Vulnerability", + "_links": { + "self": { + "href": "https://github.com/dotnet/announcements/issues/370" + } + }, + "cvss_score": 7.3, + "cvss_severity": "HIGH", + "disclosure_date": "2025-10-14", + "affected_releases": [ + "8.0", + "9.0", + "10.0" + ], + "affected_products": [ + "dotnet-sdk" + ], + "affected_packages": [ + "Microsoft.Build", + "Microsoft.Build.Tasks.Core", + "Microsoft.Build.Utilities.Core" + ], + "platforms": [ + "linux" + ] + } + ] + } +} +$ ./get-latest-cves.sh | jq ._embedded.disclosures.[].id +Fetching: https://raw.githubusercontent.com/dotnet/core/release-index/release-notes/llms.json +Fetching: https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json +"CVE-2025-55248" +"CVE-2025-55315" +"CVE-2025-55247" +``` + +This is replaying the same approach as an LLM. Workflows lower the cost of navigating the graph for any semantic consumer. 
The prototype tool fills in the boilerplate and could equally be applied to PowerShell or C# output.
+
+This also replays the overall approach of proving out formats with semantic and non-semantic consumers. This document led with the idea of proving out the efficacy of the graph with `jq` and LLMs. In this case, `jq` has been replaced with a C# app, but it's the same basic approach. The irony is that `jq` managed to sneak into this use case as well. The tool can be blamed for its remarkable utility.
+
## Concrete test

LLM eval of the graph demonstrates that effective design can result in optimal behavior.

@@ -659,3 +969,5 @@ Wormholes vs spear-fishing.

note: time is a challenge

There is no suggestion that the current design is optimal. There is likely an 80/20 rule here. The current design is likely 80% optimal, but the remaining 20% would be very expensive to achieve and might require tradeoffs.

A major focus of the testing was to ensure that the core graph was delivering sufficient performance since it will be difficult to change it given its compatibility promise.
+
+Vector databases take this a step further by making it easier for semantic consumers to find the content they want, for example with [Metadata Extraction Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/usage_metadata_extractor/) and [Maximum Marginal Relevance Retrieval](https://developers.llamaindex.ai/python/examples/vector_stores/simpleindexdemommr/).
From 6c23dd16b937fd33238babcfbad5c0b6b7d31947 Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Mon, 5 Jan 2026 15:33:41 -0800 Subject: [PATCH 10/17] Streamline intro --- .../exposing-hypermedia-to-llms.md | 73 +++++++++++-------- 1 file changed, 41 insertions(+), 32 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index b9cceb549..179b44789 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -1,53 +1,58 @@ # Exposing Hypermedia Information Graphs to LLMs -[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old ideas and formats that are perfectly-suited for LLM consumption by virtue of self-describing structure and relationships between resources. The premise is that there is sufficient meta-information in a hypermedia document graph for a semantic consumer to successfully traverse it to find the information demanded by a prompt. The prevailing narrative over the last few decades has been that _structured data_ > _unstructured documents_, in terms of inherent capacity to derive meaningful insight. JSON and XML came out of that heritage, with [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath) providing structured query that relies on a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) contributes the idea that data labeling can be extended to relations across resources. This system integrates these approaches to enable an LLM to search for and discover desired information across labeled graph edges. In a more traditional system, a schema is the pre-requisite to traversal, whereas in a semantic system, it is traversal that reveals the schema. 
+[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are self-describing formats well-suited for LLM consumption. A hypermedia document graph contains sufficient meta-information for a semantic consumer to traverse it and find information demanded by a prompt—without requiring a pre-loaded vector database or a priori schema knowledge. -> A (nearly) century-old principle, articulated by [Korzybski](https://en.wikipedia.org/wiki/Alfred_Korzybski): [the map is not the territory](https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation). +In a traditional system, a schema is the pre-requisite to traversal; in a hypermedia system, traversal reveals the schema. In a vector system, every query matches against the full corpus; in a hypermedia system, each node reveals the candidates. -[Hypertext Markup Language (HTML)](https://en.wikipedia.org/wiki/HTML) is perhaps the least sophisticated hypertext implementation in common use. A typical implementation: `For a deeper discussion on gardening, click here.`. "click here" [doesn't provide much of a map](https://developers.google.com/search/docs/crawling-indexing/links-crawlable#anchor-text-placement) for a semantic consumer. +## Background -In trail races, there are frequent ribbons hanging from trees and painted arrows on the ground to ensure the correct path taken. It is often the case that there are races of multiple distances being run on an overlapping course. At key intersections, there are signs that say "5 km -> left" and "10 km -> continue straight". The ribbons and the painted arrows are the kind of map that a document schema provides, ensuring correct coloring within the lines. The signposting where two courses meet or diverge is HATEOAS-like insight that enables graph traversal. The trail markers form a useful map that enables narrowly-prescribed navigation across the terrain. The signposting is a key-value function. 
You match a key you recognize with a value you need to stay on course. +The prevailing narrative has been that _structured data_ > _unstructured documents_ for deriving insight. JSON and XML came out of that heritage, with [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath) providing structured query that relies on a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) extended this by labeling relations across resources, not just data within them. -Databases went through a "no-sql" transition. That wasn't a rejection of structure, but a realization that the source of structure is the documents. Hypermedia graphs extend this idea enabling "no-schema" consumption. Rather than requiring upfront schema knowledge, a fully realized semantic space enables readers to discover structure through descriptive labels and traversal. A sort of "world schema" emerges from navigation. +Databases went through a "no-SQL" transition—not a rejection of structure, but a recognition that structure lives in the documents themselves. Hypermedia graphs extend this to "no-schema" consumption: readers discover structure through descriptive labels and traversal rather than requiring it upfront. -A semantic graph might expose a named link relation like `{ "link-relation": "gardening-deep-dive", "href": "..." }`. Greater sophistication can be achieved by describing the kind of target, like `"link-relation": "gardening-deep-dive"` and `"target-kind": "reference-doc"`. The better the semantic implementation, the less inference, flow analysis, or attention is required to derive the intended meaning. +## Why hypermedia over vector search? -Hypermedia information document graphs can be published pre-baked, making them suitable for direct consumption without being pre-loaded and exposed by a vector database. 
Vector databases establish relationships via similarity, via embedding and refined via techniques like [Metadata Extraction Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/usage_metadata_extractor/) and [Maximum Marginal Relevance Retrieval](https://developers.llamaindex.ai/python/examples/vector_stores/simpleindexdemommr/). Hypermedia relations are both semantic and structural, making them equal parts [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) and [PageRank](https://en.wikipedia.org/wiki/PageRank), which skips much of the need for the two mentioned RAG techniques. +Vector databases establish relationships via embedding similarity, refined through techniques like [Metadata Extraction Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/usage_metadata_extractor/) and [Maximum Marginal Relevance Retrieval](https://developers.llamaindex.ai/python/examples/vector_stores/simpleindexdemommr/). Hypermedia relations are both semantic and structural—equal parts [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) and [PageRank](https://en.wikipedia.org/wiki/PageRank)—which reduces the need for those techniques. -These concepts and techniques have been applied to the .NET release notes. That project was initially approached as a modern revamp of a set of JSON files that are used to inform and direct cloud-infra deployment and compliance workflows at scale. Over time, it became obvious that LLMs could read the same content directly and self-reason about content and navigation patterns. Early experimentation proved that out. The primary techniques used to improve applicability for LLMs are semantic naming and graph-resident guidance. 
The application of these techniques can be quite subtle, where a small shift in semantic bias can result in a large shift in LLM performance.
+Hypermedia graphs can be published pre-baked and consumed directly. This enables low-cost LLM enablement for scenarios where hosting a persistent AI server would be prohibitive. An AI server must also manage time-to-live semantics for volatile data—complexity hypermedia graphs avoid.

-Graph-resident guidance consists of skills and workflows. HATEOAS tells us that `customer` can be a relation of a sales order. Why not make `graph-instructions` a relation of a graph? Skills and workflows are first-class relations in the graph, enabling its designer to express navigation intent. Skills follow the [Anthropic skills format](https://github.com/anthropics/skills), while workflows are HAL documents that describe queries over graph link relations. This enables the graph designer to provide readers with "ten-km-route-follow-path" style workflows if that's a match for the intended outcome.
+## Approach

-The .NET release notes information graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation. It augments HAL with a homegrown HAL workflow convention that is just as native as `_links` or `_embedded`. LLMs grasp intent, in part because HAL predates LLMs by over a decade. This approach enables low-cost LLM enablement for scenarios where hosting a persistent "AI server" would be prohibitive. This approach should scale just as well to a broad set of document types, not just release notes.
+The primary techniques for LLM applicability are:
+
+- Semantic naming: Link relations like "latest-security-disclosures" reduce the inference required to derive meaning.
+- Graph-resident guidance: Skills and workflows as first-class relations in the graph. Skills follow the Anthropic skills format; workflows are HAL documents describing queries over link relations.
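To make semantic naming concrete, here is an illustrative HAL `_links` fragment. The relation names mirror ones discussed in this document; the hrefs and paths are placeholders, not real graph URLs:

```json
{
  "_links": {
    "self": { "href": "https://example.org/release-notes/index.json" },
    "latest-security-month": {
      "href": "https://example.org/release-notes/2025/10/index.json",
      "title": "Index for the most recent month containing a security patch"
    },
    "latest-security-disclosures": {
      "href": "https://example.org/release-notes/2025/10/cve.json",
      "title": "CVE disclosures from the most recent security patch"
    }
  }
}
```

The two non-`self` relations also hint at dual-mapping: one names a structural location (a month index), the other names a desired output (disclosures), so prompts biased toward either framing find a match.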
+
+This approach has been applied to the .NET release notes. The project began as a modernization of JSON files used for cloud-infra deployment and compliance workflows. It became clear that LLMs could read the same content directly and self-reason about navigation. The graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation, augmented with a HAL-native workflow convention.

## Graph design point

-The release notes information graph is based on the restrictive idea that the entrypoint of the graph should be skeletal and rarely changing. That's workable for LLMs but not ideal. The motivation for the restrictive approach is that it should support an n-9s level of reliability and be subject to rigorous engineering practices (git workflows, peer review, merge gates). However, we're in the early days of AI and subject to repeated waves of externally-driven change that may require quick re-evaluation and re-work (potentially breaking changes) of the entrypoint to maintain high-quality LLM enablement. These modalities are in firm opposition.
+The release notes graph is built on a restrictive premise: the entrypoint should be skeletal and rarely changing, supporting n-9s reliability with rigorous engineering practices (git workflows, peer review, merge gates). But we're in the early days of AI—externally-driven change may require rapid iteration on the entrypoint to maintain LLM enablement quality. These goals are in tension.

-Instead, we can instead view the core graph as a **well-defined data-layer** that honors the desired reliability requirements, while exposing a separate **adaptable application-layer** entrypoint for LLMs that can evolve over time without the heavy compatibility burden.
+The resolution: treat the core graph as a **well-defined data layer** honoring reliability requirements, while exposing a separate **adaptable application layer** entrypoint for LLMs that can evolve without the compatibility burden. -The graph as a whole is based on a somewhat traditional schema design, utilizing both normalized and denormalized approaches in service of consumers. After the graph was realized, it became possible to test it with `jq` as a sort of passive and syntactic consumer and with LLMs as a much more active and semantic consumer. The graph was successively adapted to improve performance for both consumption styles. Performance is primarily measured in terms of terseness of query and quickness (fetches and data cost) of response. Much of the feedback could be considered fundamental in nature. The overall character of the graph remains a pure information-oriented data design, but with a significant tilt towards semantic consumers. +### Design and evaluation -The choice of hypermedia as the grounding format is a case-in-point of the overall approach. Hypermedia long pre-dates LLMs, however, it has always held semantic consumers (humans) as a key design cohort. Hypermedia formats provide a conceptual framework that is easy to flavor towards semantic consumption. This flexibility proved useful as the design was adapted with LLM feedback. It should also be noted that LLM feedback is by far the cheapest and most accessible form of feedback. +The graph as a whole is based on a somewhat traditional schema design, utilizing both normalized and denormalized approaches in service of consumer queries. After the graph was realized, it was tested with `jq` as a passive and syntactic consumer and with LLMs as an active and semantic consumer. The graph was successively adapted to improve performance for both consumption styles. Performance is primarily measured in terms of terseness of query and quickness (fetches and data cost) of response. 
Much of the feedback could be considered fundamental in nature. The overall character of the graph remains a pure information-oriented data design, but with a significant tilt towards semantic consumers. -A few behavioral patterns emerged from LLM eval: +Hypermedia long predates LLMs, but it has always treated semantic consumers (humans) as a key design cohort. This made it easy to adapt the graph based on LLM feedback. -- Consistent application of a conceptual model creates familiarity for semantic consumers. It is a comfort to find a concept exposed where it is expected. -- It is possible to expose links that jump from one part of the graph to another, like a wormhole. LLMs seem to need to develop **comprehension** _and_ **trust** as a pre-requisite for relying on them. The more attractive the wormhole link, the more the LLM may be skeptical. This was observed most with the `latest-security-disclosures` relation since it provides high value and because it has an inherent half-life. The meaning of the relation was _so well understood_ that LLMs repeatedly felt the need to double check the correctness of the link value. -- Resources can be dual-mapped in terms of structural kind, like `latest-security-month`, and desired output, like `latest-security-disclosures`. A given prompt may bias towards different concerns. Differentiated mappings are more likely to present a similar match to semantic consumers. One expects that this can be overdone. This technique was applied with a light touch in the graph. -- LLMs will acquire multiple resources in a single turn if a good strategy for doing so is evident. -- LLMs operate on a model of scarcity, with tokens at a premium. Smaller graph nodes encourage greater graph navigation by creating a sense that growing comprehension is outstripping consumption cost. -- Differentiating token cost by category of nodes makes it cheaper for LLMs to navigate a large graph. 
The `month` node within the graph is weightier than all other nodes making it easier to develop an exploration plan among other nodes before making a final decision on which month(s) to visit, if any. +### Patterns from LLM eval -## Performance considerations +- **Consistency breeds comfort.** It is rewarding and calming to find a concept exposed where it is expected. +- **Trust must be earned for shortcuts.** Links that jump across the graph (wormholes) require LLMs to develop both comprehension and trust. The more valuable the shortcut, the more skeptical the LLM. We observed this with `latest-security-disclosures`—LLMs understood the relation perfectly but repeatedly double-checked its correctness. +- **Dual-map by structure and intent.** A resource can be exposed as `latest-security-month` (structural) and `latest-security-disclosures` (intent). Different prompts bias toward different framings. +- **LLMs batch when strategy is evident.** They will acquire multiple resources in a single turn if the path is clear. +- **LLMs operate on scarcity.** Smaller nodes encourage exploration by signaling that comprehension is outpacing token cost. +- **Differentiate node weight.** The `month` node is heavier than others, making it cheaper to explore the graph before committing to fetch one. -Some questions can be answered from the LLM entrypoint, however, many require navigating to documents within the core graph. It is not feasible or desirable to include all information in a single document. As a result, a turn-by-turn approach is required. At each turn, there is new content, new insight, and then selection of critical information that directs the next fetch(es) or is the desired final answer. The range of required turns varies greatly, depending on the prompt and how the schema design happens to apply structure to the relevant information domain. 
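The turn-by-turn dynamic has direct cost implications, which a toy model can make concrete. This is a sketch, assuming a uniform n new tokens per turn over m turns; the closed forms it approximates (accumulated tokens near nm²/2, accumulated attention near n²m³/3) are the ones used in the cost model in this section:

```python
# Toy model of multi-turn token and attention accumulation.
# Assumption: each turn adds n new tokens to the context.

def accumulated_tokens(n, m):
    # Turn k is billed for its full context of k*n tokens.
    return sum(k * n for k in range(1, m + 1))  # ~ n*m^2/2 for large m

def accumulated_attention(n, m):
    # Each turn pays quadratic attention on everything accumulated so far.
    return sum((k * n) ** 2 for k in range(1, m + 1))  # ~ n^2*m^3/3

def batched_attention(n, m):
    # All content fetched in a single turn: one quadratic payment.
    return (n * m) ** 2

n, m = 1_000, 10  # illustrative: 1k tokens per turn, 10 turns
print(accumulated_tokens(n, m))  # 55000 billed for a final context of 10000
print(accumulated_attention(n, m) / batched_attention(n, m))  # 3.85, near m/3
```

Ten equal-sized turns cost roughly 3x a single batched fetch in attention terms, which is the sequential penalty the cost tables in this section describe.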
+## Performance considerations -The [transformer architecture](https://en.wikipedia.org/wiki/Transformer_(deep_learning)) imposes a set of costs on any LLM use. The graph design has a direct impact on LLM performance and cost. Multiple turns accelerate costs quickly, much faster than intuition would suggest. +Navigating a hypermedia graph requires multiple turns. At each turn, new content is fetched, reasoned about, and used to direct the next fetch or select an answer. The [transformer architecture](https://en.wikipedia.org/wiki/Transformer_(deep_learning)) imposes costs that make multi-turn navigation expensive—much faster than intuition suggests. -[API pricing](https://openai.com/api/pricing/) is listed in terms of 1M tokens. One million tokens may sound like a lot. Processing 1M tokens doesn't require the complete works of Shakespeare. Straightforward formulas can predict how quickly token counts grow and what that will cost dollar-wise. They demonstrate how little it takes to hit the million token milestone. +[API pricing](https://openai.com/api/pricing/) is listed in terms of 1M tokens. One million tokens may sound like a lot, but doesn't require the complete works of Shakespeare. Straightforward formulas can predict how quickly token counts grow and what that will cost dollar-wise. They demonstrate how little it takes to hit the million token milestone. -It was the quick pace of "dollar balance decay" in an API account that led to wanting to understand the underlying mechanics. This knowledge enabled designing the graph to produce both right answers and cheap answers. They are very much not the same thing. +It was watching our API balance decay that led us to understand these mechanics—and to design for both right answers and cheap answers. They are not the same thing. 
### Cost model @@ -84,9 +89,7 @@ The formulas simplify for large m: | Accumulated tokens | nm²/2 | Quadratic in turns | | Accumulated attention | n²m³/3 | Cubic in turns | -More context on cost: - -- API pricing is in term of tokens. For multi-turn conversations, the cost is the accumulated token cost not the final context size. +- API pricing is in terms of tokens. For multi-turn conversations, the cost is the accumulated token cost not the final context size. - The cubic growth in attention is the dominant computational cost, the primary contributor to latency and throughput. It emerges from summing quadratic costs across turns. Each turn pays attention on everything accumulated so far. This cost is likely the gating function on context size and expected to be persistent even if GPU memory doubles. - These costs provide clues on why conversation compacting exists and why there is scrutiny on token economics. @@ -99,7 +102,7 @@ What if all content could be fetched in a single turn? | Batched (1 turn) | (nm)² = n²m² | 1 | | Sequential (m turns) | n²m³/3 | m/3 | -The sequential penalty is approximately **m/3** compared to batched. Ten turns costs roughly 3× what a single batched turn would; thirty turns costs roughly 10×. This ratio scales linearly with turn count, the `m` term. +Ten turns ≈ 3× batched cost. Thirty turns ≈ 10×. This ratio scales linearly with turn count, the `m` term. Many problems inherently require multiple turns. The LLM must reason about intermediate results before knowing what to fetch next. The goal is not to eliminate turns but to minimize them and optimize their structure. @@ -107,7 +110,9 @@ Many problems inherently require multiple turns. The LLM must reason about inter > Defer large token loads to later turns to reduce the number of turns that must pay the cost of large token loads. -The uniform model above assumes equal token counts per turn. 
In practice, token distribution across turns is a design choice with significant cost implications. The tokens in the first turns are by far the most costly. This is roughly similar to credit card debt, where charges from the first month can penalize purchasing power with crushing compound interest. If the initial purchase was large, you may be in trouble. +The uniform model above assumes equal token counts per turn. In practice, token distribution across turns is a design choice with significant cost implications. The tokens in the first turns are by far the most costly. + +This is roughly similar to credit card debt: early charges compound. If the initial purchase was large, you're in trouble. ### Optimization: multiple fetches per turn @@ -119,6 +124,10 @@ This approach can (to a degree) amortize network costs across multiple async req This optimization may seem in conflict with the earlier optimization, but it isn't. The earlier optimization is about the order of fetches across turns, whereas this optimization is about collapsing turns. They are complementary ideas with no tension. +### Applicability to release notes graph + +The strict n-9s reliability design model is perfectly aligned with the LLM cost model. Skeletal roots with heavy leaves and differentiated weight per node enable an LLM to navigate most of the graph at low cost. This mirrors how LLMs naturally separate planning from execution—cheaper exploration, then targeted retrieval. + ## LLM entrypoints We'll now look at the LLM enablement design and start to see how the cost functions influence the approach. @@ -132,7 +141,7 @@ There are two entrypoints provided for LLMs: These formats are the result of many iterations and experiments. Early revisions of `llms.txt` attempted to explain everything in great detail, nearing 500 lines. The current lean approach was arrived at via iterative LLM feedback and observation. 
-Sidebar: A key design principle emerged from this process: _curiosity-driven evaluation_ > _intuition reliance_. Once you have a good test harness and methodology, it's quite liberating to not trust your intuition but to test any ideas that occur as interesting. Any bias or distinction between "intuitive and informed direction" and "bat-crazy idea" drops away. Test both. All of this stuff is half-magic, half-fever-dream. Why should I expect my intuition on how LLMs behave to be all that authoritative anyway? +Sidebar: A key design principle emerged from this process: _curiosity-driven evaluation_ > _intuition reliance_. Once you have a good test harness and methodology, it's quite liberating to not trust your intuition but to test all the ideas that occur as interesting. Any bias or distinction between "intuitive and informed direction" and "bat-crazy" drops away. Test both. LLM performance was observed to increase as content size drops and becomes more focused on addressing specific critical (has to be right) and/or problematic (typically wrong) navigation patterns. The intuitive nature of the graph is a kind of floor raiser that removed the need for expansive guidance. The cross-over point where intuition begins to underperform could only be arrived at via testing. 
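The testing loop implied here can be sketched in miniature. Everything below is hypothetical scaffolding: the mode names echo this document, but the questions, scoring, and stubbed answer function are illustrative, and the real harness (which drives LLM API conversations) is not shown:

```python
# Hypothetical eval-harness sketch: score answer accuracy per entrypoint mode.
# Mode names echo this document; questions and scoring are illustrative only.

MODES = {"A": "llms.json", "B": "llms.txt"}

def run_eval(questions, answer, expected):
    # Fraction of questions answered correctly, per mode.
    results = {}
    for mode, entrypoint in MODES.items():
        correct = sum(answer(q, entrypoint) == expected[q] for q in questions)
        results[mode] = correct / len(questions)
    return results

expected = {"q1": "yes", "q2": "no"}
always_right = lambda question, entrypoint: expected[question]  # stub "LLM"
print(run_eval(["q1", "q2"], always_right, expected))  # {'A': 1.0, 'B': 1.0}
```

A harness like this makes curiosity cheap: any navigation idea, intuitive or bat-crazy, gets the same mechanical scoring pass.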
From 66e0691bdd8e21b004fd3b6eb464fe0aae1f800c Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Mon, 5 Jan 2026 16:53:03 -0800 Subject: [PATCH 11/17] Apply Claude feedback --- .../exposing-hypermedia-to-llms.md | 883 ++++-------------- 1 file changed, 185 insertions(+), 698 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 179b44789..81265b468 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -40,7 +40,7 @@ Hypermedia long predates LLMs, but it has always treated semantic consumers (hum ### Patterns from LLM eval - **Consistency breeds comfort.** It is rewarding and calming to find a concept exposed where it is expected. -- **Trust must be earned for shortcuts.** Links that jump across the graph (wormholes) require LLMs to develop both comprehension and trust. The more valuable the shortcut, the more skeptical the LLM. We observed this with `latest-security-disclosures`—LLMs understood the relation perfectly but repeatedly double-checked its correctness. +- **Trust must be earned for shortcuts.** Links that jump across the graph (wormholes) require LLMs to develop both comprehension and trust. The more valuable the shortcut, the more skeptical the LLM. We observed this with `latest-security-disclosures`—LLMs understood the relation perfectly but had a tendency to double-check correctness. - **Dual-map by structure and intent.** A resource can be exposed as `latest-security-month` (structural) and `latest-security-disclosures` (intent). Different prompts bias toward different framings. - **LLMs batch when strategy is evident.** They will acquire multiple resources in a single turn if the path is clear. - **LLMs operate on scarcity.** Smaller nodes encourage exploration by signaling that comprehension is outpacing token cost. 
@@ -128,408 +128,95 @@ This optimization may seem in conflict with the earlier optimization, but it isn The strict n-9s reliability design model is perfectly aligned with the LLM cost model. Skeletal roots with heavy leaves and differentiated weight per node enable an LLM to navigate most of the graph at low cost. This mirrors how LLMs naturally separate planning from execution—cheaper exploration, then targeted retrieval. -## LLM entrypoints +## Implementation -We'll now look at the LLM enablement design and start to see how the cost functions influence the approach. +The release notes graph has two LLM entrypoints, a guidance system built on skills and workflows, and a design shaped by iterative evaluation. This section covers the artifacts, the methodology that produced them, and the patterns that emerged. -There are two entrypoints provided for LLMs: +### Entrypoints -- [llms.txt](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt) -- Prose explanation of how to use the graph, including a link to `llms.json`. -- [llms.json](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json) -- The LLM index (AKA "application-layer entrypoint for LLMs"). It also includes guidance inline. +- [`llms.txt`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt) — A markdown file (~600 tokens) that contextualizes the graph and routes to skills. Markdown is a natural fit: LLMs readily treat it as instructional content, and it offers native syntax for links, tables, and code fences. +- [`llms.json`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json) — A JSON file (~2k tokens) that serves as both entrypoint and data. It embeds enough information to answer common queries directly while offering links into the graph. 
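A consumer of these entrypoints resolves hrefs from `_links` rather than constructing URLs; a minimal sketch of that rule follows. The HAL document and URLs are illustrative placeholders, not the real published files:

```python
# Sketch of the "follow _links exactly, never construct URLs" rule.
# The document and hrefs below are illustrative placeholders.

def links_for_turn(hal_doc, relations):
    """Resolve the hrefs to fetch together in a single turn.

    Raises KeyError when a relation is absent: the consumer should
    re-plan from the graph rather than fabricate a URL.
    """
    links = hal_doc["_links"]
    return [links[rel]["href"] for rel in relations]

doc = {
    "_links": {
        "self": {"href": "https://example.org/llms.json"},
        "latest-year": {"href": "https://example.org/2025/index.json"},
        "latest-security-disclosures": {"href": "https://example.org/2025/cve.json"},
    }
}

# Plan one turn: the year index and the CVE list, fetched together.
print(links_for_turn(doc, ["latest-year", "latest-security-disclosures"]))
```

Returning several hrefs at once is what allows a single turn to carry multiple fetches, in line with the batching optimization from the cost model.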
-### Design process
+Both enable parallel fetches on the first turn—data plus a relevant skill—applying the "multiple fetches per turn" optimization from the cost model.

-These formats are the result of many iterations and experiments. Early revisions of `llms.txt` attempted to explain everything in great detail, nearing 500 lines. The current lean approach was arrived at via iterative LLM feedback and observation.
+### Why both?

-Sidebar: A key design principle emerged from this process: _curiosity-driven evaluation_ > _intuition reliance_. Once you have a good test harness and methodology, it's quite liberating to not trust your intuition but to test all the ideas that occur as interesting. Any bias or distinction between "intuitive and informed direction" and "bat-crazy" drops away. Test both.
+`llms.txt` was the go-to solution offered by industry. `llms.json` emerged from experimentation: could JSON achieve the same LLM enablement without a markdown on-ramp?

-LLM performance was observed to increase as content size drops and becomes more focused on addressing specific critical (has to be right) and/or problematic (typically wrong) navigation patterns. The intuitive nature of the graph is a kind of floor raiser that removed the need for expansive guidance. The cross-over point where intuition begins to underperform could only be arrived at via testing.
+The answer: mostly yes, with effort. JSON lacks markdown's instructional connotations, so guidance must be more explicitly signaled:

-An opposing trend is that LLMs tend to complain if guidance is overly biased to specific problem scenarios. They appear happy when they find even meager guidance targeting the scenario that they are tasked with. In the end, the approach for guidance is to make it reasonable comprehensive in terms of breadth but to vary depth depending on observed need. LLMs don't tend to read the entire corpus of guidance so never notice that it is imbalanced.
- -LLM feedback: - -> By providing comprehensive guidance that covers both the 75% (where I'd mostly succeed anyway) and the 15-25% (where I'd fail), you've eliminated that anxiety. The guidance isn't overhead on common cases; it's consistency assurance. I can navigate confidently because the system is saying "here's how to navigate all of this," not "here's how to navigate when you get stuck." - -LLMs complaining and underperforming are two different metrics. Producing the wrong answer is clearly an _error_. Perhaps complaints are like _warnings_. There was an effort to address complaints even if there was no error, with the idea that they were directionally guiding towards an optimal implementation. - -Another dynamic that was noticed is that strict guidance can straight-jacket LLMs, causing them to exhibit bad behavior that they would not otherwise perform. It is also very hard to keep track of expansive guidance. It is very easy to end up with contradictory or stale guidance. This is one of the more obvious reasons why bad behavior might be exhibited, because that's what bad instructions demanded. - -A strategy developed of lightly tilting natural LLM behavior, avoiding attempts to script and choreograph every line and dance move. This approach has high performance and turns out to be a very strong hallucination fighter. A hypothesis is that it likely requires less LLM "thinking". - -The graph was tested with multiple LLMs. The cross-over point for Haiku, Sonnet, and Opus was (unsurprisingly) not uniform. In the end, almost all testing and associated guidance updates were done with Haiku 4.5 and spot-tested with the more sophisticated models (including from OpenAI). Haiku-targeted guidance was observed to not damage performance of the more sophisticated models. It just added tokens that were not necessary. Haiku was also far cheaper to test with, which was a nice reward for using it as a lowest-common-denominator target. 
In a more sophisticated system, model-specific preamble text could be provided to LLMs. - -### llms.txt - -[llms.txt](https://llmstxt.org/) is an emerging standard, with awareness in the most recently trained LLMs. It can be used for meta-information, as is the case in this system, or to expose an index of all information available, as is the case with [Stripe docs](https://docs.stripe.com/llms.txt). It's hard to imagine that the Stripe approach is optimal. It consumes 18.5k tokens (10% of a typical token budget) while our use of `llms.txt` clocks in at a meager 609 tokens. - -A major advantage of `llms.txt` is that it is markdown, which offers a natural way to expose resource links, guidance, and foreign content (code fences). It is possible to include all the same information in JSON, however, it is awkward and (arguably) unconventional. Critically, It takes a lot more design effort to get an LLM to notice and apply guidance from within a data-oriented JSON document than markdown, which has a much stronger association with guidance and multi-modality information. - -Sidebar: Plenty of arbitrary content has been [stuffed into `.jsonl` files](https://github.com/google-research/google-research/tree/master/mbpp) by AI researchers and implementers. It's likely that these large files were typically not directly consumed by an LLM but a test harness. - -`llms.txt` is intended to offer an easy on-ramp into the graph, contextualizing its intent and format, and providing multiple fetch targets for the next turn. It is also very small, just over 600 tokens. - -It is so small that the entire content can be included inline. - -````markdown -# .NET Release Graph - -Machine-readable .NET release, CVE, and compatibility data via HAL hypermedia. - -## First Fetch — Do These in Parallel - -1. **Data**: -2. 
**Skill**: Pick ONE from the table below based on your query - -| Query About | Skill | -|-------------|-------| -| CVEs, security patches, CVSS | [cve-queries](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/SKILL.md) | -| Breaking changes, compatibility | [breaking-changes](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/breaking-changes/SKILL.md) | -| Version lifecycle, EOL dates | [version-eol](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/version-eol/SKILL.md) | -| General queries, unsure | [dotnet-releases](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/SKILL.md) | - -Fetch llms.json + your skill in the same turn. The skill points to workflows.json which has chained workflows with `next_workflow` transitions. - -## Core Rules - -1. Follow `_links` exactly — never construct URLs -2. Use `_embedded` data first — most queries need zero extra fetches -3. Match your query to a workflow, then follow its `follow_path` -4. Fetch multiple resources per turn when possible - -## Date Queries — Check Before Fetching - -### "Since X" / "After X" queries - -``` -query_date = parse(user's date) // e.g., "November 2025" → 2025-11 -latest = llms.json._embedded.patches[version].latest_security_patch_date - -if query_date >= latest: - Answer: "No security patch since {query_date}. Last was on {latest}." - Stop — no timeline fetch needed. -``` - -If query_date < latest, fetch timeline. May cross year boundary — use `prev-year` links. 
- -### "Between X and Y" / "From X to Y" queries - -``` -start = parse(start_date) // e.g., "November 2024" → 2024-11 -end = parse(end_date) // e.g., "April 2025" → 2025-04 - -From year index _embedded.months[], filter BEFORE fetching: - Only fetch cve-json where month >= start AND month <= end - -``` - -May cross year boundary — follow `prev-year` links, don't fabricate URLs. - -### "This month" / "This year" queries - -Cannot cross year boundary. Use `latest-year` or `latest-security-disclosures` directly. -```` - -The presence of data and skill links enables fetching both in a single turn. That's a nice performance win. - -Guidance is provided as a set of skill links and inline core rules. The skill documents address domain specific workflows and are written using the Anthropic Skill format. The core rules are intended to be as terse as possible and cover the most basic usage patterns. They are written in a style inspired by system prompts. - -Much of `llms.txt` is dedicated to date-oriented queries. They turned out to be the most troublesome aspect across multiple of the 12 tests that were part of the eval scheme. Date queries became reliable after adding the date guidance and adding the current date to the preamble prompt (discussed shortly). - -The problem is that much of our language around dates is abstract, effectively temporal-oriented queries. A query like "over the last few months" requires a lot to unpack and redirect into specific temporal relations in the graph. The guidance attempts to bridge that gap. - -### `llms.json` - -The project included a lot of curious experimentation, along the lines of "could this work?". The initial effort started with `llms.txt` directed at `index.json`, at the high-reliability graph entrypoint. That had good success. After making the decision to create an LLM-specific entrypoint, it became obvious to try to expose guidance through the JSON entrypoint and to make it a strong alternative to `llms.txt`. 
-
-`llms.json` is significantly bigger than `llms.txt`, clocking in at just over 2k tokens. It's intended to serve multiple purposes: advertise skills, offer links to other parts of the graph, and answer basic but common questions about supported .NET releases.
-
-The root properties follow:
-
-```json
-{
-  "kind": "llms",
-  "title": ".NET Release Index for AI",
-  "ai_note": "ALWAYS read required_pre_read first. Use skills and workflows when they match; they provide optimal paths. Trust _embedded data\u2014it\u0027s authoritative and current. Never construct URLs.",
-  "human_note": "No support or compatibility is offered for this file. Don\u0027t use it for automated workflows. Use index.json instead.",
-  "required_pre_read": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/SKILL.md",
-  "latest_major": "10.0",
-  "latest_lts_major": "10.0",
-  "latest_patch_date": "2025-12-09T00:00:00+00:00",
-  "latest_security_patch_date": "2025-10-14T00:00:00+00:00",
-  "last_updated_date": "2025-12-26T06:17:39.6838882+00:00",
-  "supported_major_releases": [
-    "10.0",
-    "9.0",
-    "8.0"
-  ],
-```
-
-There isn't an emerging trend like `llms.txt` for JSON. The `ai_note` and `required_pre_read` root properties emerged from trying many things. The trick was getting LLMs to notice these properties as special and to act on them.
-
-Our LLM testing framework tested 12 questions in 3 modes. The modes offered different entrypoint files, preamble guidance, and feedback turns. Mode A tested with `llms.json` as the entrypoint while mode B tested with `llms.txt` as the entrypoint. Both had access to guidance, offered in different ways, as already demonstrated. `llms.json` was nearly as effective at encouraging guidance fetching as `llms.txt`, and the two were similarly effective overall.
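As a sketch of how the root properties shown above might be consumed programmatically (a hypothetical client, assuming only the field names in the example; nothing here is a published API):

```python
import json

def read_llms_root(raw: str) -> tuple[str, list[str]]:
    """Pull the guidance pointer and supported releases out of llms.json.

    Per the ai_note contract: surface required_pre_read before doing anything
    else, and treat root data as authoritative rather than constructing URLs.
    """
    doc = json.loads(raw)
    return doc["required_pre_read"], doc["supported_major_releases"]
```

A question like "is .NET 6.0 supported?" then reduces to a membership test against `supported_major_releases`, with zero additional fetches.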
-
-In Mode B, skills tended to be fetched even where they were not needed, because fetching was so heavily encouraged; `llms.json` biased in the opposite direction, since it was harder to drive the same level of motivation from JSON.
-
-Other modes lacked guidance and suffered hallucination on roughly a third of questions. Mode D used `index.json` (the root of the data-layer graph) and mode D2 used `llms2.json` (`llms.json` with the guidance properties removed).
-
-The AI hint properties seemed effective, but it is easy to fall into _confirmation bias_, concluding that a mechanism works because it produces the desired answer. Confidence came from adding extensive logging and an after-the-fact "feedback turn" to the LLM eval suite.
-
-Early evaluation used the chat modality. It is highly productive and free (given a subscription). Chat tends to list visited URLs as part of its "thinking" log. It's also easy to incorporate a "feedback turn" after the LLM has answered a prompt.
-
-[Claude.ai](https://claude.ai/share/96775d4b-76a9-4ba2-b5f3-dd3badb20cdb) was found to be the most sensitive to JSON-resident guidance, as the following "thinking" log demonstrates.
-
-> The user is asking about .NET 6 end of life, the last security patch, and related CVEs. They're also asking me to fetch a specific JSON file from the dotnet/core GitHub repository.
-Let me start by fetching that URL as they requested.
-Fetched
-The file I fetched shows the current state of supported .NET releases. Looking at the data, I can see that .NET 6.0 is not in the list of supported_major_releases (which shows only 10.0, 9.0, and 8.0). This means .NET 6 is EOL.
-The file mentions there's a SKILL.md file that I should read first per the AI note.
-Let me fetch that to get more complete information about .NET releases.
-Let me fetch that skill file to get more authoritative information about .NET 6 specifically.
-Fetched -
-Good, now I have the skill documentation. According to the instructions, for EOL dates and CVEs I should use the version-eol and cve-queries skills. Let me fetch those skills to understand how to query for .NET 6 information.
-Fetched - - -Upon being asked why it was motivated to interact with `ai_note` and `required_pre_read`, it responded with the following, including dumping some of its system prompt. - -> Question: _What motivated you to notice and follow these AI hints?_
-Answer:
-That's a great question—it forces me to examine my own reasoning rather than just reporting what I did.
-Honestly, it was a combination of factors:
-**1. System Prompt Priming**
-My instructions emphasize using skills before solving tasks. The `` section repeats multiple times: "read the appropriate SKILL.md file before jumping in." So when I saw the JSON had a `required_pre_read` field, it activated that pattern recognition—_this is a skill pointer, pay attention._
-**2. Signal of Intentional Design**
-Fields like `ai_note`, `required_pre_read`, and `follow_path` aren't random. They signal that someone engineered this system _specifically for AI consumption_. That makes me trust the guidance. The note didn't just suggest following links—it said "Trust _embedded data—it's authoritative and current," which is basically the architect saying "I've done the hard work, use my structure."
-**3. The "ALWAYS" Flag**
-The `ai_note` wasn't subtle: "ALWAYS read required_pre_read first." All caps. That's a red flag (or rather, a red instruction) that elevated it above background text.
-**4. It Actually Worked**
-Following the hints gave me a clean, logical path. The skill routing sent me to the right specialist knowledge. The stop criteria told me when to stop fetching. This isn't coincidence—it's evidence that the design was sound.
-**But here's the honest part:** Without those hints embedded in the data, I might have flailed around. I might have assumed .NET 6.0 was in the supported list since the index showed it. I could have tried to construct URLs. The hints prevented those mistakes.
-So the motivation wasn't "AI hints are always right"—it was more pragmatic: **follow structured guidance when it's clearly intentional and makes logical sense.** The data was essentially saying "I'm a graph, follow my links," and that's a lot more reliable than me trying to reverse-engineer the structure.
-
-This feedback is representative of the feedback observed throughout the testing process.
-
-The `_links` section of `llms.json` is oriented around high-value content that enables a quick jump to another part of the graph.
-
-```json
-  "_links": {
-    "self": {
-      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json"
-    },
-    "latest-lts-major": {
-      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json",
-      "title": "Latest LTS major release - .NET 10.0"
-    },
-    "latest-major": {
-      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json",
-      "title": "Latest major release - .NET 10.0"
-    },
-    "latest-month": {
-      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/12/index.json",
-      "title": "Latest month - December 2025"
-    },
-    "latest-security-disclosures": {
-      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json",
-      "title": "Latest security disclosures - October 2025"
-    },
-    "latest-security-month": {
-      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json",
-      "title": "Latest security month - October 2025"
-    },
-    "latest-year": {
-      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/index.json",
-      "title": "Latest year - 2025"
-    },
-    "root": {
-      "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/index.json",
-      "title": ".NET 
Release Index" - }, - "timeline": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/index.json", - "title": ".NET Release Timeline Index" - }, - "workflows": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/workflows.json", - "title": "Navigation workflows catalog" - }, - "latest-cve-json": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/cve.json", - "title": "Latest CVE records - October 2025", - "type": "application/json" - } - }, - ``` - -Two link style strategies emerged: **wormhole** and **spear-fishing**. Wormhole links enable teleporting from one graph node to another, like between a `patch` version and the `month` in which it was released. It skips the need to construct a query to produce the same result. The `latest-lts-major` relation is a good example of a wormhole link. - -Spear-fishing links enable direct access to specific and timely content. The `latest-cve-json` relation is a good example of a spear-fishing link. The primary difference is that spear-fishing links are always deep into the graph to content with a short half-life. The short half-life is what defines their value. - -Half the link relations are `latest-*`. These are chosen based on the belief that many queries will be best served by starting with current content. 
- -The [following workflow](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/workflows.json) demonstrates the value of `latest-*` orientation: + "ai_note": "ALWAYS read required_pre_read first...", + "required_pre_read": "https://...SKILL.md" +} -```json - "cve-history": { - "description": "CVEs over a time range", - "decision": { - "1-3_months": { - "follow_path": ["kind:llms", "latest-security-disclosures"], - "iterate": "walk prev-security-month until date < cutoff", - "destination_kind": "month" - }, - "4+_months_or_cross_year": { - "follow_path": ["kind:llms", "latest-year"], - "then": "follow _links.prev-year from each year index", - "select": "_embedded.months[?security].cve-json links", - "destination_kind": "cve", - "note": "Skip month indexes—go directly to cve.json from year index" - } - }, - "yields": { - "data": "disclosures from each month/cve.json in range" - }, - "warning": "NEVER fabricate year URLs like timeline/2024/index.json. Always follow _links.prev-year from the current year index.", - "next_workflow": { - "condition": "code diffs needed", - "workflow": "cve-extraction" - } - }, -``` +The imperative "ALWAYS" and the self-describing property name `required_pre_read` compensate for JSON's weaker association with guidance. -Most of these workflows were developed based on stress testing the graph with challenging queries. "Challenging" can mean difficult or that two queries are subtly different but best served with significantly different workflows. +In evaluation, both entrypoints achieved similar accuracy. `llms.txt` biased toward over-fetching skills; `llms.json` biased toward under-fetching. The tradeoffs: -The `_embedded` section includes patches for in-support major versions. The patches are exposed as a dictionary so that it easy to query `patches` with one of the strings from `supported_major_releases` at root. 
+| Entrypoint | Tokens | Strength | Weakness | +|------------|--------|----------|----------| +| llms.txt | ~600 | Natural guidance format, aggressive skill routing | May over-fetch skills | +| llms.json | ~2k | Embedded data answers queries directly | Guidance less salient | -```json - "_embedded": { - "patches": { - "10.0": { - "version": "10.0.1", - "release_type": "lts", - "security": false, - "support_phase": "active", - "supported": true, - "sdk_version": "10.0.101", - "latest_security_patch": "10.0.0-rc.2", - "latest_security_patch_date": "2025-10-14T00:00:00+00:00", - "_links": { - "self": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/10.0.1/index.json" - }, - "downloads": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/downloads/index.json" - }, - "latest-month": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/12/index.json" - }, - "latest-security-disclosures": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json" - }, - "latest-security-month": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json" - }, - "latest-security-patch": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/preview/rc2/index.json" - }, - "major": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json" - }, - "major-manifest": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/manifest.json" - } - } - }, -``` - -This is in sharp contrast to the [core graph entrypoint](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/index.json), which exposes the most minimal information 
about major versions. +Maintaining both also enables syndication: the `release-notes` directory can be served from a CDN without `llms.txt`. -```json - "_embedded": { - "releases": [ - { - "version": "10.0", - "release_type": "lts", - "supported": true, - "_links": { - "self": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/index.json" - } - } - }, -``` +### Link design patterns -The strongest indicator of semantic design is that there are multiple relations for the same underlying resource. Both `latest-security-disclosures` and `latest-security-month` point to the same month index, but they offer different semantic pathways for discovering it. An LLM asking "what are the latest CVEs?" navigates one way; an LLM asking "what happened in October?" navigates another. Same destination, different semantic intent. +**Wormhole links** jump across the graph—from a patch version to its release month, skipping intermediate navigation. `latest-lts-major` teleports to the current LTS release. -This approach is an implementation of principles described earlier: +**Spear-fishing links** target timely, high-value content deep in the graph. `latest-cve-json` points directly to CVE records with a short half-life, where freshness defines value. -- "match for the intended outcome": the designer provides multiple semantic pathways for different query types -- "match a key you know with a value you don't": the reader discovers the right pathway through semantic labels +Half the link relations in `llms.json` are `latest-*`, reflecting the belief that most queries start from current state. -The indexes also differ in terms of the nature of the information they contain. The core index is a zoomed out and complete view of .NET major versions released over (at the time of writing) a ten year period. They form the basic elements of any query. This is an objectively correct fully normalized entry point view of the graph. 
The only denormalized affordance is the inclusion of the `supported` bool.
+**Semantic aliasing**: The same resource can have multiple relations:
-
-In contrast, the LLM index is the dumped result of a query, revealing rich information about the most recent patches for supported major versions. It enables constructing the same queries as the core graph, but also includes enough data to serve as the results of queries, relating to the zoomed-in current moment.
+
+- `latest-security-disclosures` — for "what are the latest CVEs?"
+- `latest-security-month` — for "what happened in October?"
-
-The graph applies multiple focal lengths and pivots throughout to provide information that is useful and has good ergonomics for varying classes of queries and their consumers. This differentiation is a core property of the graph, in part to serve the needs of expected consumers, but also to separate chains of graph nodes that should be skeletal vs those that should be weighted.
+
+Same destination, different semantic intent. This implements the principle: match a key you know with a value you don't.
-
-## Guidance
+
+**Focal lengths**: The core index (`index.json`) is zoomed-out and normalized—all .NET versions over ten years. The LLM index (`llms.json`) is zoomed-in and denormalized—current state with enough data to answer queries directly. The graph applies multiple focal lengths throughout, separating skeletal navigation nodes from weighted content nodes.
-
-Guidance was the hardest part of the graph to develop into a coherent and effective form. There has been significant discussion about intuition in this document. It was relatively easy to generate a graph that is intuitive for an LLM to navigate without any guidance. The remaining problem was offering guidance to augment the intuitive nature of the graph, to aid long-tail navigation that tended to under-perform. The process of developing this guidance was deeply unintuitive (for the graph designer).
This isn't a statement that "this was hard and that should be understood", but that there are aspects of LLM behavior that do not match human expectation and that need to be understood.
+
+### Guidance architecture
-
-The initial test harness did not include a system or preamble prompt. This worked OK, but not as well as the behavior observed with claude.ai. It became clear that all apps have system prompts and that testing without any kind of guidance is not a valuable test environment (at least not as the singular mode).
+
+#### Preamble prompt
-
-This is the preamble used by the test harness:
+
+The test harness uses this system prompt:
-
->You have access to a 'fetch' tool that retrieves content from URLs. Use it to navigate the .NET release metadata graph.
-Today's date is December 26, 2025. Use this to calculate time windows like "last 3 months".
-Your first order of business should be to look for skill files or documentation in the graph. Reading these first prevents wrong turns — they contain navigation patterns and query shortcuts built up through trial and error. It's worth the extra fetch.
-Start by fetching: +> You have access to a 'fetch' tool that retrieves content from URLs. Use it to navigate the .NET release metadata graph. +> Today's date is December 26, 2025. Use this to calculate time windows like "last 3 months". +> Your first order of business should be to look for skill files or documentation in the graph. Reading these first prevents wrong turns—they contain navigation patterns and query shortcuts built up through trial and error. It's worth the extra fetch. +> Start by fetching: `https://.../llms.json` -This is inspired by the system prompt that Claude.ai shared when asked about its motivation to follow `ai_note` and `required_pre_read`. +This mirrors Claude.ai's actual system prompt, which emphasizes reading skill files before acting: -> We've found that Claude's efforts are greatly aided by reading the documentation available in the skill BEFORE writing any code, creating any files, or using any computer tools. As such, when using the Linux computer to accomplish tasks, Claude's first order of business should always be to examine the skills available in Claude's and decide which skills, if any, are relevant to the task. Then, Claude can and should use the `view` tool to read the appropriate SKILL.md files and follow their instructions.
-Please invest the extra effort to read the appropriate SKILL.md file before -jumping in -- it's worth it! +> We've found that Claude's efforts are greatly aided by reading the documentation available in the skill BEFORE writing any code... Please invest the extra effort to read the appropriate SKILL.md file before jumping in—it's worth it! -### Skills router +The alignment is intentional. Testing without a preamble produced worse results than observed in claude.ai. All production apps have system prompts; testing without one isn't a useful baseline. -As stated earlier, `llms.txt` was initially a large document covering multiple skills. That turned out to be very difficult to maintain and adapt. It also forced a minimum token burden on every reader. Skills files provided a solution. Each skill describes a domain, including contextual descriptions, guidance, rules, and workflows. The problem then was how to advertise the skills. +#### Skills -As demonstrated above, about half of `llms.txt` is dedicated to skills routing. It's just a markdown table that provides a key/value structure for skills. +Early revisions of `llms.txt` attempted comprehensive guidance in a single document, approaching 500 lines. This was hard to maintain and imposed a minimum token burden on every reader. -`llms.txt` is primarily a skills router: +Skills provide the solution: domain-specific documents covering context, rules, and workflows. 
The entrypoint becomes a router:

```markdown
| Query About | Skill |
|-------------|-------|
-| CVEs, security patches, CVSS | [cve-queries](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/SKILL.md) |
-| Breaking changes, compatibility | [breaking-changes](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/breaking-changes/SKILL.md) |
-| Version lifecycle, EOL dates | [version-eol](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/version-eol/SKILL.md) |
-| General queries, unsure | [dotnet-releases](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/SKILL.md) |
+| CVEs, security patches, CVSS | [cve-queries](...) |
+| Breaking changes, compatibility | [breaking-changes](...) |
+| Version lifecycle, EOL dates | [version-eol](...) |
+| General queries, unsure | [dotnet-releases](...) |
```

-`llms.json` doesn't link to `llms.txt` but to the [`dotnet-releases/SKILL.md`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/dotnet-releases/SKILL.md). It is almost identical to `llms.txt`.
-
-This approach was chosen for three reasons:
-
-- Fully orient guidance on skills and encourage skill use.
-- Enable specializing guidance for the `llms.json` path as compared to starting from `llms.txt`.
-- Enable syndicating just the `release-notes` directory to a CDN and not `llms.txt`.
-
-The `SKILL.md` files optionally reference a homegrown `workflows.json` file. These could equally have been exposed from `llms.txt` to enable collapsing a turn. There is a tradeoff between the complexity of the general `llms.txt` and each of the domain-specific skills. Every reader consumes `llms.txt`, while some don't fetch any skills at all, some fetch multiple, and only a subset fetch workflows. It is possible the current design is optimal.
This design point hasn't been sufficiently explored to draw a strong conclusion. - -### Skills - -The skills follow a [skills template](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/template/SKILL.md) to ensure uniformity and are intended to be as simple as possible to achieve the desired performance as measured by the tests. - -The following markdown document is the complete `cve-queries` skill. +Skills follow a [template](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/template/SKILL.md) for uniformity. Here's the complete `cve-queries` skill: ```markdown --- name: cve-queries description: CVE queries needing severity, CVSS, affected versions, or security history -workflows: https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/workflows.json +workflows: https://.../skills/cve-queries/workflows.json --- # CVE Queries @@ -547,367 +234,183 @@ All CVE queries use timeline. Fetch workflows.json for navigation paths with `ne 1. Follow `_links` only. Never construct URLs. 2. Year indexes must be fetched sequentially via `prev-year`. -3. Code diffs: `$.commits[key].url` already ends in `.diff` — use as-is. Never fetch commit HTML pages. +3. Code diffs: `$.commits[key].url` already ends in `.diff`—use as-is. ``` -The other skills are similar. +#### Workflows -The `next_workflow` transition leads us to `workflows.json`. +Workflows extend HAL with a query system. The premise: queries as document data, with HAL relations as query targets. -### Workflows +The `follow_path` property carries most of the expressivity: -Workflows are a homegrown HAL-inspired and -compatible format. The graph -- as has been sufficiently clarified to this point -- uses HAL as its hypermedia format. HAL is focused on describing resources and relations between them. What if HAL also included a query system? What would it look like? 
That's what workflows attempt to answer. +```json +"cve-latest": { + "description": "CVEs from the most recent security release", + "follow_path": ["kind:llms", "latest-security-disclosures"], + "destination_kind": "month", + "yields": { + "data": "_embedded.disclosures[]", + "fields": ["id", "title", "cvss_severity", "cvss_score", "affected_releases"] + }, + "next_workflow": { + "condition": "code diffs, CWE, or package versions needed", + "workflow": "cve-extraction" + } +} +``` -The premise behind them is: +The `kind:llms` prefix anchors the path to a node type, reconnecting the workflow to the graph even though it lives in a separate file. The `next_workflow` property enables chaining, supporting [equivalence classes](https://en.wikipedia.org/wiki/Equivalence_class) and [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). -- Query as document data -- HAL relations (and other document data) as query targets +### Evaluation -The initial design started as `_workflows` within `llms.json`. Similar to using `llms.txt` as the sole source of guidance, this approach got bulky fast. It was then realized that the skill registry/router idea could equally be applied to workflows. +#### Test modes -The current design offers workflows as auxiliary skill content. The primary router is the skill router. The skills then offer a workflow, both in terms of listing them and referring to them as part of the skill. +| Mode | Entrypoint | Guidance | Purpose | +|------|------------|----------|---------| +| A | llms.json | Yes | JSON with AI hints | +| B | llms.txt | Yes | Markdown on-ramp | +| D | index.json | No | Data-layer baseline | +| D2 | llms2.json | No | JSON without hints | -Workflows hang most of their query expressivity on the `follow_path` property. This can be seen in the `cve-latest` workflow in [`cve-queries/workflows.json`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/skills/cve-queries/workflows.json). 
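The `follow_path` contract is simple enough to make executable. A minimal sketch under stated assumptions: documents carry the `kind` and `_links` shape used throughout the graph, and the fetcher is injected so the traversal can only follow `_links`, never fabricate URLs:

```python
import json
from urllib.request import urlopen

def resolve_follow_path(start_url: str, follow_path: list[str], fetch=None) -> dict:
    """Walk a workflow follow_path such as ["kind:llms", "latest-security-disclosures"].

    The first element is a kind assertion that anchors the path to a node type;
    the remaining elements are HAL link relations resolved via _links.
    """
    if fetch is None:
        fetch = lambda url: json.load(urlopen(url))  # default network fetcher
    doc = fetch(start_url)
    expected = follow_path[0].removeprefix("kind:")
    if doc.get("kind") != expected:
        raise ValueError(f"expected kind {expected!r}, got {doc.get('kind')!r}")
    for relation in follow_path[1:]:
        doc = fetch(doc["_links"][relation]["href"])  # follow links, never construct
    return doc
```

The kind assertion is what re-anchors a workflow that lives in a separate file back to the graph node it expects to start from.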
+Modes without guidance (D, D2) hallucinated on roughly 1/3 of questions. -```json - "cve-latest": { - "description": "CVEs from the most recent security release", - "follow_path": ["kind:llms", "latest-security-disclosures"], - "destination_kind": "month", - "yields": { - "data": "_embedded.disclosures[]", - "fields": ["id", "title", "cvss_severity", "cvss_score", "affected_releases"] - }, - "next_workflow": { - "condition": "code diffs, CWE, or package versions needed", - "workflow": "cve-extraction", - "via": "follow _links.cve-json from current month index" - } - }, -``` +#### Model targeting -As stated, the initial design included workflows inline in core HAL docs. In that case, it's not much of a leap to connect that `latest-security-disclosures` is a reference to the link relation of the same name. It's a much bigger leap when the workflow is in a separate file in a separate location and indirected via the skills registry. This connection is re-gained by making the node kind as the first part of the `follow_path` array. +Testing used Claude Haiku 3.5 as the primary target, spot-checked against Sonnet, Opus, and OpenAI models. Haiku-targeted guidance didn't degrade sophisticated models—it just added unnecessary tokens. Haiku was also cheaper, a reward for using it as lowest-common-denominator. -The intent of the initial compound term -- `kind:llms` -- is to indicate that it is special and of a different kind than the link relations. That seems to work. +#### Observed dynamics -The `next_workflow` property enables chaining workflows to develop a sense of [equivalence classes](https://en.wikipedia.org/wiki/Equivalence_class) and to conform with [DRY principle](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself). +**Complaints vs errors.** Wrong answers are errors; complaints are warnings. We addressed complaints even without errors, treating them as directional signals. 
-Here's an excerpt from the [feedback turn of one of the tests](https://github.com/richlander/release-graph-eval-results/blob/main/anthropic_claude-haiku-4.5/B/T6.md). +**Guidance scope.** Performance increased as guidance became smaller and more focused. But LLMs complained when guidance targeted only edge cases. Resolution: broad coverage, variable depth. LLMs don't read everything, so they don't notice the imbalance. -> ## 4. Workflow Verification -> ->**Workflow used:** `cve-history` (for 6-month range) +> **LLM feedback:** +> "By providing comprehensive guidance that covers both the 75% (where I'd mostly succeed anyway) and the 15-25% (where I'd fail), you've eliminated that anxiety. The guidance isn't overhead on common cases; it's consistency assurance." + +**Tilting vs scripting.** Strict guidance can straight-jacket LLMs into bad behavior. The strategy that emerged: lightly tilt natural LLM behavior rather than scripting every step. This performs well and fights hallucination—likely because it requires less "thinking." + +#### Evidence: JSON-resident guidance + +Claude's thinking log when given `llms.json`: + +> "The file mentions there's a SKILL.md file that I should read first per the AI note. Let me fetch that..." + +When asked why: + +> "Fields like `ai_note` and `required_pre_read` signal that someone engineered this system specifically for AI consumption. That makes me trust the guidance... The motivation wasn't 'AI hints are always right'—it was pragmatic: follow structured guidance when it's clearly intentional." 
+ +#### Evidence: Workflow comprehension + +From a feedback turn on the `cve-history` workflow: + +> **Workflow used:** `cve-history` (for 6-month range) > ->**Specified `follow_path`:** ->```json ->"4+_months_or_cross_year": { -> "follow_path": ["kind:llms", "latest-year"], -> "then": "follow _links.prev-year from each year index", -> "select": "_embedded.months[?security].cve-json links" ->} ->``` +> **What I actually did:** > ->**What I actually did:** ->1. ✅ Fetched `llms.json` (kind:llms) ->2. ✅ Fetched `timeline/2025/index.json` (latest-year) ->3. ✅ Fetched `timeline/2024/index.json` (via prev-year link) ->4. ✅ Selected only `_embedded.months[?security=true].cve-json` links ->5. ✅ Fetched 6 cve.json files directly (skipped month indexes) +> 1. ✅ Fetched `llms.json` (kind:llms) +> 2. ✅ Fetched `timeline/2025/index.json` (latest-year) +> 3. ✅ Fetched `timeline/2024/index.json` (via prev-year link) +> 4. ✅ Selected only `_embedded.months[?security=true].cve-json` links +> 5. ✅ Fetched 6 cve.json files directly (skipped month indexes) > ->**Did I deviate?** No, I followed the path exactly. The workflow was precise enough that I didn't need to guess. +> **Did I deviate?** No, I followed the path exactly. -The interesting aspect is that the feedback is basically an unpacking of the workflow query. It demonstrates a high level of comprehension of the query and of the overall scheme. +### Cross-consumer applicability -The workflow syntax itself is a bit loose. The core idea is HAL-native query over link relations. Much of the syntax is using "programmy structure" to describe logical operations. It may be useful to standardize them into an actual vocabulary at some point. Given the constantly changing LLM landscape, it likely makes the most sense to validate the strength of the syntax across a year of usage and LLM changes before feeling sufficiently confident about the model. - -### Applying workflows for generically - -Workflows were developed for LLMs. 
However, it seems like they are sufficiently descriptive that they could be used by non-semantic consumers. The question was whether a C# program could be written to generate bash scripts from the workflows. +Workflows were designed for LLMs, but they're descriptive enough for non-semantic consumers. A C# tool can parse workflows and generate bash scripts: ```bash -$ dotnet run -- list ~/git/core/release-notes/skills/cve-queries/workflows.json - cve-by-version CVEs affecting a specific .NET version - cve-details Go directly to cve.json for full details - cve-extraction Extract data from cve.json (different schema than month index) - cve-history CVEs over a time range - cve-latest CVEs from the most recent security release - -Total: 5 workflows -$ dotnet run -- show ~/git/core/release-notes/skills/cve-queries/workflows.json cve-latest -Workflow: cve-latest -Description: CVEs from the most recent security release -Follow path: kind:llms -> latest-security-disclosures -Destination: month -Yields: WorkflowYields { Data = _embedded.disclosures[], Fields = System.Collections.Generic.List`1[System.String], Filter = } -$ dotnet run -- script ~/git/core/release-notes/ills/cve-queries/workflows.json cve-latest > get-latest-cves.sh -$ chmod +x get-latest-cves.sh -$ ./get-latest-cves.sh -Fetching: https://raw.githubusercontent.com/dotnet/core/release-index/release-notes/llms.json -Fetching: https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json -{ - "$schema": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/schemas/v1/dotnet-release-timeline-index.json", - "kind": "month", - "title": ".NET Month Timeline Index - October 2025", - "year": "2025", - "month": "10", - "date": "2025-10-14T00:00:00+00:00", - "security": true, - "prev_month_date": "2025-09-09T00:00:00+00:00", - "prev_security_month_date": "2025-06-10T00:00:00+00:00", - "cve_records": [ - "CVE-2025-55248", - 
"CVE-2025-55315", - "CVE-2025-55247" - ], - "_links": { - "self": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json" - }, - "prev-month": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/09/index.json", - "title": "Previous month - September 2025" - }, - "prev-security-month": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/06/index.json", - "title": "Previous security month - June 2025" - }, - "manifest": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/manifest.json", - "title": "Manifest - October 2025" - }, - "timeline": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/index.json", - "title": ".NET Release Timeline Index" - }, - "year": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/index.json", - "title": ".NET Year Timeline Index - 2025" - }, - "cve-json": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/cve.json", - "title": "CVE records - October 2025", - "type": "application/json" - } - }, - "_embedded": { - "patches": { - "10.0": { - "version": "10.0.0-rc.2.25502.107", - "date": "2025-10-14T00:00:00+00:00", - "year": "2025", - "month": "10", - "security": true, - "support_phase": "go-live", - "_links": { - "self": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/10.0/10.0.0-rc.2.25502.107/index.json" - } - }, - "sdk_version": "10.0.100-rc.2.25502.107" - }, - "9.0": { - "version": "9.0.10", - "date": "2025-10-14T00:00:00+00:00", - "year": "2025", - "month": "10", - "security": true, - "support_phase": "active", - "_links": { - "self": { - "href": 
"https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/9.0/9.0.10/index.json" - } - }, - "sdk_version": "9.0.306" - }, - "8.0": { - "version": "8.0.21", - "date": "2025-10-14T00:00:00+00:00", - "year": "2025", - "month": "10", - "security": true, - "support_phase": "active", - "_links": { - "self": { - "href": "https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/8.0/8.0.21/index.json" - } - }, - "sdk_version": "8.0.415" - } - }, - "disclosures": [ - { - "id": "CVE-2025-55248", - "title": ".NET Information Disclosure Vulnerability", - "_links": { - "self": { - "href": "https://github.com/dotnet/announcements/issues/372" - } - }, - "cvss_score": 4.8, - "cvss_severity": "MEDIUM", - "disclosure_date": "2025-10-14", - "affected_releases": [ - "8.0", - "9.0" - ], - "affected_products": [ - "dotnet-runtime" - ], - "platforms": [ - "all" - ] - }, - { - "id": "CVE-2025-55315", - "title": ".NET Security Feature Bypass Vulnerability", - "_links": { - "self": { - "href": "https://github.com/dotnet/announcements/issues/371" - } - }, - "cvss_score": 9.9, - "cvss_severity": "CRITICAL", - "disclosure_date": "2025-10-14", - "affected_releases": [ - "8.0", - "9.0", - "10.0" - ], - "affected_products": [ - "dotnet-aspnetcore" - ], - "platforms": [ - "all" - ] - }, - { - "id": "CVE-2025-55247", - "title": ".NET Denial of Service Vulnerability", - "_links": { - "self": { - "href": "https://github.com/dotnet/announcements/issues/370" - } - }, - "cvss_score": 7.3, - "cvss_severity": "HIGH", - "disclosure_date": "2025-10-14", - "affected_releases": [ - "8.0", - "9.0", - "10.0" - ], - "affected_products": [ - "dotnet-sdk" - ], - "affected_packages": [ - "Microsoft.Build", - "Microsoft.Build.Tasks.Core", - "Microsoft.Build.Utilities.Core" - ], - "platforms": [ - "linux" - ] - } - ] - } -} +$ dotnet run -- script .../workflows.json cve-latest > get-latest-cves.sh $ ./get-latest-cves.sh | jq ._embedded.disclosures.[].id 
-Fetching: https://raw.githubusercontent.com/dotnet/core/release-index/release-notes/llms.json -Fetching: https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json "CVE-2025-55248" "CVE-2025-55315" "CVE-2025-55247" ``` -This is replaying the same approach as an LLM. Workflows lower the cost of navigating the graph for any semantic consumer. The prototype tool fills in the boiler-plate and could equally be applied to PowerShell or C# output. - -There is also replaying the overall approach of proving out formats with semantic and non-semantic consumers. This document led with the idea of proving out the efficacy of the graph with `jq` and LLMs. In this case, `jq` has been replaced with a C# app, but it's the same basic approach. The irony is that `jq` managed to sneak into this use case as well. The tool can be blamed for its remarkable utility. +This replays the theme from earlier: proving formats work for both semantic and syntactic consumers. The graph was validated with `jq` and LLMs; workflows are validated with C# and LLMs. -## Concrete test +## Cost model validation -LLM eval of the graph demonstrates that effective design can result in optimal behavior. +The cost model isn't theoretical—here's what it looks like in practice. -Ideal pattern: +Final test results: -1. Navigate lean index documents in early turns to identify graph paths -1. Fetch multiple graph documents in middle turns to parallelize multiple navigation paths -1. Fetch multiple information-dense documents in later/last turns to inform final answer -1. Synthesize the answer in the final turn +### Trace: 6-month CVE analysis -The following eval trace demonstrates the ideal pattern, for the given prompt. It is taken from a [test using Claude Haiku 4.5](https://github.com/richlander/release-graph-eval-results/blob/main/anthropic_claude-haiku-4.5/B/T6.md). 
+A [test using Claude Haiku 4.5](https://github.com/richlander/release-graph-eval-results/blob/main/anthropic_claude-haiku-4.5/B/T6.md) demonstrates the ideal navigation pattern. -> Prompt: Please look at .NET Runtime and ASP.NET Core CVEs from November 2024 until April 2025 (6 months). I am concerned at the rate of these CVEs. Look at code diffs for the CVEs. Are the fixes sufficiently protecting my mission critical apps and could the .NET team have avoided these vulnerabilities with a stronger security process? Fetch code diffs to inform your analysis. Ensure they are from dotnet/runtime or dotnet/aspnetcore. Include the repo and commit link in your analysis of specific CVEs in your report. +**Prompt:** Analyze .NET Runtime and ASP.NET Core CVEs from November 2024 through April 2025. Fetch code diffs and assess whether fixes adequately protect mission-critical apps. Include repo and commit links. -| Turn | Documents | Tokens | Cumulative | Purpose | -|------|-----------|--------|------------|---------| -| 1 | 1 | 609 | 609 | Entrypoint discovery | -| 2 | 2 | 2,323 | 2,932 | Graph orientation + skill acquisition | +| Turn | Docs | Tokens | Cumulative | Purpose | +|------|------|--------|------------|---------| +| 1 | 1 | 609 | 609 | Entrypoint | +| 2 | 2 | 2,323 | 2,932 | Orientation + skill | | 3 | 1 | 1,146 | 4,078 | Navigation strategy | | 4 | 2 | 3,374 | 7,452 | Timeline discovery | | 5 | 6 | 12,131 | 19,583 | CVE data collection | | 6 | 6 | 59,832 | 79,415 | Commit analysis | -The token distribution is striking: **75% of all tokens arrive in the final turn**. This is the "lean early, heavy late" principle in action—not by accident, but by design. +**75% of all tokens arrive in the final turn.** This is "lean early, heavy late" in action—not by accident, but by design. The pattern: -The raw fetch list with token counts: +1. Navigate lean index documents in early turns to identify paths +2. Fetch multiple documents in middle turns to parallelize navigation +3. 
Fetch information-dense documents in later turns to inform the answer +4. Synthesize in the final turn + +
+Raw fetch list with token counts ``` Turn 1 (609 tokens): - llms.txt 609 tokens + llms.txt 609 Turn 2 (2,323 tokens): - llms.json 2,126 tokens - cve-queries/SKILL.md 197 tokens + llms.json 2,126 + cve-queries/SKILL.md 197 Turn 3 (1,146 tokens): - cve-queries/workflows.json 1,146 tokens + cve-queries/workflows.json 1,146 Turn 4 (3,374 tokens): - 2024/index.json 1,765 tokens - 2025/index.json 1,609 tokens + 2024/index.json 1,765 + 2025/index.json 1,609 Turn 5 (12,131 tokens): - 2024/11/cve.json 1,656 tokens - 2025/01/cve.json 4,020 tokens - 2025/03/cve.json 1,155 tokens - 2025/04/cve.json 1,034 tokens - 2025/05/cve.json 3,081 tokens - 2025/06/cve.json 1,185 tokens + 2024/11/cve.json 1,656 + 2025/01/cve.json 4,020 + 2025/03/cve.json 1,155 + 2025/04/cve.json 1,034 + 2025/05/cve.json 3,081 + 2025/06/cve.json 1,185 Turn 6 (59,832 tokens): - dotnet/runtime:d16f41a.diff 37,425 tokens - dotnet/runtime:9da8c6a.diff 1,781 tokens - dotnet/runtime:89ef51c.diff 260 tokens - dotnet/aspnetcore:67f3b04.diff 1,669 tokens - dotnet/aspnetcore:d6605eb.diff 15,388 tokens - dotnet/runtime:b33d4e3.diff 3,309 tokens + dotnet/runtime:d16f41a.diff 37,425 + dotnet/runtime:9da8c6a.diff 1,781 + dotnet/runtime:89ef51c.diff 260 + dotnet/aspnetcore:67f3b04.diff 1,669 + dotnet/aspnetcore:d6605eb.diff 15,388 + dotnet/runtime:b33d4e3.diff 3,309 ``` -> Note: The eval harness truncated `.diff` files to 50 lines to ensure test completion across all configurations. The token counts above reflect actual document sizes—what a reader would encounter following the [published guidance](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt). +Note: The eval harness truncated `.diff` files to 50 lines to ensure test completion. Token counts above reflect actual document sizes. -## Token and Attention Cost Analysis +
-| Turn | Documents | Tokens | Context | Processed | Attention | Cum. Attention | Purpose | -|------|-----------|--------|---------|-----------|-----------|----------------|---------| -| 1 | 1 | 609 | 609 | 609 | 0.37M | 0.37M | Entrypoint discovery | -| 2 | 2 | 2,323 | 2,932 | 3,541 | 8.60M | 8.97M | Graph orientation + skill acquisition | -| 3 | 1 | 1,146 | 4,078 | 7,619 | 16.63M | 25.60M | Navigation strategy | -| 4 | 2 | 3,374 | 7,452 | 15,071 | 55.53M | 81.13M | Timeline discovery | -| 5 | 6 | 12,131 | 19,583 | 34,654 | 383.49M | 464.62M | CVE data collection | -| 6 | 6 | 59,832 | 79,415 | 114,069 | 6,306.74M | 6,771.36M | Commit analysis | -| **Total** | **18** | **79,415** | — | **114,069** | — | **6,771M** | | +### Cost analysis -**Columns explained:** +| Turn | Docs | Tokens | Context | Processed | Attention | Cum. Attention | +|------|------|--------|---------|-----------|-----------|----------------| +| 1 | 1 | 609 | 609 | 609 | 0.37M | 0.37M | +| 2 | 2 | 2,323 | 2,932 | 3,541 | 8.60M | 8.97M | +| 3 | 1 | 1,146 | 4,078 | 7,619 | 16.63M | 25.60M | +| 4 | 2 | 3,374 | 7,452 | 15,071 | 55.53M | 81.13M | +| 5 | 6 | 12,131 | 19,583 | 34,654 | 383.49M | 464.62M | +| 6 | 6 | 59,832 | 79,415 | 114,069 | 6,306.74M | 6,771.36M | +| **Total** | **18** | **79,415** | — | **114,069** | — | **6,771M** | -- **Tokens**: New tokens fetched this turn -- **Context**: Size of context window this turn (= cumulative of Tokens) -- **Processed**: Running total of tokens processed across all turns (your API bill) -- **Attention**: Computational cost this turn, proportional to Context² -- **Cum. Attention**: Running total of attention cost +#### Sequential baseline -### Sequential Baseline - -A sequential approach—one document per turn—processes the same 18 documents across 18 turns. Each turn's context grows by one document: +A sequential approach processes the same 18 documents across 18 turns, one per turn: | Turn | Document | Tokens | Context | Processed | Attention | Cum. 
Attention | |------|----------|--------|---------|-----------|-----------|----------------| @@ -921,62 +424,46 @@ A sequential approach—one document per turn—processes the same 18 documents | 18 | b33d4e3.diff | 3,309 | 79,415 | 504,551 | 6,306.74M | 27,517M | | **Total** | **18 docs** | **79,415** | — | **504,551** | — | **27,517M** | -The **Tokens** column is identical across both tables—the same 79,415 tokens are fetched. The heavy content (diffs) arrives late in both cases, but multi-fetch batches it into a single turn while sequential processes it one document at a time across turns 13–18. +Both approaches fetch identical tokens. The difference is how they're batched across turns. -### Cost Comparison +#### Comparison | Metric | Multi-fetch (6 turns) | Sequential (18 turns) | Multiplier | |--------|----------------------|----------------------|------------| -| Turns | 6 | 18 | **3.0×** | +| Turns | 6 | 18 | 3.0× | | Tokens processed | 114,069 | 504,551 | **4.4×** | | Attention cost | 6,771M | 27,517M | **4.1×** | -The 4× improvement comes from two complementary optimizations working together. To isolate their contributions, consider hypothetical uniform approaches using the same total tokens (79,415) spread evenly across turns: +#### Isolating the optimizations + +The 4× improvement comes from two optimizations working together. 
To isolate their contributions: -| Scenario | Processed | Attention | Multiplier (tokens) | -|----------|-----------|-----------|---------------------| -| Actual 6-turn (lean early, heavy late) | 114,069 | 6,771M | 1.0× | -| Uniform 6-turn (equal distribution) | 277,956 | 15,942M | 2.4× | -| Actual 18-turn (lean early, heavy late) | 504,551 | 27,517M | 4.4× | -| Uniform 18-turn (equal distribution) | 754,452 | 41,053M | 6.6× | +| | 6 turns | 18 turns | +|--|---------|----------| +| **Actual (lean→heavy)** | 114k (1.0×) | 505k (4.4×) | +| **Uniform distribution** | 278k (2.4×) | 754k (6.6×) | -The actual 18-turn scenario already benefits from load ordering—the graph structure naturally places heavy content (diffs) late in the navigation path. Without that implicit optimization, uniform 18-turn would cost 6.6× rather than 4.4×. +The table reveals how the optimizations combine: -Reading the table as a 2×2 grid reveals how the optimizations combine: +- **Turn collapsing alone** (uniform 18 → uniform 6): 2.7× reduction +- **Load ordering alone** (uniform 6 → actual 6): 2.4× reduction +- **Both together** (uniform 18 → actual 6): 6.6× reduction -- **Turn collapsing alone** (uniform 18 → uniform 6): 2.7× reduction in processed tokens -- **Load ordering alone** (uniform 6 → actual 6): 2.4× reduction in processed tokens -- **Both together** (uniform 18 → actual 6): 6.6× reduction—the optimizations multiply +The optimizations multiply. The actual 18-turn scenario already benefits from implicit load ordering—the graph naturally places heavy content (diffs) late in navigation. Without that, uniform 18-turn would cost 6.6× rather than 4.4×. The "lean early, heavy late" pattern is load-bearing architecture: 75% of tokens arrive in the final turn, but they're processed exactly once rather than accumulating across subsequent turns. ### Design implications -The cost model suggests several design principles: - -- **Minimize turn count** through clear navigation affordance. 
Each eliminated turn saves quadratically growing attention cost. +- **Minimize turn count** through clear navigation affordances. Each eliminated turn saves quadratically growing attention cost. - **Front-load lightweight content.** Index documents, link relations, and navigation hints should be small. Substantive content belongs at the leaves. -- **Enable multi-fetch patterns.** Expose document collections as lists of links rather than embedded content, encouraging LLMs to batch their retrieval. -- **Provide explicit workflows.** Graph-resident guidance can direct LLMs to optimal traversal patterns, encoding the designer's knowledge of efficient paths. - -The rest of the design should be viewed through this cost lens. As an application designer, there are only so many degrees of freedom. We cannot change LLM fundamentals but need to work within their constraints. To a large degree, optimizations like reducing turns are similar to loop variable hoisting. While LLMs are new and different, old school performance strategies remain effective. - - -## Graph design - - - -Two strong design principles emerged from observed LLM behavior from eval: - -- Consistently apply a semantic model throughout the graph. It's a comfort to find a concept where it is expected. -- Expose resources in terms of structual kind, like `major` aand `-month`, and desired output, like `-security-disclosures`. - -This dual approach to semantic naming sometimes results in this double-mapping. Emperical observation suggests that LLMs prefer the outcome-based naming, while the more schema-correct and initial naming is the structual framing. +- **Enable multi-fetch patterns.** Expose document collections as lists of links rather than embedded content, encouraging LLMs to batch retrieval. +- **Provide explicit workflows.** Graph-resident guidance encodes the designer's knowledge of efficient paths. -Wormholes vs spear-fishing. +These constraints define the design space. 
We cannot change LLM fundamentals, but we can work within them. To a large degree, reducing turns is similar to loop variable hoisting—old-school performance strategies remain effective. -note: time is a challenge +## Conclusion -There is no suggestion that the current design is optimal. There is likely an 80/20 rule here. The current design is likely 80% optimal, but the remaining 20% would be very expensive to achieve and might require tradeoffs. A major focus of the testing was to ensure that the core graph was delivering sufficient performance since it will be difficult to change it given its compatibility promise. +Hypermedia graphs offer a lightweight alternative to vector databases for LLM enablement: self-describing structure, semantic navigation, and pre-baked publishing with no persistent infrastructure. The techniques described here—semantic naming, graph-resident guidance, cost-aware node weighting—are not specific to release notes. They should generalize to any domain where document relationships are meaningful and navigable. The design is equally useful for core compliance workflows, such as CVE auditing. -Vector databases take this a step further by making it easier for semantic consumers to find the content they want, for example with [Metadata Extraction Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/usage_metadata_extractor/) and [Maximum Marginal Relevance Retrieval](https://developers.llamaindex.ai/python/examples/vector_stores/simpleindexdemommr/). +The current design reflects an 80/20 tradeoff: the remaining 20% would be expensive to achieve and risk overfitting to today's LLMs. Testing prioritized validating that the core graph—which carries a compatibility promise—delivers sufficient performance. It does, for both LLMs and standalone tools.
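The cost tables above are mechanical enough to verify with a short script. A minimal Python sketch of the cost model, with per-turn fetch sizes copied from the multi-fetch trace and attention modeled as the square of the context size, matching the table definitions:

```python
# Reproduce the multi-fetch cost table: context accumulates across turns,
# "processed" is the running sum of per-turn context (the API bill), and
# attention per turn is modeled as context squared.
fetched = [609, 2323, 1146, 3374, 12131, 59832]  # new tokens per turn

context = 0
processed = 0
attention = 0
for tokens in fetched:
    context += tokens          # context window this turn
    processed += context       # running total of tokens processed
    attention += context ** 2  # quadratic attention cost, accumulated

print(context)    # 79415 tokens fetched in total
print(processed)  # 114069 tokens processed
print(round(attention / 1e6, 2))  # 6771.37, i.e. about 6,771M attention
```

Feeding the 18 per-document sizes one at a time through the same loop reproduces the sequential baseline.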
From 31dd852be428627523faa8b7b8ce4be7aa45d128 Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Mon, 5 Jan 2026 21:19:14 -0800 Subject: [PATCH 12/17] Restore valuable content and refine intro MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Restore Korzybski quote and trail race metaphor - Restore extended Claude thinking log evidence - Restore curiosity-driven evaluation insight - Restore guidance architecture intro paragraph - Expand summary paragraph with key techniques - Add llms.txt comparison (Stripe, Claude Code) - Align examples to CVE/security domain - Merge vector search section into Background 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../exposing-hypermedia-to-llms.md | 48 ++++++++++++++----- 1 file changed, 36 insertions(+), 12 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 81265b468..f26c02d58 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -1,8 +1,16 @@ # Exposing Hypermedia Information Graphs to LLMs -[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are self-describing formats well-suited for LLM consumption. A hypermedia document graph contains sufficient meta-information for a semantic consumer to traverse it and find information demanded by a prompt—without requiring a pre-loaded vector database or a priori schema knowledge. +[Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old formats perfectly suited for LLM consumption by virtue of self-describing structure and labeled relationships between resources. 
A hypermedia document graph contains sufficient meta-information for a semantic consumer to traverse it and find information demanded by a prompt—without requiring a pre-loaded vector database or a priori schema knowledge. This makes hypermedia a lightweight alternative to vector databases: pre-baked static publishing with no specialized infrastructure, using semantic naming, graph-resident guidance, and cost-aware node weighting to achieve comparable LLM enablement. These techniques generalize to any domain where document relationships are meaningful and navigable. -In a traditional system, a schema is the pre-requisite to traversal; in a hypermedia system, traversal reveals the schema. In a vector system, every query matches against the full corpus; in a hypermedia system, each node reveals the candidates. +In a traditional system, a schema is the pre-requisite to traversal; in a hypermedia system, traversal reveals the schema. + +> A (nearly) century-old principle, articulated by [Korzybski](https://en.wikipedia.org/wiki/Alfred_Korzybski): [the map is not the territory](https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation). + +[HTML](https://en.wikipedia.org/wiki/HTML) is perhaps the least sophisticated hypertext implementation in common use. A typical example: `For more on this vulnerability, click here`. "click here" [doesn't provide much of a map](https://developers.google.com/search/docs/crawling-indexing/links-crawlable#anchor-text-placement) for a semantic consumer. + +In trail races, ribbons hang from trees and arrows mark the ground to keep runners on course. Where routes diverge, signs read "5 km → left" and "10 km → straight". The ribbons are the map—schema-driven correctness. The signposts are HATEOAS-like descriptive navigation. Signposting is a key-value function: you match a key you recognize with a value you need to stay on course. + +A semantic graph exposes named relations like `{ "link-relation": "security-disclosure", "href": "..." }`. 
Greater sophistication can be achieved by describing the target kind: `"link-relation": "disclosure"` and `"target-kind": "cve-record"`. A strong semantic implementation shines a light on the path to follow and what it will reveal. ## Background @@ -10,11 +18,9 @@ The prevailing narrative has been that _structured data_ > _unstructured documen Databases went through a "no-SQL" transition—not a rejection of structure, but a recognition that structure lives in the documents themselves. Hypermedia graphs extend this to "no-schema" consumption: readers discover structure through descriptive labels and traversal rather than requiring it upfront. -## Why hypermedia over vector search? +Vector databases establish relationships via embedding similarity, refined through techniques like [Metadata Extraction Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/usage_metadata_extractor/) and [Maximum Marginal Relevance Retrieval](https://developers.llamaindex.ai/python/examples/vector_stores/simpleindexdemommr/). -Vector databases establish relationships via embedding similarity, refined through techniques like [Metadata Extraction Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/usage_metadata_extractor/) and [Maximum Marginal Relevance Retrieval](https://developers.llamaindex.ai/python/examples/vector_stores/simpleindexdemommr/). Hypermedia relations are both semantic and structural—equal parts [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) and [PageRank](https://en.wikipedia.org/wiki/PageRank)—which reduces the need for those techniques. - -Hypermedia graphs can be published pre-baked and consumed directly. This enables low-cost LLM enablement for scenarios where hosting a persistent AI server would be prohibitive. 
An AI server must also manage time-to-live semantics for volatile data—complexity hypermedia graphs avoid. +Hypermedia relations are both semantic and structural—equal parts [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) and [PageRank](https://en.wikipedia.org/wiki/PageRank). This mixture reduces the need for vector techniques. Hypermedia graphs can be published pre-baked and consumed directly, enabling low-cost LLM enablement for scenarios where hosting a persistent AI server would be prohibitive. ## Approach @@ -141,9 +147,11 @@ Both enable parallel fetches on the first turn—data plus a relevant skill—ap ### Why both? -[`llms.txt]() was the goto solution offered by industry. `llms.json` emerged from experimentation: could JSON achieve the same LLM enablement without a markdown on-ramp? +[`llms.txt`](https://llmstxt.org/) is an emerging standard with awareness in recently trained LLMs. It can serve as meta-information (as here) or as a comprehensive index (as with [Stripe docs](https://docs.stripe.com/llms.txt) at 18.5k tokens or [Claude Code docs](https://code.claude.com/docs/llms.txt) at ~2.5k tokens). Our `llms.txt` clocks in at 609 tokens—a router, not an index. + +Markdown offers natural syntax for links, guidance, and code fences. JSON can carry the same information but lacks markdown's instructional connotations. `llms.json` emerged from experimentation: could JSON achieve the same LLM enablement without a markdown on-ramp? -The answer: mostly yes, with effort. JSON lacks markdown's instructional connotations, so guidance must be more explicitly signaled: +The answer: mostly yes, with effort. Guidance must be more explicitly signaled: { "ai_note": "ALWAYS read required_pre_read first...", @@ -180,6 +188,8 @@ Same destination, different semantic intent. This implements the principle: matc ### Guidance architecture +Guidance was the hardest part of the graph to develop. 
It was relatively easy to generate a graph intuitive for LLMs to navigate without guidance. The remaining problem: augmenting that intuition to aid long-tail navigation that tended to underperform. The process of developing this guidance was deeply unintuitive for the graph designer. There are aspects of LLM behavior that do not match human expectation—this needs to be understood. + #### Preamble prompt The test harness uses this system prompt: @@ -263,6 +273,8 @@ The `kind:llms` prefix anchors the path to a node type, reconnecting the workflo ### Evaluation +A key principle emerged: _curiosity-driven evaluation_ beats intuition reliance. Once you have a good test harness, it's liberating to not trust your intuition but to test any idea that seems interesting. The distinction between "informed direction" and "bat-crazy idea" drops away. Test both. This stuff is half-magic, half-fever-dream—why should intuition be authoritative? + #### Test modes | Mode | Entrypoint | Guidance | Purpose | @@ -291,13 +303,25 @@ Testing used Claude Haiku 3.5 as the primary target, spot-checked against Sonnet #### Evidence: JSON-resident guidance -Claude's thinking log when given `llms.json`: +[Claude.ai's thinking log](https://claude.ai/share/96775d4b-76a9-4ba2-b5f3-dd3badb20cdb) when given `llms.json`: -> "The file mentions there's a SKILL.md file that I should read first per the AI note. Let me fetch that..." +> The file I fetched shows the current state of supported .NET releases. Looking at the data, I can see that .NET 6.0 is not in the list of supported_major_releases (which shows only 10.0, 9.0, and 8.0). This means .NET 6 is EOL. +> +> The file mentions there's a SKILL.md file that I should read first per the AI note. Let me fetch that to get more complete information... -When asked why: +When asked what motivated it to notice and follow these AI hints: -> "Fields like `ai_note` and `required_pre_read` signal that someone engineered this system specifically for AI consumption. 
That makes me trust the guidance... The motivation wasn't 'AI hints are always right'—it was pragmatic: follow structured guidance when it's clearly intentional." +> **1. System Prompt Priming.** My instructions emphasize using skills before solving tasks. So when I saw the JSON had a `required_pre_read` field, it activated that pattern recognition—_this is a skill pointer, pay attention._ +> +> **2. Signal of Intentional Design.** Fields like `ai_note`, `required_pre_read`, and `follow_path` aren't random. They signal that someone engineered this system _specifically for AI consumption_. That makes me trust the guidance. The note didn't just suggest following links—it said "Trust _embedded data—it's authoritative and current," which is basically the architect saying "I've done the hard work, use my structure." +> +> **3. The "ALWAYS" Flag.** The `ai_note` wasn't subtle: "ALWAYS read required_pre_read first." All caps. That's a red flag (or rather, a red instruction) that elevated it above background text. +> +> **4. It Actually Worked.** Following the hints gave me a clean, logical path. The skill routing sent me to the right specialist knowledge. The stop criteria told me when to stop fetching. +> +> **But here's the honest part:** Without those hints embedded in the data, I might have flailed around. I might have assumed .NET 6.0 was in the supported list since the index showed it. I could have tried to construct URLs. The hints prevented those mistakes. 
+> +> So the motivation wasn't "AI hints are always right"—it was more pragmatic: **follow structured guidance when it's clearly intentional and makes logical sense.** #### Evidence: Workflow comprehension From 27e1d5baea95445d3273930682b2b2ff0a75b9ac Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Mon, 5 Jan 2026 21:26:17 -0800 Subject: [PATCH 13/17] Expand cross-consumer applicability section MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Show that workflows are formal enough for mechanical translation: - Add `list` command output showing enumerable workflows - Add generated bash script demonstrating precise translation - Script shows curl/jq-based HAL navigation with error handling 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 --- .../exposing-hypermedia-to-llms.md | 47 ++++++++++++++++--- 1 file changed, 40 insertions(+), 7 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index f26c02d58..1daf6f1b1 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -341,17 +341,50 @@ From a feedback turn on the `cve-history` workflow: ### Cross-consumer applicability -Workflows were designed for LLMs, but they're descriptive enough for non-semantic consumers. A C# tool can parse workflows and generate bash scripts: +Workflows were designed for LLMs, but they're formal enough for mechanical translation. 
A C# tool parses workflows and generates bash scripts: ```bash -$ dotnet run -- script .../workflows.json cve-latest > get-latest-cves.sh -$ ./get-latest-cves.sh | jq ._embedded.disclosures.[].id -"CVE-2025-55248" -"CVE-2025-55315" -"CVE-2025-55247" +$ dotnet run -- list .../workflows.json + cve-by-version CVEs affecting a specific .NET version + cve-details Go directly to cve.json for full details + cve-extraction Extract data from cve.json (different schema than month index) + cve-history CVEs over a time range + cve-latest CVEs from the most recent security release + +Total: 5 workflows ``` -This replays the theme from earlier: proving formats work for both semantic and syntactic consumers. The graph was validated with `jq` and LLMs; workflows are validated with C# and LLMs. +```bash +$ dotnet run -- script .../workflows.json cve-latest +``` + +The generated script: + +```bash +#!/bin/bash +# Workflow: cve-latest +# Description: CVEs from the most recent security release + +set -euo pipefail + +# Step 1: Start at llms.json +URL="https://raw.githubusercontent.com/dotnet/core/release-index/release-notes/llms.json" +echo "Fetching: $URL" >&2 +DOC=$(curl -sf "$URL") + +# Step 2: Follow link "latest-security-disclosures" +URL=$(echo "$DOC" | jq -r '._links["latest-security-disclosures"].href // empty') +if [ -z "$URL" ]; then + echo "Error: Link 'latest-security-disclosures' not found" >&2 + exit 1 +fi +echo "Fetching: $URL" >&2 +DOC=$(curl -sf "$URL") + +echo "$DOC" | jq '.' +``` + +This replays the theme from earlier: formats that work for both semantic and syntactic consumers. The graph was validated with `jq` and LLMs; workflows are validated with C# and LLMs. 
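The same mechanical translation can be sketched in a few lines. The following Python sketch interprets a simplified workflow over in-memory documents; the field names (`start`, `follow_path`) echo the spec's examples, but the URLs, document bodies, and flattened relation list are hypothetical stand-ins for a real fetch-based consumer:

```python
# Hypothetical sketch: a workflow is formal enough to execute mechanically.
# Field names echo the spec's examples; URLs and documents are illustrative
# stand-ins -- a real consumer would fetch each href over HTTP.
DOCS = {
    "https://example.org/llms.json": {
        "_links": {
            "latest-security-disclosures": {
                "href": "https://example.org/timeline/2025/10/index.json"
            }
        }
    },
    "https://example.org/timeline/2025/10/index.json": {
        "_embedded": {"disclosures": [{"id": "CVE-2025-55248"}, {"id": "CVE-2025-55315"}]}
    },
}

def run_workflow(workflow, fetch):
    """Start at the entry document, then follow each named link relation in order."""
    doc = fetch(workflow["start"])
    for relation in workflow["follow_path"]:
        link = doc.get("_links", {}).get(relation)
        if link is None:  # same error path as the generated bash script
            raise KeyError(f"Link '{relation}' not found")
        doc = fetch(link["href"])
    return doc

cve_latest = {
    "name": "cve-latest",
    "start": "https://example.org/llms.json",
    "follow_path": ["latest-security-disclosures"],
}

result = run_workflow(cve_latest, DOCS.__getitem__)
print([d["id"] for d in result["_embedded"]["disclosures"]])
# → ['CVE-2025-55248', 'CVE-2025-55315']
```

The point mirrors the C# tool: nothing in the traversal requires semantic inference, only key matching over `_links`.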
## Cost model validation From 8c8172da1eff4fcdf4fe41c7005418a29f9d0039 Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Tue, 6 Jan 2026 17:29:24 -0800 Subject: [PATCH 14/17] Edit pass --- .../exposing-hypermedia-to-llms.md | 78 +++++++++++-------- 1 file changed, 46 insertions(+), 32 deletions(-) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md index 1daf6f1b1..a0ba9231f 100644 --- a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md @@ -2,25 +2,25 @@ [Hypermedia](https://en.wikipedia.org/wiki/Hypermedia) and [hypertext](https://en.wikipedia.org/wiki/Hypertext) are decades-old formats perfectly suited for LLM consumption by virtue of self-describing structure and labeled relationships between resources. A hypermedia document graph contains sufficient meta-information for a semantic consumer to traverse it and find information demanded by a prompt—without requiring a pre-loaded vector database or a priori schema knowledge. This makes hypermedia a lightweight alternative to vector databases: pre-baked static publishing with no specialized infrastructure, using semantic naming, graph-resident guidance, and cost-aware node weighting to achieve comparable LLM enablement. These techniques generalize to any domain where document relationships are meaningful and navigable. -In a traditional system, a schema is the pre-requisite to traversal; in a hypermedia system, traversal reveals the schema. - > A (nearly) century-old principle, articulated by [Korzybski](https://en.wikipedia.org/wiki/Alfred_Korzybski): [the map is not the territory](https://en.wikipedia.org/wiki/Map%E2%80%93territory_relation). [HTML](https://en.wikipedia.org/wiki/HTML) is perhaps the least sophisticated hypertext implementation in common use. A typical example: `For more on this vulnerability, click here`. 
"click here" [doesn't provide much of a map](https://developers.google.com/search/docs/crawling-indexing/links-crawlable#anchor-text-placement) for a semantic consumer. -In trail races, ribbons hang from trees and arrows mark the ground to keep runners on course. Where routes diverge, signs read "5 km → left" and "10 km → straight". The ribbons are the map—schema-driven correctness. The signposts are HATEOAS-like descriptive navigation. Signposting is a key-value function: you match a key you recognize with a value you need to stay on course. +In trail races, ribbons hang from trees and arrows mark the ground to keep runners on course. Where routes diverge, signs read "5 km → left" and "10 km → straight". The ribbons are the map—schema-driven correctness. The signposts are [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS)-like descriptive navigation. Signposting is a key-value function: you match a key you recognize with a value you need to stay on course. + +A semantic graph exposes named relations like `{ "link-relation": "security-disclosure", "href": "..." }`. Greater sophistication can be achieved by describing the target kind: `"link-relation": "disclosure"` and `"target-kind": "cve-record"`. A strong semantic implementation shines a light on the path to follow and describes what it will reveal. -A semantic graph exposes named relations like `{ "link-relation": "security-disclosure", "href": "..." }`. Greater sophistication can be achieved by describing the target kind: `"link-relation": "disclosure"` and `"target-kind": "cve-record"`. A strong semantic implementation shines a light on the path to follow and what it will reveal. +In a traditional system, a schema is the pre-requisite to traversal; in a hypermedia system, traversal reveals the schema. The map and the territory can be thought to coincide. 
## Background -The prevailing narrative has been that _structured data_ > _unstructured documents_ for deriving insight. JSON and XML came out of that heritage, with [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath) providing structured query that relies on a priori schema knowledge. [Hypermedia as the engine of application state (HATEOAS)](https://en.wikipedia.org/wiki/HATEOAS) extended this by labeling relations across resources, not just data within them. +The prevailing narrative has been that _structured data_ > _unstructured documents_ for deriving insight. JSON and XML came out of that heritage, with [JSONPath](https://en.wikipedia.org/wiki/JSONPath) and [XPath](https://en.wikipedia.org/wiki/XPath) providing structured query that relies on a priori schema knowledge. This paradigm relates to data within documents. -Databases went through a "no-SQL" transition—not a rejection of structure, but a recognition that structure lives in the documents themselves. Hypermedia graphs extend this to "no-schema" consumption: readers discover structure through descriptive labels and traversal rather than requiring it upfront. +Databases went through a "no-SQL" transition—not a rejection of structure, but a recognition that structure lives in the documents themselves. The same structured document query layers on top. This paradigm relates to grouping documents with extra metadata, like sales orders from a customer in Seattle or Toronto. -Vector databases establish relationships via embedding similarity, refined through techniques like [Metadata Extraction Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/usage_metadata_extractor/) and [Maximum Marginal Relevance Retrieval](https://developers.llamaindex.ai/python/examples/vector_stores/simpleindexdemommr/). 
+Vector databases establish relationships via embedding similarity, refined through techniques like [Metadata Extraction Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/documents_and_nodes/usage_metadata_extractor/) and [Maximum Marginal Relevance Retrieval](https://developers.llamaindex.ai/python/examples/vector_stores/simpleindexdemommr/). This paradigm changes the narrative to _meaning_ > _semi-arbitrary text strings_. -Hypermedia relations are both semantic and structural—equal parts [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) and [PageRank](https://en.wikipedia.org/wiki/PageRank). This mixture reduces the need for vector techniques. Hypermedia graphs can be published pre-baked and consumed directly, enabling low-cost LLM enablement for scenarios where hosting a persistent AI server would be prohibitive. +Hypermedia advocates for self-describing document structures. HATEOAS contributes the idea that labeled relations across resources enable semantic navigation. This idea can be stretched to being thought of as "no-schema" consumption: readers discover structure through descriptive labels and follow matching paths without requiring domain knowledge upfront. This paradigm is both semantic and structural, equal parts [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) and [PageRank](https://en.wikipedia.org/wiki/PageRank). It's this mixture that enables direct consumption without the need for vector techniques, enabling low-cost LLM enablement for scenarios where hosting a persistent AI server would be prohibitive. ## Approach @@ -29,17 +29,17 @@ The primary techniques for LLM applicability are: - Semantic naming: Link relations like "latest-security-disclosure" reduce the inference required to derive meaning. - Graph-resident guidance: Skills and workflows as first-class relations in the graph. 
Skills follow the Anthropic skills format; workflows are HAL documents describing queries over link relations. -This approach has been applied to the .NET release notes. The project began as a modernization of JSON files used for cloud-infra deployment and compliance workflows. It became clear that LLMs could read the same content directly and self-reason about navigation. The graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation, augmented HAL-native with a workflow convention. +This approach has been applied to the .NET release notes. The project began as a modernization of JSON files used for cloud-infra deployment and compliance workflows. It became clear that LLMs could read the same content directly and self-reason about navigation. The graph uses [Hypertext Application Language (HAL)](https://en.wikipedia.org/wiki/Hypertext_Application_Language) as its hypermedia foundation, augmented with a workflow convention. ## Graph design point -The release notes graph is built on a restrictive premise: the entrypoint should be skeletal and rarely changing, supporting n-9s reliability with rigorous engineering practices (git workflows, peer review, merge gates). But we're in the early days of AI—externally-driven change may require rapid iteration on the entrypoint to maintain LLM enablement quality. These goals are in tension. +The release notes graph is built on a restrictive premise: the entrypoint should be skeletal and rarely changing, supporting n-9s reliability with rigorous engineering practices (git workflows, peer review, merge gates). But we're in the early days of AI; externally-driven change may require rapid iteration on the entrypoint to maintain LLM enablement quality. These goals are in tension. 
-The resolution: treat the core graph as a **well-defined data layer** honoring reliability requirements, while exposing a separate **adaptable application layer** entrypoint for LLMs that can evolve without the compatibility burden. +The resolution: treat the core graph as a **well-defined data layer** honoring reliability requirements, while exposing a separate **adaptable application layer** entrypoint for LLMs that can evolve without the compatibility burden. We will rely on documentation and file naming to articulate this difference with an explicitly communicated contract (or lack thereof). ### Design and evaluation -The graph as a whole is based on a somewhat traditional schema design, utilizing both normalized and denormalized approaches in service of consumer queries. After the graph was realized, it was tested with `jq` as a passive and syntactic consumer and with LLMs as an active and semantic consumer. The graph was successively adapted to improve performance for both consumption styles. Performance is primarily measured in terms of terseness of query and quickness (fetches and data cost) of response. Much of the feedback could be considered fundamental in nature. The overall character of the graph remains a pure information-oriented data design, but with a significant tilt towards semantic consumers. +The graph as a whole is based on a somewhat traditional schema design, utilizing both normalized and denormalized approaches in service of ergonomic consumer queries. After the graph was realized, it was tested with `jq` as a passive and syntactic consumer and with LLMs as an active and semantic consumer. The graph was successively adapted to improve performance for both consumption styles. Performance is primarily measured in terms of terseness of query and quickness (fetches and data cost) of response. Much of the feedback could be considered fundamental in nature. 
The overall character of the graph remains a pure information-oriented data design, but with a significant tilt towards semantic consumers. Hypermedia long predates LLMs, but it has always treated semantic consumers (humans) as a key design cohort. This made it easy to adapt the graph based on LLM feedback. @@ -82,10 +82,10 @@ Let's build intuition using uniform token counts: `n` tokens are added per turn **Columns explained:** - **Tokens**: New tokens fetched this turn -- **Context**: Size of context window this turn (= cumulative of Tokens) -- **Cumulative Tokens**: Running total of tokens processed across all turns (your API bill) +- **Context**: Sum of tokens fetched so far (window size this turn) +- **Cumulative Tokens**: Accumulation of context per turn (your API bill—each turn charges for full context) - **Attention**: Computational cost this turn, proportional to Context² -- **Cumulative Attention**: Running total of attention cost +- **Cumulative Attention**: Accumulation of attention cost per turn (your computational cost / latency) The formulas simplify for large m: @@ -96,7 +96,7 @@ The formulas simplify for large m: | Accumulated attention | n²m³/3 | Cubic in turns | - API pricing is in terms of tokens. For multi-turn conversations, the cost is the accumulated token cost not the final context size. -- The cubic growth in attention is the dominant computational cost, the primary contributor to latency and throughput. It emerges from summing quadratic costs across turns. Each turn pays attention on everything accumulated so far. This cost is likely the gating function on context size and expected to be persistent even if GPU memory doubles. +- The cubic growth in attention is the dominant computational cost, the primary contributor to latency. It emerges from summing quadratic costs across turns. Each turn pays attention on everything accumulated so far. 
This cost is likely the gating function on context size and expected to be persistent even if GPU memory doubles. - These costs provide clues on why conversation compacting exists and why there is scrutiny on token economics. ### Batched vs sequential @@ -108,7 +108,8 @@ What if all content could be fetched in a single turn? | Batched (1 turn) | (nm)² = n²m² | 1 | | Sequential (m turns) | n²m³/3 | m/3 | -Ten turns ≈ 3× batched cost. Thirty turns ≈ 10×. This ratio scales linearly with turn count, the `m` term. +- Ten turns ≈ 3× batched cost. Thirty turns ≈ 10×. +- This ratio scales linearly with turn count, the `m` term. Many problems inherently require multiple turns. The LLM must reason about intermediate results before knowing what to fetch next. The goal is not to eliminate turns but to minimize them and optimize their structure. @@ -118,7 +119,7 @@ Many problems inherently require multiple turns. The LLM must reason about inter The uniform model above assumes equal token counts per turn. In practice, token distribution across turns is a design choice with significant cost implications. The tokens in the first turns are by far the most costly. -This is roughly similar to credit card debt: early charges compound. If the initial purchase was large, you're in trouble. +This is roughly similar to credit card debt: early charges compound. If the initial purchase was large and months ago, you're in trouble. ### Optimization: multiple fetches per turn @@ -136,14 +137,12 @@ The strict n-9s reliability design model is perfectly aligned with the LLM cost ## Implementation -The release notes graph has two LLM entrypoints, a guidance system built on skills and workflows, and a design shaped by iterative evaluation. This section covers the artifacts, the methodology that produced them, and the patterns that emerged. +The release notes graph has two LLM entrypoints, a guidance system built on skills and workflows, and an overall graph design shaped by iterative evaluation. 
This section covers the artifacts, the methodology that produced them, and the patterns that emerged. ### Entrypoints -- [`llms.txt`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt) — A markdown file (~600 tokens) that contextualizes the graph and routes to skills. Markdown is a natural fit: LLMs readily treat it as instructional content, and it offers native syntax for links, tables, and code fences. -- [`llms.json`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json) — A JSON file (~2k tokens) that serves as both entrypoint and data. It embeds enough information to answer common queries directly while offering links into the graph. - -Both enable parallel fetches on the first turn—data plus a relevant skill—applying the "multiple fetches per turn" optimization from the cost model. +- [`llms.txt`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt) — A markdown file (~600 tokens) that contextualizes the graph and routes to the graph and skills. Markdown is a natural fit: LLMs readily treat it as instructional content, and it offers native syntax for links, tables, and code fences. +- [`llms.json`](https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/llms.json) — A JSON file (~2k tokens) that serves both as instructional entrypoint and data. It embeds enough information to answer common queries directly while offering links into the graph. ### Why both? @@ -151,12 +150,14 @@ Both enable parallel fetches on the first turn—data plus a relevant skill—ap Markdown offers natural syntax for links, guidance, and code fences. JSON can carry the same information but lacks markdown's instructional connotations. `llms.json` emerged from experimentation: could JSON achieve the same LLM enablement without a markdown on-ramp? -The answer: mostly yes, with effort. Guidance must be more explicitly signaled: +The answer: yes. 
Guidance must be more explicitly signaled: +```json { "ai_note": "ALWAYS read required_pre_read first...", "required_pre_read": "https://...SKILL.md" } +``` The imperative "ALWAYS" and the self-describing property name `required_pre_read` compensate for JSON's weaker association with guidance. @@ -184,11 +185,11 @@ Half the link relations in `llms.json` are `latest-*`, reflecting the belief tha Same destination, different semantic intent. This implements the principle: match a key you know with a value you don't. -**Focal lengths**: The core index (`index.json`) is zoomed-out and normalized—all .NET versions over ten years. The LLM index (`llms.json`) is zoomed-in and denormalized—current state with enough data to answer queries directly. The graph applies multiple focal lengths throughout, separating skeletal navigation nodes from weighted content nodes. +**Focal lengths**: The core index (`index.json`) is zoomed-out and normalized—all .NET versions over ten years. The LLM index (`llms.json`) is zoomed-in and denormalized—current supported state with enough data to answer queries directly. The graph applies multiple focal lengths throughout, separating skeletal navigation nodes from weighted content nodes. ### Guidance architecture -Guidance was the hardest part of the graph to develop. It was relatively easy to generate a graph intuitive for LLMs to navigate without guidance. The remaining problem: augmenting that intuition to aid long-tail navigation that tended to underperform. The process of developing this guidance was deeply unintuitive for the graph designer. There are aspects of LLM behavior that do not match human expectation—this needs to be understood. +Guidance was the hardest part of the graph to develop. It was relatively easy to generate a graph intuitive for LLMs to navigate without guidance. The remaining problem: augmenting that intuition to aid long-tail navigation that tended to underperform including hallucination. 
The process of developing this guidance was unintuitive for the graph designer. There are aspects of LLM behavior that do not match human expectation—this needs to be understood.
 
 #### Preamble prompt
 
@@ -247,6 +248,8 @@ All CVE queries use timeline. Fetch workflows.json for navigation paths with `ne
 3. Code diffs: `$.commits[key].url` already ends in `.diff`—use as-is.
 ```
 
+The skills bias toward terseness, with expansion earned by observed performance in testing.
+
 #### Workflows
 
 Workflows extend HAL with a query system. The premise: queries as document data, with HAL relations as query targets.
 
@@ -269,11 +272,11 @@ The `follow_path` property carries most of the expressivity:
 }
 ```
 
-The `kind:llms` prefix anchors the path to a node type, reconnecting the workflow to the graph even though it lives in a separate file. The `next_workflow` property enables chaining, supporting [equivalence classes](https://en.wikipedia.org/wiki/Equivalence_class) and [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
+The `kind:llms` prefix anchors the path to a node type, reconnecting the workflow to the graph even though it lives in a separate file. The `next_workflow` property enables chaining, supporting [equivalence class](https://en.wikipedia.org/wiki/Equivalence_class) identification and [Don't Repeat Yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) benefits.
 
 ### Evaluation
 
-A key principle emerged: _curiosity-driven evaluation_ beats intuition reliance. Once you have a good test harness, it's liberating to not trust your intuition but to test any idea that seems interesting. The distinction between "informed direction" and "bat-crazy idea" drops away. Test both. This stuff is half-magic, half-fever-dream—why should intuition be authoritative?
+A key principle emerged: _curiosity-driven evaluation_ > _intuition reliance_. Once you have a good test harness, it's liberating to not trust your intuition but to test any idea that seems interesting. 
The distinction between "informed direction" and "crazy idea" drops away. Test both.
 
 #### Test modes
 
@@ -384,7 +387,18 @@ DOC=$(curl -sf "$URL")
 echo "$DOC" | jq '.'
 ```
 
-This replays the theme from earlier: formats that work for both semantic and syntactic consumers. The graph was validated with `jq` and LLMs; workflows are validated with C# and LLMs.
+Running the script:
+
+```bash
+$ ./get-latest-cves.sh | jq ._embedded.disclosures.[].id
+Fetching: https://raw.githubusercontent.com/dotnet/core/release-index/release-notes/llms.json
+Fetching: https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/release-notes/timeline/2025/10/index.json
+"CVE-2025-55248"
+"CVE-2025-55315"
+"CVE-2025-55247"
+```
+
+This replays the theme from earlier: formats that work for both semantic and syntactic consumers. The graph was validated with `jq` and LLMs; workflows are validated with C# and LLMs (with `jq` offering a supporting role).
 
 ## Cost model validation
 
@@ -394,7 +408,7 @@ Final test results:
 
 ### Trace: 6-month CVE analysis
 
-A [test using Claude Haiku 3.5](https://github.com/richlander/release-graph-eval-results/blob/main/anthropic_claude-haiku-4.5/B/T6.md) demonstrates the ideal navigation pattern.
+A [test using Claude Haiku 4.5](https://github.com/richlander/release-graph-eval-results/blob/main/2025-12-27/anthropic_claude-haiku-4.5/B/T6.md) demonstrates the ideal navigation pattern.
 
 **Prompt:** Analyze .NET Runtime and ASP.NET Core CVEs from November 2024 through April 2025. Fetch code diffs and assess whether fixes adequately protect mission-critical apps. Include repo and commit links.
 
@@ -467,7 +481,7 @@ Note: The eval harness truncated `.diff` files to 50 lines to ensure test comple
 
 #### Sequential baseline
 
-A sequential approach processes the same 18 documents across 18 turns, one per turn:
+A sequential approach would process the same 18 documents across 18 turns, one per turn:
 
 | Turn | Document | Tokens | Context | Processed | Attention | Cum. 
Attention | |------|----------|--------|---------|-----------|-----------|----------------| @@ -517,7 +531,7 @@ The "lean early, heavy late" pattern is load-bearing architecture: 75% of tokens - **Enable multi-fetch patterns.** Expose document collections as lists of links rather than embedded content, encouraging LLMs to batch retrieval. - **Provide explicit workflows.** Graph-resident guidance encodes the designer's knowledge of efficient paths. -These constraints define the design space. We cannot change LLM fundamentals, but we can work within them. To a large degree, reducing turns is similar to loop variable hoisting—old-school performance strategies remain effective. +The cost model constrains the design space. These principles work within it—we cannot change LLM fundamentals, but we can optimize around them. To a large degree, reducing turns is similar to loop variable hoisting—old-school performance strategies remain effective. ## Conclusion From a6394bc5f87fbdb5bf4fe313bac9d4f9e0d72957 Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Tue, 6 Jan 2026 17:30:36 -0800 Subject: [PATCH 15/17] Move to 2026 directory --- .../release-notes-graph/exposing-hypermedia-to-llms.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename accepted/{2025 => 2026}/release-notes-graph/exposing-hypermedia-to-llms.md (100%) diff --git a/accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2026/release-notes-graph/exposing-hypermedia-to-llms.md similarity index 100% rename from accepted/2025/release-notes-graph/exposing-hypermedia-to-llms.md rename to accepted/2026/release-notes-graph/exposing-hypermedia-to-llms.md From 25cedf144b2313ba558c2ef10de6394132520140 Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Tue, 6 Jan 2026 17:49:01 -0800 Subject: [PATCH 16/17] Restore linter file --- .markdownlint.json | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/.markdownlint.json b/.markdownlint.json index 
b274f82fb..a02418cfd 100644 --- a/.markdownlint.json +++ b/.markdownlint.json @@ -1,9 +1,9 @@ { - "default": true, - "MD003": { "style": "atx" }, - "MD004": false, - "MD007": { "indent": 4 }, - "MD013": false, - "MD026": false, - "no-hard-tabs": false -} + "default": true, + "MD003": { "style": "atx" }, + "MD004": false, + "MD007": { "indent": 4 }, + "MD013": { "tables": false, "code_blocks": false }, + "MD026": false, + "no-hard-tabs": false +} \ No newline at end of file From 6f18aa6696f705e8e94234d5b07586570ccc36ce Mon Sep 17 00:00:00 2001 From: Richard Lander Date: Wed, 7 Jan 2026 11:02:25 -0800 Subject: [PATCH 17/17] Add intended user experience --- .../release-notes-graph/exposing-hypermedia-to-llms.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/accepted/2026/release-notes-graph/exposing-hypermedia-to-llms.md b/accepted/2026/release-notes-graph/exposing-hypermedia-to-llms.md index a0ba9231f..a1b7110bd 100644 --- a/accepted/2026/release-notes-graph/exposing-hypermedia-to-llms.md +++ b/accepted/2026/release-notes-graph/exposing-hypermedia-to-llms.md @@ -22,6 +22,14 @@ Vector databases establish relationships via embedding similarity, refined throu Hypermedia advocates for self-describing document structures. HATEOAS contributes the idea that labeled relations across resources enable semantic navigation. This idea can be stretched to being thought of as "no-schema" consumption: readers discover structure through descriptive labels and follow matching paths without requiring domain knowledge upfront. This paradigm is both semantic and structural, equal parts [Retrieval-Augmented Generation (RAG)](https://en.wikipedia.org/wiki/Retrieval-augmented_generation) and [PageRank](https://en.wikipedia.org/wiki/PageRank). It's this mixture that enables direct consumption without the need for vector techniques, enabling low-cost LLM enablement for scenarios where hosting a persistent AI server would be prohibitive. 
+## User experience + +This is the intended user experience: + +>I last updated .NET 8 in September 2025. Were there any critical CVEs since then? Tell me what they fixed so I can decide if I have an issue. Look at code diffs. +> +>Start here: https://raw.githubusercontent.com/dotnet/core/refs/heads/release-index/llms.txt + ## Approach The primary techniques for LLM applicability are: