Summary
As noted by @dabrahams in #72 (comment), the output of the CLI does not make it easy to identify the location of the error. Because it only provides a path to the offending item, users must manually traverse the file to find the source of the error.
Proposal
Implement Rust-style diagnostic output.
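For illustration, output in this style might look something like the following (the file path, column, and caret placement here are invented for the example; only the error title and line number are taken from the prototype below):

```text
error: No files matching glob
  --> .github/workflows/ci.yml:26:13
   |
26 |       glob: dist/*.js
   |             ^^^^^^^^^ No files matching glob
   |
```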
Prototype
I've been experimenting with `saphyr` (which offers spanned YAML parsing) and `annotate_snippets` (which provides an ergonomic API for producing Rust-style diagnostic reports), and I've been able to hack together a (very messy) prototype:
```diff
diff --git a/src/lib.rs b/src/lib.rs
index 5d4f9d4..38a8bac 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -6,6 +6,7 @@ mod validation_error;
 mod validation_state;
 use config::{ActionType, RunConfig};
+use saphyr::{LoadableYamlNode, MarkedYamlOwned, SafelyIndex as _, ScalarOwned, YamlDataOwned};
 use std::path::PathBuf;
 use validation_error::ValidationError;
 use validation_state::ValidationState;
@@ -157,6 +158,7 @@ pub mod cli {
 fn run(config: &RunConfig) -> ValidationState {
     let file_name = config.file_name.unwrap_or("file");
+
     let doc = serde_yaml::from_str(config.src);
     let mut state = match doc {
@@ -190,6 +192,50 @@ fn run(config: &RunConfig) -> ValidationState {
     state.action_type = Some(config.action_type);
     state.file_path = config.file_path.map(|file_name| file_name.to_string());
+    let doc = saphyr::MarkedYamlOwned::load_from_str(config.src)
+        .unwrap()
+        .into_iter()
+        .next()
+        .unwrap();
+
+    for error in &state.errors {
+        let mut node = Some(&doc);
+        match error {
+            ValidationError::NoFilesMatchingGlob {
+                code,
+                detail,
+                path,
+                title,
+            } => {
+                for segment in path.split("/").skip(1).collect::<Vec<_>>() {
+                    let Some(prev_node) = node else {
+                        todo!();
+                    };
+                    node = prev_node.get(segment);
+                }
+                let node = node.unwrap();
+
+                let report = &[annotate_snippets::Level::ERROR
+                    .primary_title(title)
+                    .element(
+                        annotate_snippets::Snippet::source(config.src)
+                            .line_start(26)
+                            .path(config.file_path.unwrap_or(file_name))
+                            .annotation(
+                                annotate_snippets::AnnotationKind::Primary
+                                    .span(node.span.start.index()..node.span.end.index() - 2)
+                                    .label(detail.as_ref().unwrap_or(title)),
+                            ),
+                    )];
+
+                let renderer = annotate_snippets::Renderer::styled()
+                    .decor_style(annotate_snippets::renderer::DecorStyle::Unicode);
+                anstream::println!("{}", renderer.render(report));
+            }
+            _ => {}
+        }
+    }
+
     state
 }
```

Limitations
Unfortunately, `valico` requires a `serde_json::Value` as input. The `serde_json::Value` type is un-spanned, and so cannot provide the span information needed to produce the desired diagnostic output. For this reason, the above PoC parses the file twice: first with `serde_yaml` to produce errors via `valico`, then again with `saphyr` to retrieve the span information used to render the diagnostic messages.
Perhaps unsurprisingly, I was not able to find any JSON schema validators which accept a parsed yaml type with span information (such as produced by saphyr) as input.
Obviously, double parsing isn't ideal because it...
- Adds a performance overhead
- Unnecessarily increases the file size of the binary
- Increases the surface area for bugs
Possible Solutions
I've considered a few options...
1. Accept Defeat
Either temporarily or permanently, we could accept the downsides of double parsing the YAML file as an acceptable cost for improving the ergonomics of the tool.
Pros:
- Easy: the existing implementation is relatively simple
- No need to incur performance overhead if no errors
Cons:
- As mentioned above
2. Contribute a saphyr feature flag to valico
Because both `saphyr` and `serde_json` essentially describe the same kind of data, it should be possible to implement this in `valico`, e.g. behind a feature flag.
Pros:
- No double parsing
- No performance overhead
Cons:
- Supporting two separate formats adds complexity for `valico`
- Requires upstream maintainers' blessing or a fork (the maintainers of `valico` may not want this feature)
3. Implement conversion from saphyr to serde_json::Value
We can convert `saphyr::MarkedYamlOwned` to `serde_json::Value` and pass that to `valico` for validation.
Pros:
- No double parsing so less scope for bugs
- Relatively simple to implement conversion (already did a PoC for this)
Cons:
- Requires cloning the entire AST
- Some performance impact due to conversion
- Results in runtime overhead even if no errors are logged
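To make option 3 concrete, the conversion would take roughly the following shape: a recursive walk over the YAML tree that discards span information. This is a minimal, self-contained sketch using stand-in types; `YamlNode` and `JsonValue` are simplified mirrors of `saphyr::MarkedYamlOwned` and `serde_json::Value`, which carry spans, tags, and richer number/key types that this sketch omits:

```rust
// Stand-in for a spanned YAML node (the real saphyr type is richer).
#[derive(Debug)]
enum YamlNode {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Sequence(Vec<YamlNode>),
    Mapping(Vec<(String, YamlNode)>),
}

// Stand-in for serde_json::Value.
#[derive(Debug, PartialEq)]
enum JsonValue {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<JsonValue>),
    Object(Vec<(String, JsonValue)>),
}

// Recursively convert, dropping span information along the way. This is
// where the "clones the entire AST" cost in the cons list comes from.
fn to_json(node: &YamlNode) -> JsonValue {
    match node {
        YamlNode::Null => JsonValue::Null,
        YamlNode::Bool(b) => JsonValue::Bool(*b),
        YamlNode::Number(n) => JsonValue::Number(*n),
        YamlNode::String(s) => JsonValue::String(s.clone()),
        YamlNode::Sequence(items) => JsonValue::Array(items.iter().map(to_json).collect()),
        YamlNode::Mapping(entries) => JsonValue::Object(
            entries.iter().map(|(k, v)| (k.clone(), to_json(v))).collect(),
        ),
    }
}

fn main() {
    let yaml = YamlNode::Mapping(vec![(
        "on".to_string(),
        YamlNode::Sequence(vec![YamlNode::String("push".to_string())]),
    )]);
    println!("{:?}", to_json(&yaml));
}
```

The original spanned tree stays alive alongside the converted value, so the diagnostics pass can still look up spans by path after `valico` reports errors.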
4. Implement spanned parsing for serde_yaml / serde_json / serde
I looked into whether it was possible to add support for span information to our existing dependencies. Long story short, I'm not aware of any practical way to do this in `serde` at the moment (though I may have missed something), and even if there were a way, it would be necessary to contribute changes to both `serde_yaml` and `serde_json`.
Pros:
- Able to use existing dependencies
- Addresses all of the above mentioned limitations
Cons:
- Likely not feasible
5. Implement a "lingua franca" for data formats
At present, there is no agreed-upon format for storing or working with unstructured data. `serde_json::Value` is used extensively, even in our case for non-JSON formats (we work with YAML). However, `serde_json::Value` owns its data, and cannot be used in cases where another type is required to own the data.
There are many use cases for working with data which do not depend on the specifics of the underlying type. For example, validation is one such use case. Validation requires only knowledge of the structure of the data that is being represented, not details of the types used to represent it.
A trait could be implemented which acts as a "lingua franca" for working with different data types:
```rust
enum Value<'data> {
    Null,
    String(&'data str),
    // ...
}

trait DataFormat<'data> {
    type Format: DataFormat<'data>;
    fn safe_get<TKey: AsRef<str>>(&self, key: TKey) -> Self::Format;
    fn as_value(&self) -> Value<'data>;
    // ...
}
```

This could be implemented by any type, including `serde_json::Value`, `serde_yaml::Value`, `saphyr::MarkedYamlOwned`, and others, and accepted by validation libraries such as `valico`.
This approach is inspired by standard-schema which takes a similar approach to solving an unrelated ecosystem fragmentation issue in the TypeScript space.
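To sketch how this might look in practice, here is a minimal, self-contained example. The `Json` type is a toy stand-in for `serde_json::Value`, and I've adjusted `safe_get` to return `Option<&Self>` rather than an associated `Format` type to keep the example short; none of this is a settled API:

```rust
// The Value enum from the proposal, slightly extended for the example.
#[derive(Debug, PartialEq)]
enum Value<'data> {
    Null,
    Bool(bool),
    String(&'data str),
}

// A borrowed-lookup variant of the proposed trait. Returning Option<&Self>
// sidesteps the question of how `safe_get` should represent missing keys.
trait DataFormat<'data> {
    fn safe_get(&'data self, key: &str) -> Option<&'data Self>;
    fn as_value(&'data self) -> Value<'data>;
}

// A toy owned JSON-like type standing in for serde_json::Value.
enum Json {
    Null,
    Bool(bool),
    String(String),
    Object(Vec<(String, Json)>),
}

impl<'data> DataFormat<'data> for Json {
    fn safe_get(&'data self, key: &str) -> Option<&'data Self> {
        match self {
            Json::Object(entries) => entries
                .iter()
                .find(|(k, _)| k.as_str() == key)
                .map(|(_, v)| v),
            _ => None,
        }
    }

    fn as_value(&'data self) -> Value<'data> {
        match self {
            Json::Null => Value::Null,
            Json::Bool(b) => Value::Bool(*b),
            Json::String(s) => Value::String(s),
            Json::Object(_) => Value::Null, // containers elided in this sketch
        }
    }
}

// A validator written against the trait works for any backing format,
// whether the data came from serde_json, serde_yaml, or a spanned parser.
fn require_string<'data, F: DataFormat<'data>>(doc: &'data F, key: &str) -> bool {
    matches!(doc.safe_get(key).map(|v| v.as_value()), Some(Value::String(_)))
}

fn main() {
    let doc = Json::Object(vec![("name".to_string(), Json::String("ci".to_string()))]);
    println!("{}", require_string(&doc, "name"));
}
```

The point is that `require_string` never names `Json`: a validator like `valico` written against `DataFormat` could be handed `saphyr::MarkedYamlOwned` directly, and the caller would retain access to the spans.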
Pros:
- No double parsing
- No performance overhead
- No increase in binary size
- No hacky glue code required in `action-validator`
Cons:
- Outside the scope of this project
- Relies on upstream maintainers accepting this change (at least `valico` or another JSON schema validation crate would need to "buy in" to this approach)
I'm interested to hear thoughts on the above issues and solutions. If there's interest in this feature, I'd be happy to work towards implementing it.