Summary
As noted by @dabrahams in #72 (comment), the output of the CLI does not make it easy to identify the location of the error. Because it only provides a path to the offending item, users must manually traverse the file to find the source of the error.
Proposal
Implement Rust-style diagnostic output.
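For illustration, output in this style might look something like the following (the file path, column, and caret placement here are invented for the example; only the error title and line number are taken from the prototype below):

```text
error: No files matching glob
  --> .github/workflows/ci.yml:26:13
   |
26 |       glob: dist/*.js
   |             ^^^^^^^^^ No files matching glob
   |
```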
Prototype
I've been experimenting with `saphyr` (which offers spanned YAML parsing) and `annotate_snippets` (which provides an ergonomic API for producing Rust-style diagnostic reports), and I've been able to hack together a (very messy) prototype:
```diff
diff --git a/src/lib.rs b/src/lib.rs
index 5d4f9d4..38a8bac 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -6,6 +6,7 @@ mod validation_error;
 mod validation_state;
 use config::{ActionType, RunConfig};
+use saphyr::{LoadableYamlNode, MarkedYamlOwned, SafelyIndex as _, ScalarOwned, YamlDataOwned};
 use std::path::PathBuf;
 use validation_error::ValidationError;
 use validation_state::ValidationState;
@@ -157,6 +158,7 @@ pub mod cli {
 fn run(config: &RunConfig) -> ValidationState {
     let file_name = config.file_name.unwrap_or("file");
+
     let doc = serde_yaml::from_str(config.src);
     let mut state = match doc {
@@ -190,6 +192,50 @@ fn run(config: &RunConfig) -> ValidationState {
     state.action_type = Some(config.action_type);
     state.file_path = config.file_path.map(|file_name| file_name.to_string());
+    let doc = saphyr::MarkedYamlOwned::load_from_str(config.src)
+        .unwrap()
+        .into_iter()
+        .next()
+        .unwrap();
+
+    for error in &state.errors {
+        let mut node = Some(&doc);
+        match error {
+            ValidationError::NoFilesMatchingGlob {
+                code,
+                detail,
+                path,
+                title,
+            } => {
+                for segment in path.split("/").skip(1).collect::<Vec<_>>() {
+                    let Some(prev_node) = node else {
+                        todo!();
+                    };
+                    node = prev_node.get(segment);
+                }
+                let node = node.unwrap();
+
+                let report = &[annotate_snippets::Level::ERROR
+                    .primary_title(title)
+                    .element(
+                        annotate_snippets::Snippet::source(config.src)
+                            .line_start(26)
+                            .path(config.file_path.unwrap_or(file_name))
+                            .annotation(
+                                annotate_snippets::AnnotationKind::Primary
+                                    .span(node.span.start.index()..node.span.end.index() - 2)
+                                    .label(detail.as_ref().unwrap_or(title)),
+                            ),
+                    )];
+
+                let renderer = annotate_snippets::Renderer::styled()
+                    .decor_style(annotate_snippets::renderer::DecorStyle::Unicode);
+                anstream::println!("{}", renderer.render(report));
+            }
+            _ => {}
+        }
+    }
+
     state
 }
```

Limitations
Unfortunately, `valico` requires a `serde_json::Value` as input. The `serde_json::Value` type is un-spanned, and so cannot provide the span information needed to produce the desired diagnostic output. For this reason, the above PoC parses the file twice: first with `serde_yaml` to produce errors via `valico`, then again with `saphyr` to retrieve the span information used to render the diagnostic messages.
Perhaps unsurprisingly, I was not able to find any JSON schema validators which accept a parsed yaml type with span information (such as produced by saphyr) as input.
Obviously, double parsing isn't ideal because it...
- Adds a performance overhead
- Unnecessarily increases the file size of the binary
- Increases the surface area for bugs
Possible Solutions
I've considered a few options...
1. Accept Defeat
Either temporarily or permanently, we could accept the downsides of double parsing the YAML file as an acceptable cost for improving the ergonomics of the tool.
Pros:
- Easy: the existing implementation is relatively simple
- No need to incur performance overhead if no errors
Cons:
- As mentioned above
2. Contribute a saphyr feature flag to valico
Because both `saphyr` and `serde_json` essentially describe the same kind of data, it should be possible to implement this in `valico`, e.g. behind a feature flag.
Pros:
- No double parsing
- No performance overhead
Cons:
- Supporting two separate formats adds complexity for `valico`
- Requires upstream maintainers' blessing or a fork (the maintainers of `valico` may not want this feature)
3. Implement conversion from saphyr to serde_json::Value
We can convert `saphyr::MarkedYamlOwned` to `serde_json::Value` and pass that to `valico` for validation.
Pros:
- No double parsing so less scope for bugs
- Relatively simple to implement conversion (already did a PoC for this)
Cons:
- Requires cloning the entire AST
- Some performance impact due to conversion
- Results in runtime overhead even if no errors are logged
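To make option 3 concrete, the conversion would take roughly the following shape: a recursive walk over the YAML tree that discards span information. This is a minimal, self-contained sketch using stand-in types; `YamlNode` and `JsonValue` are simplified mirrors of `saphyr::MarkedYamlOwned` and `serde_json::Value`, which carry spans, tags, and richer number/key types that this sketch omits:

```rust
// Stand-in for a spanned YAML node (the real saphyr type is richer).
#[derive(Debug)]
enum YamlNode {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Sequence(Vec<YamlNode>),
    Mapping(Vec<(String, YamlNode)>),
}

// Stand-in for serde_json::Value.
#[derive(Debug, PartialEq)]
enum JsonValue {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<JsonValue>),
    Object(Vec<(String, JsonValue)>),
}

// Recursively convert, dropping span information along the way. This is
// where the "clones the entire AST" cost in the cons list comes from.
fn to_json(node: &YamlNode) -> JsonValue {
    match node {
        YamlNode::Null => JsonValue::Null,
        YamlNode::Bool(b) => JsonValue::Bool(*b),
        YamlNode::Number(n) => JsonValue::Number(*n),
        YamlNode::String(s) => JsonValue::String(s.clone()),
        YamlNode::Sequence(items) => JsonValue::Array(items.iter().map(to_json).collect()),
        YamlNode::Mapping(entries) => JsonValue::Object(
            entries.iter().map(|(k, v)| (k.clone(), to_json(v))).collect(),
        ),
    }
}

fn main() {
    let yaml = YamlNode::Mapping(vec![(
        "on".to_string(),
        YamlNode::Sequence(vec![YamlNode::String("push".to_string())]),
    )]);
    println!("{:?}", to_json(&yaml));
}
```

The original spanned tree stays alive alongside the converted value, so the diagnostics pass can still look up spans by path after `valico` reports errors.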
4. Implement spanned parsing for serde_yaml / serde_json / serde
I looked into whether it was possible to add support for span information to our existing dependencies. Long story short, I'm not aware of any practical way to do this in `serde` at the moment (though I may have missed something), and even if there were a way, it would be necessary to contribute changes to both `serde_yaml` and `serde_json`.
Pros:
- Able to use existing dependencies
- Addresses all of the above mentioned limitations
Cons:
- Likely not feasible
5. Implement a "lingua franca" for data formats
At present, there is no agreed-upon format for storing or working with unstructured data. `serde_json::Value` is used extensively, even in our case for non-JSON formats (we work with YAML). However, `serde_json::Value` owns its data, and cannot be used in cases where another type is required to own the data.
There are many use cases for working with data which do not depend on the specifics of the underlying type. For example, validation is one such use case. Validation requires only knowledge of the structure of the data that is being represented, not details of the types used to represent it.
A trait could be implemented which acts as a "lingua franca" for working with different data types:
```rust
enum Value<'data> {
    Null,
    String(&'data str),
    // ...
}

trait DataFormat<'data> {
    type Format: DataFormat<'data>;
    fn safe_get<TKey: AsRef<str>>(&self, key: TKey) -> Self::Format;
    fn as_value(&self) -> Value<'data>;
    // ...
}
```

This could be implemented by any type, including `serde_json::Value`, `serde_yaml::Value`, `saphyr::MarkedYamlOwned`, and others, and accepted by validation libraries such as `valico`.
This approach is inspired by standard-schema which takes a similar approach to solving an unrelated ecosystem fragmentation issue in the TypeScript space.
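To sketch how this might look in practice, here is a minimal, self-contained example. The `Json` type is a toy stand-in for `serde_json::Value`, and I've adjusted `safe_get` to return `Option<&Self>` rather than an associated `Format` type to keep the example short; none of this is a settled API:

```rust
// The Value enum from the proposal, slightly extended for the example.
#[derive(Debug, PartialEq)]
enum Value<'data> {
    Null,
    Bool(bool),
    String(&'data str),
}

// A borrowed-lookup variant of the proposed trait. Returning Option<&Self>
// sidesteps the question of how `safe_get` should represent missing keys.
trait DataFormat<'data> {
    fn safe_get(&'data self, key: &str) -> Option<&'data Self>;
    fn as_value(&'data self) -> Value<'data>;
}

// A toy owned JSON-like type standing in for serde_json::Value.
enum Json {
    Null,
    Bool(bool),
    String(String),
    Object(Vec<(String, Json)>),
}

impl<'data> DataFormat<'data> for Json {
    fn safe_get(&'data self, key: &str) -> Option<&'data Self> {
        match self {
            Json::Object(entries) => entries
                .iter()
                .find(|(k, _)| k.as_str() == key)
                .map(|(_, v)| v),
            _ => None,
        }
    }

    fn as_value(&'data self) -> Value<'data> {
        match self {
            Json::Null => Value::Null,
            Json::Bool(b) => Value::Bool(*b),
            Json::String(s) => Value::String(s),
            Json::Object(_) => Value::Null, // containers elided in this sketch
        }
    }
}

// A validator written against the trait works for any backing format,
// whether the data came from serde_json, serde_yaml, or a spanned parser.
fn require_string<'data, F: DataFormat<'data>>(doc: &'data F, key: &str) -> bool {
    matches!(doc.safe_get(key).map(|v| v.as_value()), Some(Value::String(_)))
}

fn main() {
    let doc = Json::Object(vec![("name".to_string(), Json::String("ci".to_string()))]);
    println!("{}", require_string(&doc, "name"));
}
```

The point is that `require_string` never names `Json`: a validator like `valico` written against `DataFormat` could be handed `saphyr::MarkedYamlOwned` directly, and the caller would retain access to the spans.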
Pros:
- No double parsing
- No performance overhead
- No increase in binary size
- No hacky glue code required in `action-validator`
Cons:
- Outside the scope of this project
- Relies on upstream maintainers accepting this change (at least `valico` or another JSON schema validation crate would need to "buy in" to this approach)
I'm interested to hear thoughts on the above issues and solutions. If there's interest in this feature, I'd be happy to work towards implementing it.