Conversation
With this framing, I think `Limits::max_image_width` and `Limits::max_image_height` no longer need to be communicated to or handled by the `ImageDecoder` trait, because the external code can check `ImageDecoder::dimensions()` before invoking `ImageDecoder::read_image()`; only the memory limit (`Limits::max_alloc`) is essential. That being said, the current way `Limits` are handled by `ImageDecoder` isn't that awkward to implement, so to reduce migration costs, keeping the current `ImageDecoder::set_limits()` API may be OK.
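To illustrate the point (with hypothetical stand-in types, not the actual `image` crate API): external code can enforce the dimension limits itself by checking `dimensions()` before `read_image()`, so the decoder only ever needs to know about the allocation limit. A minimal sketch:

```rust
// Hypothetical stand-ins for the real `image` types, for illustration only.
struct Limits {
    max_image_width: Option<u32>,
    max_image_height: Option<u32>,
}

trait ImageDecoder {
    fn dimensions(&self) -> (u32, u32);
    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String>;
}

/// Enforce dimension limits externally, before any pixel data is read.
fn check_dimension_limits(decoder: &dyn ImageDecoder, limits: &Limits) -> Result<(), String> {
    let (w, h) = decoder.dimensions();
    if limits.max_image_width.is_some_and(|max| w > max) {
        return Err(format!("width {w} exceeds limit"));
    }
    if limits.max_image_height.is_some_and(|max| h > max) {
        return Err(format!("height {h} exceeds limit"));
    }
    Ok(())
}

struct DummyDecoder; // pretends to decode a 640x480 image
impl ImageDecoder for DummyDecoder {
    fn dimensions(&self) -> (u32, u32) {
        (640, 480)
    }
    fn read_image(&mut self, _buf: &mut [u8]) -> Result<(), String> {
        Ok(())
    }
}

fn main() {
    // Height 480 exceeds the 400 limit, so the check fails
    // before read_image is ever called.
    let limits = Limits { max_image_width: Some(1024), max_image_height: Some(400) };
    assert!(check_dimension_limits(&DummyDecoder, &limits).is_err());
}
```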
A couple thoughts... I do like the idea of handling animation decoding with this same trait. To understand: are you thinking of "sequences" as being animations, or also stuff like the multiple images stored in a TIFF file? Even just handling animation has some tricky cases though. For instance in PNG, the default image that you get if you treat the image as non-animated may be different from the first frame of the animation. We might need both a … The addition of an …
It's a dyn-compatible way that achieves the goal of the constructor, so it is actually an abstraction.
What do you mean by this? The main problem in … I'm also not suggesting that calling …
@fintelia This now includes the other changes, including to …
I can't speak about image metadata, but I really don't like the new
Regarding rectangle decoding, I think it would be better if we force decoders to support arbitrary rects. That's because the current interface is actually less efficient by allowing a decoder to support only certain rects. To read a specific rect that is not supported as is, …

However, most image formats are based on lines of blocks (macro pixels). So we can do a trick: decode a line according to the too-large rect, and then only copy the pixels in the real rect to the output buffer. This reduces the memory overhead for unsupported rects from …

And if a format can't do the line-based trick for unsupported rects, then decoders should just allocate a temp buffer for the too-large rect and then crop (= copy what is needed). This is still just as efficient as the best …

For use cases where users can use the row pitch to ignore the excess parts of the too-large rect, we could just have a method that gives back a preferred rect, which can be decoded very efficiently. So the API could look like this:

```rust
trait ImageDecoder {
    // ...

    /// Returns a viewbox that contains all pixels of the given rect but can
    /// potentially be decoded more efficiently. If rect decoding is not
    /// supported or no more-efficient rect exists, the given rect is returned as is.
    fn preferred_viewbox(&self, viewbox: Rect) -> Rect {
        viewbox // default impl
    }

    fn read_image_rect(&mut self, buf: &mut [u8], viewbox: Rect) -> ImageResult<()> {
        Err(ImageError::Decoding(Decoding::RectDecodingNotSupported)) // or similar
    }
}
```

This API should make rect decoding easier to use, easier to implement, and allow for more efficient implementations.
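The line-based trick described above can be sketched as follows. This is a hedged, self-contained illustration with hypothetical helper names (`Rect`, `read_rect_via_rows`, `decode_row`), assuming one byte per pixel; the real decoders would produce rows from their block/macro-pixel decoding:

```rust
/// A rect in pixel coordinates (hypothetical stand-in type).
struct Rect { x: u32, y: u32, width: u32, height: u32 }

/// Decode one full-width row at a time into a scratch buffer, then copy only
/// the pixels inside `rect` to `out`. Extra memory is O(one row), not
/// O(whole too-large rect).
fn read_rect_via_rows(
    full_width: u32,
    rect: &Rect,
    mut decode_row: impl FnMut(u32, &mut [u8]), // fills one full row for line y
    out: &mut [u8],
) {
    let mut row = vec![0u8; full_width as usize];
    for (i, y) in (rect.y..rect.y + rect.height).enumerate() {
        decode_row(y, &mut row);
        let src = &row[rect.x as usize..(rect.x + rect.width) as usize];
        out[i * rect.width as usize..(i + 1) * rect.width as usize].copy_from_slice(src);
    }
}

fn main() {
    // Synthetic "decoder": pixel value encodes its position as 10*y + x.
    let rect = Rect { x: 2, y: 1, width: 3, height: 2 };
    let mut out = vec![0u8; (rect.width * rect.height) as usize];
    read_rect_via_rows(
        8,
        &rect,
        |y, row| {
            for (x, p) in row.iter_mut().enumerate() {
                *p = 10 * y as u8 + x as u8;
            }
        },
        &mut out,
    );
    // Row y=1 yields pixels 12,13,14; row y=2 yields 22,23,24.
    assert_eq!(out, vec![12, 13, 14, 22, 23, 24]);
}
```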
That was one of the open questions; the argument you're presenting makes it clear it should return the layout and that's it. Renamed to …
It's supposed to be relative to the full image. Yeah, that needs more documentation and pointers to the proper implementation.
```rust
    }
}

impl ImageReader<'_> {
```

nit: it is easier for scrolling through the file if the impl blocks for each struct immediately follow the definitions
src/io/image_reader_type.rs (outdated):

```rust
        self.viewbox = Some(viewbox);
    }

    /// Get the previously decoded EXIF metadata if any.
```

The "previously decoded" part here makes me a bit nervous. I think we'll want to be clearer about what the user has to do to make sure they don't get `None` for images that actually do have EXIF data.
This is a bit odd and will depend on the format and metadatum. For instance, XMP is encoded per-image in tiff but only once in gif (despite us representing this as an image sequence) and also only once in png (no word about APNG). Problematically, in gif and png the standard requires absolutely no ordering with any of the other chunks. So it might be encountered before all of the header information is done; or after all the images have been consumed.
The problem with back references is of course the unclear association. And when multiple are included, we always have a problem with consuming them 'after the end', since the data would need to be buffered or the decoder able to store seek-back points (like png). Encoding all that in the interface is a challenge; it will incur some unavoidable complexity.
Moving the metadata query between `peek_layout` and `read_image` doesn't really affect this argument with the variants of `MetadataHint` that I've found to be necessary. So that is cleaner, see #2672 (comment).
Of course we still have `last_attributes` remaining, since that combines the information the reader has retrieved, e.g. the orientation given directly from `read_image` together with the fallback of querying it from EXIF.
Alright, I finally had an idea for resolving this. There is a separation of concerns here. Users that just want pixel data (i.e. most) will reach for `decode() -> Result<DynamicImage, _>`, and that's fine but encumbers the return type. But advanced users will want performance too, so let's give them an `_into` method with a potentially pre-allocated buffer (optimization TBD). That frees the return type for an adapter type that provides an accessor for `last_attributes` that is strongly tied to the image it actually relates to. That also resolves some concerns for metadata (`PerImage` vs. `InHeader` in particular).
Resolving the naming question as …
@fintelia I understand this is too big for a code-depth review, but I'd be interested in the directional input. Is the merging of 'animations' and simple images, as well as the optimization hint methods, convincing enough? Is the idea of returning data from … As an aside, in wondermagick we basically find that sequence encoding is a missing API to match imagemagick. We can currently only do this with …
The latter interface will be something that interacts with the underlying decoder. Introduces `ImageDecoder::set_viewport` as a stand-in for what is intended here: color, subsampling, and gain-map application will take more interaction, clarification of how the library will request their data, and adjustments to the layout definition by the decoder.
The purpose of the trait is to be careful with the boundary to the `moxcms` dependency. We use it because it is a quality implementation, but it is heavyweight for what we need. There are other possible ways to provide transfer functions and color space transforms. Now this also introduces ICC profile parsing, but again that could be done with a *much* lighter dependency, as we only need basic information from it. The trait should make every little additional cross-dependency a conscious decision. Also, it should be the start of a customization point, by feature flag or actually at runtime.
No longer responsible for ensuring the size constraints are met under the new policy, and with the option of constructing a reader from an instance of a boxed decoder.
This allows us to write a generic iterator which uses the same decoder function to generate a whole sequence of frames. The attributes are designed to be extensible to describe changes in available metadata as well; very concretely, some formats require that XMP/ICC/… be polled for each individual image, whereas others have one for the whole file and put that at the end. So there's no universal sequence for querying the metadata, and we need to hold runtime information. This will be the focus of the next commit.
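Such a generic frame iterator could look roughly like this. A sketch only, under the assumption of a much-simplified decoder protocol (`SequenceDecoder`, `Frames`, and `Counting` are hypothetical names, and the real layout is a struct rather than a bare byte length):

```rust
/// Hypothetical, simplified decoder protocol: `peek_layout` reports the byte
/// length of the next frame (None = end of sequence), `read_image` fills a
/// buffer of exactly that length.
trait SequenceDecoder {
    fn peek_layout(&mut self) -> Option<usize>;
    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String>;
}

/// Generic iterator driving the same decode function for every frame.
struct Frames<D: SequenceDecoder>(D);

impl<D: SequenceDecoder> Iterator for Frames<D> {
    type Item = Result<Vec<u8>, String>;
    fn next(&mut self) -> Option<Self::Item> {
        let len = self.0.peek_layout()?;
        let mut buf = vec![0u8; len];
        Some(self.0.read_image(&mut buf).map(|()| buf))
    }
}

/// Toy decoder producing `left` frames of four bytes each.
struct Counting { left: u8 }
impl SequenceDecoder for Counting {
    fn peek_layout(&mut self) -> Option<usize> {
        (self.left > 0).then_some(4)
    }
    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String> {
        buf.fill(self.left);
        self.left -= 1;
        Ok(())
    }
}

fn main() {
    let frames = Frames(Counting { left: 2 })
        .collect::<Result<Vec<_>, String>>()
        .unwrap();
    assert_eq!(frames, vec![vec![2u8; 4], vec![1u8; 4]]);
}
```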
This commit is meant to be revertible. The demonstration code was an experiment at improving the protocol between the trait implementation and the user-facing `ImageReader`. While it successfully shows that the interface would work with extension points as intended, the details are not fleshed out, and we would like at least some implementations to provide actual `viewbox` internals to add this feature. Also remember: this is not yet tested; `ImageReader::peek_layout` and all its derived methods should very likely consult the viewbox setting.
This method was mostly implemented to acquire EXIF information and retrieve the entry from it. However, that is a weird way of using the trait system with this mutable method. Instead we move the responsibility for extracting and manipulating EXIF to the `ImageReader` and have the decoder interface only supply its input. The orientation is added as a field to per-image decoded data for cases where such information exists outside of EXIF (e.g. in TIFF).
At least the per-image metadata should be retrieved there. This also ensures that there is a consistent location for data available after the header for all images, and for data that changes per image. The call to `read_image` can then be destructive for all metadata that is only valid for that particular frame (e.g. TIFF), which it previously needed to delay until the next `peek_layout`, leading to very odd interactions. Previously, it meant that `read_image` was supposed to clean up the image data itself but not the metadata. This split responsibility is not very intuitive for many image types.
Having separate methods for these is not very effective in terms of protocol. All the formats we have can determine them at the same sequence point, and `peek_layout` has become somewhat central in the definition of the protocol. It defines the buffer requirements for `read_image`; having one method means one source of truth for this length requirement, which is far clearer than requiring that the caller run multiple methods and combine their results (with unclear semantics for running other methods in between).
This makes the pain point mentioned in a review comment more apparent for us: `non_exhaustive` is harder to construct, but we may need the extensibility for per-frame control over the decoding process. (Two likely examples: fields that indicate existing HDR gain maps, and information about the color conversion chain to undo in encoding.)
Sorry for all the force-push/rebase confusion. I think I accidentally bisected during a rebase and somehow got two commits squashed in the process of rebasing on the diverged main branch while adding my own work (and still wanting good CI for the pure rebase).
Calling it `DecodedImageAttributes` so it has the right prefix binding it to the decoder but clarifies that it is not the image data itself. It lives in the metadata module as you're mostly expected to use it via construction from `ImageReader` and not directly as a type.
Motivated by attempting integration with wondermagick. This is part of the metadata group available after decoding and does, by definition, not influence the layout. This placement also makes it impossible to be interpreted that way. In the future the decoder may return a chain of transformations that it undertook, this being (part of) the base state. This whole chain would obviously only be available afterwards.
@Shnatsel Sketch for the integration with wondermagick is here. Unfortunately it does not compile yet, since the integration crates depend on the crates.io version and not the git version, so they don't automatically work.
Looking at the wondermagick sketch, why are we creating a luma8 image? That looks really odd:

```rust
let mut pixels = DynamicImage::new_luma8(0, 0);
```

If this is a way to create a blank placeholder …
src/io/image_reader_type.rs (outdated):

```rust
pub fn decode(&mut self) -> ImageResult<DynamicImage> {
    let mut empty = DynamicImage::new_luma8(0, 0);
    self.decode_into(&mut empty)?;
    Ok(empty)
}

/// Decode an image into a provided buffer and retrieve metadata.
pub fn decode_into(
    &mut self,
    image: &mut DynamicImage,
) -> ImageResult<DecodedImageMetadata<'_>> {
```
How do you mean convenience method? Currently you have:

```rust
let mut buffer = DynamicImage::new_luma8(0, 0);
let meta = reader.decode_into(&mut buffer)?;
meta.attributes()
```

We need two separate return values to also construct the buffer. I think a tuple return is more confusing. Do you mean a convenience to construct only the buffer based on peeking the layout?

```rust
let mut buffer = reader.buffer_from_peek()?;
let meta = reader.decode_into(&mut buffer)?;
meta.attributes()
```

That would probably work, but currently the implementation is very bad at reusing buffers. I don't feel comfortable having that API but allocating twice regardless.
It is very similar to `Read::read_to_end`, except you have a few empty constructors rather than one. You're supposed to (be able to) recycle the buffer here, and usage should be fixed upstream to add a buffer argument to decode, but that was more complex than a simple sketch. (imagemagick has subsequences and lets you reset an element in a sequence by re-reading it; so we do not read images only at the start, and we discard some images.)
Something along the lines of `let (pixels, meta) = reader.decode();` would indeed be nice.
Really, anything higher-level that doesn't require constructing an invalid (0, 0) image with an irrelevant pixel format, luma8 or otherwise. The current API is probably fine as an advanced interface; I just think there should be a high-level one-liner.
What about having a tuple return on `ImageReader::decode()` but suppressing the metadata on `ImageReaderOptions::decode` (and by extension `load_from_memory`)? Not entirely fitting the idea of a simple interface, but that would be a gradient of complexity at least. I'm still not entirely comfortable with the inconsistency there, but maybe it's fine; you're constructing the `ImageReader` instance for more details, after all.
I see buffer reuse as an advanced and rare thing that relies on the assumption you're decoding images of the same size over and over. I understand why `decode_into()` is needed as `image` gets used more and more for FFI, including GNOME and GStreamer, but I'd much rather provide high-level methods that do not require getting into that much detail from typical Rust code.
If the question is "should the high-level API decode metadata?", the answer is "I'd like to have both options", with e.g. `decode_pixels()` discarding metadata and `decode()` returning both in, say, a tuple, possibly having called `meta.attributes()` internally too.
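The gradient of complexity discussed in this thread could look roughly like this. A sketch only: `Reader`, `Metadata`, and the exact method set are stand-ins (the real types would be `ImageReader`, `DynamicImage`, and `DecodedImageMetadata`), and the names `decode`, `decode_into`, and `decode_pixels` are the hypothetical ones from the discussion:

```rust
// Stand-in types for illustration; not the actual `image` crate API.
#[derive(Debug, PartialEq)]
struct Metadata { orientation: u8 }

struct Reader { data: Vec<u8> }

impl Reader {
    /// Advanced path: decode into a caller-provided, reusable buffer.
    fn decode_into(&mut self, buf: &mut Vec<u8>) -> Metadata {
        buf.clear();
        buf.extend_from_slice(&self.data);
        Metadata { orientation: 1 }
    }

    /// High-level path: allocate internally, return pixels and metadata as a tuple.
    fn decode(&mut self) -> (Vec<u8>, Metadata) {
        let mut buf = Vec::new();
        let meta = self.decode_into(&mut buf);
        (buf, meta)
    }

    /// Convenience: pixels only, metadata discarded.
    fn decode_pixels(&mut self) -> Vec<u8> {
        self.decode().0
    }
}

fn main() {
    let mut reader = Reader { data: vec![1, 2, 3] };
    let (pixels, meta) = reader.decode();
    assert_eq!(pixels, vec![1, 2, 3]);
    assert_eq!(meta, Metadata { orientation: 1 });
    assert_eq!(reader.decode_pixels(), vec![1, 2, 3]);
}
```

Both convenience methods bottom out in the single `_into` implementation, so buffer reuse stays available to advanced callers without complicating the common case.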
See #2245, the intended `ImageDecoder` changes.

This changes the `ImageDecoder` trait to fix some underlying issues. The main change is a clarification of the responsibilities: the trait is an interface from an implementor towards the `image` library. That is, the protocol established by its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized to be used by an external caller, which should prefer the use of `ImageReader` and other inherent methods instead.

This is a work in progress; below motivates the changes and discusses open points.

- `ImageDecoder::peek_layout` encourages decoders to read headers after the constructor. This fixes the inherent problem we had with communicating limits. The sequence for internal use is roughly: …
- `ImageDecoder::read_image(&mut self)` no longer consumes `self`. We no longer need the additional `boxed` method and its trait workaround; the trait is now dyn-compatible.

Discussion

- `init`/`peek_layout` should return the full layout information in a single struct. We have a similar open issue for `png` in its own crate, and the related work for `tiff` is in the pipeline, where its `BufferLayoutPreference` already exists to be extended with said information.
- Review limits and remove the size bounds insofar as they can be checked against the communicated bounds in the metadata step by the `image` side. See: Replace `ImageDecoder::set_limits` with `ImageDecoder::set_allocation_limit` #2709, Add an atomically shared allocation limit #2708.
- … 1.1, but it's not highly critical.
- … `read_image`, then switching to a sequence reader. But that is supposed to become mainly an adapter that implements the iterator protocol.
- … `ImageReader` with a new interface to return some of it. That may be better suited for a separate PR though.
- … `CicpRgb` and apply it to a decoded `DynamicImage`.

Cleanup

- … `peek_layout` more consistently after `read_image`.
- `read_image` is 'destructive' in all decoders, i.e. re-reading an image, or reading an image before `init`, should never access an incorrect part of the underlying stream but instead return an error. This affects pnm and qoi for instance, where the read will interpret bytes based on the dimensions and color, which would be invalid before reading the header and only valid for one read.
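Under these assumptions, the revised protocol might be driven like this. A hedged sketch with stand-in types (`Layout`, the `String` error, and the toy `Solid` decoder are illustrative only; the real layout struct and error types differ):

```rust
/// Stand-in for the single layout struct returned by `peek_layout`.
#[derive(Clone, Copy)]
struct Layout { width: u32, height: u32, bytes_per_pixel: usize }

impl Layout {
    /// The exact buffer length `read_image` requires.
    fn buffer_len(&self) -> usize {
        self.width as usize * self.height as usize * self.bytes_per_pixel
    }
}

/// Sketch of the revised trait: cheap constructor, header parsing deferred to
/// `peek_layout`, and `read_image(&mut self)` so the trait is dyn-compatible.
trait ImageDecoder {
    fn peek_layout(&mut self) -> Result<Layout, String>;
    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String>;
}

/// Because `read_image` takes `&mut self` instead of consuming the decoder,
/// we can drive a `dyn ImageDecoder` without any `boxed` workaround.
fn drive(decoder: &mut dyn ImageDecoder) -> Result<Vec<u8>, String> {
    let layout = decoder.peek_layout()?; // single source of truth for buffer size
    let mut buf = vec![0u8; layout.buffer_len()];
    decoder.read_image(&mut buf)?;
    Ok(buf)
}

/// Toy decoder: a 2x2 single-channel image filled with the value 7.
struct Solid;
impl ImageDecoder for Solid {
    fn peek_layout(&mut self) -> Result<Layout, String> {
        Ok(Layout { width: 2, height: 2, bytes_per_pixel: 1 })
    }
    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String> {
        buf.fill(7);
        Ok(())
    }
}

fn main() {
    assert_eq!(drive(&mut Solid).unwrap(), vec![7u8; 4]);
}
```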