libdeflate is expected to be significantly faster than flate2.
This will require an internal refactor so that the decoder can compute the output array size before decompressing. That size should be computable from the tile height, tile width, bits per sample, and number of samples.
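As a rough illustration of the size computation, here is a minimal sketch. The function name `decompressed_tile_size` and its signature are hypothetical, not part of async-tiff. It assumes chunky (interleaved) sample layout, the same bit depth for every sample, and rows padded to a byte boundary; with planar layout each tile would hold a single sample, so `samples_per_pixel` would be 1.

```rust
/// Hypothetical helper (not an existing async-tiff API): number of bytes
/// one tile occupies after decompression.
///
/// Assumptions: chunky sample layout, uniform bit depth across samples,
/// and each row padded up to a whole number of bytes.
fn decompressed_tile_size(
    tile_width: u32,
    tile_height: u32,
    bits_per_sample: u16,
    samples_per_pixel: u16,
) -> Option<usize> {
    // Bits in one row of the tile; checked arithmetic guards against overflow.
    let bits_per_row = u64::from(tile_width)
        .checked_mul(u64::from(bits_per_sample))?
        .checked_mul(u64::from(samples_per_pixel))?;

    // Round up to whole bytes so sub-byte depths (e.g. 1-bit) are covered.
    let bytes_per_row = bits_per_row.checked_add(7)? / 8;

    // Total bytes for all rows in the tile.
    let total = bytes_per_row.checked_mul(u64::from(tile_height))?;

    // Returns None if the result does not fit in usize on this platform.
    usize::try_from(total).ok()
}
```

For example, a 256×256 tile with three 8-bit samples gives 196,608 bytes. Returning `Option` lets the caller reject tiles whose size overflows rather than allocate a wrong-sized buffer.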
cc @weiji14, which compression schemes did you test when you found async-tiff to be slower than GDAL/libtiff?