feat: Add glyph atlas cache with LRU eviction for vello spare strips renderers by grebmeg · Pull Request #548 · linebender/parley

grebmeg · 2026-02-13T04:25:20Z

Introduces a glyph atlas cache that rasterizes glyphs once and reuses the bitmaps on subsequent frames, reducing redundant work for text-heavy scenes. The cache uses LRU age-based eviction, deterministic hashing for reproducible atlas packing, subpixel-quantized cache keys (supporting variable fonts and COLR glyphs), and a multi-page atlas backed by vello_common::ImageCache.
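As a rough illustration of the age-based eviction mentioned above, here is a minimal sketch. It is not the PR's `GlyphAtlas`: the `AgeCache` type, its `u64` keys, and the `max_age_frames` field are all hypothetical, and the sketch assumes "age" means the number of frames since an entry was last used.

```rust
use std::collections::HashMap;

/// Hypothetical cache entry: the payload plus the frame it was last used on.
struct Entry<V> {
    value: V,
    last_used_frame: u64,
}

/// Illustrative age-based cache (assumed design, not the PR's actual type).
pub struct AgeCache<V> {
    entries: HashMap<u64, Entry<V>>,
    current_frame: u64,
    max_age_frames: u64,
}

impl<V> AgeCache<V> {
    pub fn new(max_age_frames: u64) -> Self {
        Self { entries: HashMap::new(), current_frame: 0, max_age_frames }
    }

    /// Look up a key and mark it as used on the current frame.
    pub fn get(&mut self, key: u64) -> Option<&V> {
        let frame = self.current_frame;
        let entry = self.entries.get_mut(&key)?;
        entry.last_used_frame = frame;
        Some(&entry.value)
    }

    /// Insert a value, stamped with the current frame.
    pub fn insert(&mut self, key: u64, value: V) {
        let last_used_frame = self.current_frame;
        self.entries.insert(key, Entry { value, last_used_frame });
    }

    /// Advance one frame and evict entries unused for more than
    /// `max_age_frames` frames. Returns the number of evicted entries.
    pub fn maintain(&mut self) -> usize {
        self.current_frame += 1;
        let (now, max_age) = (self.current_frame, self.max_age_frames);
        let before = self.entries.len();
        self.entries.retain(|_, e| now - e.last_used_frame <= max_age);
        before - self.entries.len()
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

In this sketch a caller would run `maintain` once per frame; in an atlas-backed cache, the regions freed by eviction would then have to be cleared before reuse.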

grebmeg · 2026-02-26T04:59:48Z

Some tests are failing due to minor differences. I skimmed through the Kompari HTML report, and they’re very likely caused by recent changes in Vello that slightly affect the rendered output.

LaurenzV

Still a lot missing, but that's as far as I got today. :( The overall approach looks great to me though!

.github/workflows/ci.yml

examples/common/src/lib.rs

LaurenzV · 2026-02-26T13:40:14Z

examples/common/src/lib.rs

+    }
+
+    /// Print all collected timings, split into one-time and per-frame groups.
+    pub fn print_summary(&self) {


Am I understanding correctly that we have a number of different frames which exercise different paths (different layouts, different caching behavior, etc.), and then we are summarizing the number of all frames across the different stages? If yes, isn't that a bit meaningless since each frame tests something different? So the numbers will just always vary wildly.

The aggregated stats are intentional: they give an amortized view across the full cache lifecycle (cold, warm, partial overlap, etc.), which is useful for comparing overall cached vs uncached performance. Each phase also prints its own per-phase stats. This is a development/benchmarking aid, and the mixed summary is helpful for getting a quick read on the overall picture. Happy to revisit the output format later if needed.

LaurenzV · 2026-02-26T14:09:24Z

examples/vello_hybrid_render/src/main.rs

+/// Uses `queue.write_texture` to write transparent pixels to each clear rect,
+/// preventing stale data from evicted glyphs from bleeding through when the
+/// slot is reused on a subsequent frame.
+fn clear_atlas_regions(queue: &wgpu::Queue, renderer: &Renderer, rects: &[PendingClearRect]) {


Wouldn't it be possible to clear all regions in one pass? Is this just for simplicity for now?

Yes, this is just for simplicity for now. The proper approach would be to use vello_hybrid's GPU clear shader to clear all rects in a single render pass, but that requires exposing a batched clear API on vello_hybrid's Renderer. I'll do that as a follow-up in Vello. For now I've at least reused a single zeroed buffer across rects to avoid the per-rect allocation.
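The "single zeroed buffer reused across rects" idea can be sketched without any GPU types. Everything here is an assumption for illustration: `ClearRect` stands in for `PendingClearRect` (whose fields are not shown in the thread), the pixel format is assumed to be 4 bytes per pixel, and the `write` callback stands in for a call such as `queue.write_texture`.

```rust
/// Hypothetical stand-in for `PendingClearRect`; field names are assumed.
#[derive(Clone, Copy)]
pub struct ClearRect {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

/// Assumed pixel size (e.g. RGBA8).
const BYTES_PER_PIXEL: usize = 4;

/// Number of bytes needed to cover one rect with tightly packed pixels.
fn byte_len(rect: &ClearRect) -> usize {
    rect.width as usize * rect.height as usize * BYTES_PER_PIXEL
}

/// Calls `write(rect, zeros)` once per rect, slicing one shared zero buffer
/// instead of allocating a fresh buffer for every rect.
pub fn clear_regions(rects: &[ClearRect], mut write: impl FnMut(&ClearRect, &[u8])) {
    // Size the buffer for the largest rect so a single allocation serves all.
    let max_len = rects.iter().map(byte_len).max().unwrap_or(0);
    let zeros = vec![0u8; max_len];
    for rect in rects {
        write(rect, &zeros[..byte_len(rect)]);
    }
}
```

This still issues one write per rect; the batched GPU clear described above would replace the loop with a single pass.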

parley_draw/src/atlas/cache.rs

taj-p

Looking good! Just flushing my comments before I go into meetings

parley_draw/src/renderers/vello_hybrid.rs

taj-p · 2026-02-26T20:26:29Z

parley_draw/src/atlas/cache.rs

+    /// Take all pending bitmap uploads, leaving the internal queue empty.
+    fn take_pending_uploads(&mut self) -> Vec<PendingBitmapUpload>;
+
+    /// Take all pending atlas command recorders (one per dirty page),
+    /// leaving the internal collection empty.
+    fn take_pending_atlas_commands(&mut self) -> Vec<AtlasCommandRecorder>;
+
+    /// Take all pending clear rects, leaving the internal queue empty.
+    ///
+    /// Each rect describes an atlas region that was freed during
+    /// [`maintain`](GlyphCache::maintain) and must be zeroed to transparent.
+    /// Drain these **after** calling `maintain`.
+    fn take_pending_clear_rects(&mut self) -> Vec<PendingClearRect>;


take_pending_atlas_commands uses mem::take, which transfers ownership of the SmallVec and every inner Vec<AtlasCommand> to the caller. After replay the caller drops everything, so the heap allocations are lost every frame and re-allocated from scratch next frame.

Since consumers only need to iterate and drain the commands, we could avoid this by keeping the recorders in place and letting consumers borrow them. Something like:

    pub fn replay_pending_atlas_commands(
        &mut self,
        mut f: impl FnMut(&mut AtlasCommandRecorder),
    ) {
        for slot in &mut self.pending_atlas_commands {
            if let Some(recorder) = slot.as_mut() {
                if !recorder.commands.is_empty() {
                    f(recorder);
                    recorder.commands.clear();
                }
            }
        }
    }

Consumers change from:

    for mut recorder in glyph_caches.glyph_atlas.take_pending_atlas_commands() {
        glyph_renderer.reset();
        replay_atlas_commands(&mut recorder.commands, glyph_renderer);
        // ...
    }

to

    glyph_caches.glyph_atlas.replay_pending_atlas_commands(|recorder| {
        glyph_renderer.reset();
        replay_atlas_commands(&recorder.commands, glyph_renderer);
        // ...
    });

The benefit of this approach is that the command list for each AtlasCommandRecorder isn't freed each frame. I think that's beneficial for operations like zoom, where we need to insert into the cache every frame (since resolved font size will presumably change each frame).

Good idea! Switched to a replay_pending_atlas_commands callback API that keeps recorders in place and only clears their command Vecs after replay, preserving capacity across frames. For the CPU backend, added a replay_pending_atlas_commands_with_pixmaps helper that splits the borrow to give the closure access to both the recorder and the page pixmaps.
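The capacity-preservation argument in this thread rests on `Vec::clear` keeping its allocation. A self-contained sketch, using simplified hypothetical types (`Recorder` and `Pending` are stand-ins, and commands are plain `u32` values rather than real atlas commands):

```rust
/// Hypothetical recorder: a list of commands for one atlas page.
pub struct Recorder {
    pub commands: Vec<u32>,
}

/// Hypothetical owner of the per-page recorders.
pub struct Pending {
    pub slots: Vec<Option<Recorder>>,
}

impl Pending {
    /// Replays every non-empty recorder in place, then clears it.
    /// `Vec::clear` drops the elements but keeps the heap allocation,
    /// so the capacity is still available on the next frame.
    pub fn replay(&mut self, mut f: impl FnMut(&mut Recorder)) {
        for recorder in self.slots.iter_mut().flatten() {
            if !recorder.commands.is_empty() {
                f(recorder);
                recorder.commands.clear();
            }
        }
    }
}
```

By contrast, a `mem::take`-style API hands the vectors to the caller, so their allocations are freed when the caller drops them.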

parley_draw/src/renderers/vello_hybrid.rs

parley_draw/src/atlas/key.rs

parley_draw/src/glyph.rs

taj-p · 2026-02-27T00:27:56Z

parley_draw/src/glyph.rs

 /// and then render that into the actual scene, in a similar fashion to
 /// bitmap glyphs.
-pub struct ColorGlyph<'a> {
+pub struct ColrGlyph<'a> {


Since we renamed the other similar structs to use a "Glyph" prefix, should we do the same here?

Oh yeah, of course! Renamed ColrGlyph → GlyphColr for consistency with the other structs.

taj-p · 2026-02-27T01:35:54Z

parley_draw/src/glyph.rs

+        &mut self,
+        glyph: PreparedGlyph<'_>,
+        glyph_atlas: &mut C,
+        image_cache: &mut ImageCache,


Is the plan to (sometime in the future) move these vello_common dependencies (and the shelf allocator) into a separate crate so that Glifo doesn't depend on vello_common?

Yes, the long-term plan is to extract shared types (color, basic geometry) and the allocator into their own crate(s) so glifo doesn't depend on vello_common directly. For now the coupling is pragmatic: those types are stable and we'd just be re-exporting the same things. Happy to track this as a follow-up issue if that would be helpful?

> Happy to track this as a follow-up issue if that would be helpful?

However you like 🙏!

parley_draw/src/atlas/key.rs

parley_draw/src/atlas/cache.rs

taj-p · 2026-02-27T07:46:18Z

parley_draw/src/atlas/cache.rs

+
+/// Configuration for glyph cache eviction behavior.
+#[derive(Clone, Debug)]
+pub struct GlyphCacheConfig {


Is it possible to configure minimum font size (after which we use renderer sampling to interpolate smaller sizes) and a maximum size (after which we always directly render)?

Added max_cached_font_size to GlyphCacheConfig (default 128 ppem) — glyphs above this threshold now bypass the atlas and render directly each frame.

For min_cached_font_size: this would involve caching at the clamped size and applying a downscale transform at composite time, plus adjustments to hinting, metrics, and subpixel offsets across all three glyph branches. I think it's a good idea but a bit too involved for this PR. I'll track it as a follow-up. Happy to hear what you think!

> I think it's a good idea but a bit too involved for this PR. I'll track it as a follow-up. Happy to hear what you think!

SGTM!
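The size threshold agreed on in this thread amounts to a simple branch before the cache lookup. A minimal sketch, assuming the field is an `f32` measured in ppem; only the name `max_cached_font_size` and the 128 ppem default come from the discussion, while `CacheConfig`, `RenderPath`, and `choose_path` are hypothetical:

```rust
/// Illustrative config; the real `GlyphCacheConfig` has more fields.
#[derive(Clone, Debug)]
pub struct CacheConfig {
    /// Glyphs larger than this (in ppem) bypass the atlas. Type is assumed.
    pub max_cached_font_size: f32,
}

impl Default for CacheConfig {
    fn default() -> Self {
        Self { max_cached_font_size: 128.0 }
    }
}

/// Which rendering path a glyph takes (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum RenderPath {
    /// Rasterize once into the atlas and reuse the bitmap.
    Atlas,
    /// Render directly every frame, skipping the cache.
    Direct,
}

/// Glyphs strictly above the threshold are rendered directly.
pub fn choose_path(config: &CacheConfig, font_size_ppem: f32) -> RenderPath {
    if font_size_ppem > config.max_cached_font_size {
        RenderPath::Direct
    } else {
        RenderPath::Atlas
    }
}
```

The rationale for such a cap is that very large glyphs consume a lot of atlas area for bitmaps that are rarely reused.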

parley_draw/src/renderers/vello_renderer.rs

taj-p · 2026-02-27T08:16:29Z

parley_draw/src/atlas/cache.rs

+        image_cache: &mut ImageCache,
+        key: GlyphCacheKey,
+        raster_metrics: RasterMetrics,
+    ) -> Option<(u16, u16, AtlasSlot, &mut AtlasCommandRecorder)>;


It's unfortunate that this form of API necessitates a redundant hash map lookup on cache miss. I.e., on cache miss, we perform another hash map operation to insert (being unable to re-use the RawEntry or similar).

We could precompute the hash once and use something like HashMap::raw_entry_mut().from_key_hashed_nocheck(...), but let's maybe leave that as a follow-up.

Agreed, the double lookup on miss is a known trade-off of the current get/insert split. I'll keep raw_entry_mut with precomputed hashes in mind as a follow-up optimisation. Thanks for flagging it.

parley_draw/src/atlas/cache.rs

LaurenzV · 2026-02-27T07:09:24Z

parley_draw/src/renderers/vello_renderer.rs

+            let cache_key = prepared_glyph.cache_key;
+            let transform = prepared_glyph.transform;
+            let tint_color = renderer.get_context_color();
+
+            if let Some(ref key) = cache_key {
+                if let CacheResult::CachedAndRendered = render_outline_via_cache::<B>(
+                    renderer,
+                    &glyph.path,
+                    transform,
+                    key,
+                    glyph_atlas,
+                    image_cache,
+                    tint_color,
+                ) {
+                    return;
+                }


Isn't this the same code as for fill_glyphs? Where is the stroking happening here? 🤔

Good catch, thanks! This is actually already like that on main — I just didn't notice. Split render_outline_directly into fill_outline_directly and stroke_outline_directly, with the stroke variant now correctly calling stroke_path.

LaurenzV · 2026-02-27T07:19:19Z

parley_draw/src/glyph.rs

+            // Sub-pixel x offset is quantized into the cache key so that glyphs
+            // at different fractional positions get distinct cached bitmaps.
+            let cache_key = self.atlas_cache_enabled.then(|| {
+                let fractional_x = transform.translation().x.fract() as f32;


Now that we encode the fractional translation in the subpixel offset, do we not need to update the transform to discard the fraction from there? Also, why are y fractions not handled here?

I think the fractional removal is handled in render_outline_glyph_from_atlas

Good question! The fractional removal at render time is handled by render_outline_glyph_from_atlas which uses tx.floor()/ty.floor() — so the full transform is passed through but only the integer part is used for placement.

For y fractions: when hinting is enabled, the y translation is already rounded in calculate_outline_transform. When hinting is off, if I'm not missing something, vertical sub-pixel shifts are less perceptually impactful for horizontal text, and adding a y dimension would multiply cache entries. Added a comment to clarify this.
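To make the split between "integer placement" and "quantized subpixel key" concrete, here is a small sketch. The bucket count of 4 and the truncating quantization are assumptions, since the thread does not state either. The sketch also derives the fraction as `x - x.floor()` so that it pairs with floor-based placement for negative coordinates, which may differ from the PR's use of `fract()`.

```rust
/// Number of horizontal subpixel buckets (assumed; not stated in the thread).
pub const SUBPIXEL_BUCKETS: u8 = 4;

/// Splits an x translation into the integer pixel used for placement and the
/// quantized subpixel bucket that would go into the cache key.
pub fn split_x(x: f64) -> (f64, u8) {
    let whole = x.floor();
    // In [0, 1) for both positive and negative x.
    let frac = x - whole;
    // The cast truncates toward zero, mapping the fraction to a bucket index.
    let bucket = (frac * SUBPIXEL_BUCKETS as f64) as u8;
    // Guard against a value that rounds up to the bucket count.
    (whole, bucket.min(SUBPIXEL_BUCKETS - 1))
}
```

Two glyph instances then share a cached bitmap only when their buckets match, which is why adding a y dimension would multiply the number of entries by the bucket count again.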

parley_draw/src/renderers/vello_renderer.rs

parley_draw/src/renderers/vello_cpu.rs

LaurenzV · 2026-02-27T07:45:37Z

parley_draw/src/renderers/vello_cpu.rs

+        // shifted to the glyph's slot within the page. Contrast with the hybrid
+        // backend, which gets per-allocation origins from the image cache.


Not sure I understand this, I thought vello_cpu and vello_hybrid work the same, where we have one (or multiple) huge atlas pages and glyphs are drawn into that at specific offsets. Why is there different behavior here?

Both backends do use atlas pages with glyphs at specific offsets. The difference is in how the paint transform is set up when reading back from the atlas:

vello_cpu: The ImageSource references the entire page pixmap, so we need to shift the paint origin to the glyph's (x, y) position within the page.

vello_hybrid: Vello's image cache resolves each allocation to its own origin, so we only need to compensate for the GLYPH_PADDING inset.

The underlying atlas structure is the same — it's just the coordinate system for the paint source that differs between the two rendering paths.

LaurenzV · 2026-02-27T07:48:45Z

Cargo.toml

+vello_hybrid = { git = "https://github.com/linebender/vello.git", branch = "gemberg/glyph-cache-rect-allocator" }
+
+[profile.profiling]
+inherits = "release"


strip = "none" might also be worth adding!

Added strip = "none" to make sure symbols are preserved for profiling.

parley_draw/src/renderers/vello_hybrid.rs

LaurenzV · 2026-02-27T07:54:02Z

parley_draw/src/renderers/vello_hybrid.rs

+        let padding = GLYPH_PADDING as f64;
+        Affine::translate((-padding, -padding))


Do we not need glyph padding for vello_cpu?

The vello_cpu backend doesn't need an explicit GLYPH_PADDING offset in paint_transform because atlas_slot.x / atlas_slot.y are already inset by GLYPH_PADDING from the allocation origin (see GlyphAtlas::insert_entry). So the translate to (-slot.x, -slot.y) already moves past the padding. The hybrid backend's image cache resolves to the allocation origin (before padding), so it needs the explicit (-padding, -padding) shift.
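The relationship between the two paint offsets can be written out with plain numbers. This is only a sketch of the arithmetic described above: the padding value of 1 is assumed, translations are modeled as `(dx, dy)` tuples instead of `Affine`, and the three function names are hypothetical.

```rust
/// Padding around each glyph in the atlas (value assumed for illustration).
pub const GLYPH_PADDING: u16 = 1;

/// Position of the glyph's pixels within an atlas page.
pub struct AtlasSlot {
    pub x: u16,
    pub y: u16,
}

/// The slot is inset by the padding from the allocation origin.
pub fn slot_for_allocation(alloc_x: u16, alloc_y: u16) -> AtlasSlot {
    AtlasSlot { x: alloc_x + GLYPH_PADDING, y: alloc_y + GLYPH_PADDING }
}

/// CPU path: the paint source is the whole page, so shift by the slot position.
pub fn cpu_paint_offset(slot: &AtlasSlot) -> (f64, f64) {
    (-(slot.x as f64), -(slot.y as f64))
}

/// Hybrid path: the image cache already resolves to the allocation origin,
/// so only the padding inset remains to be compensated.
pub fn hybrid_paint_offset() -> (f64, f64) {
    let padding = GLYPH_PADDING as f64;
    (-padding, -padding)
}
```

Under these assumptions the CPU offset equals the hybrid offset minus the allocation origin, which is the sense in which both backends describe the same atlas layout in different coordinate systems.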

LaurenzV · 2026-02-27T08:01:24Z

parley_draw/src/glyph.rs

+    /// A COLR glyph cached in the atlas.
+    /// The `Rect` parameter contains the fractional area dimensions
+    /// to preserve sub-pixel accuracy during rendering.
+    Colr(Rect),


Why do Colr and Outline have different behavior here? Because we don't use subpixel offsets here, since they are not as critical for emojis?

Colr glyphs use fractional scaled_bbox dimensions for their area (computed from the glyph's actual bounding box), while outline glyphs use integer atlas_slot.width/height since they snap to pixel boundaries. The Rect carries these fractional bounds so the rendered quad matches the original glyph dimensions rather than the padded integer atlas allocation, which avoids subtle scaling artifacts.

taj-p · 2026-02-27T08:24:45Z

parley_draw/src/atlas/cache.rs

+    /// Outline and COLR glyph commands awaiting replay, indexed by atlas page.
+    /// Uses `SmallVec` with inline capacity of 1 because most applications use
+    /// a single atlas page; the common case avoids heap allocation entirely.
+    pending_atlas_commands: SmallVec<[Option<AtlasCommandRecorder>; 1]>,


Is there a way to configure a maximum value for this?

Yes, the underlying AtlasConfig (from vello_common) already has a max_atlases field that caps the number of atlas pages. When the limit is reached, allocate() fails and the glyph is drawn directly without caching. This is configured on the ImageCache side rather than GlyphCacheConfig.

taj-p

Nice! This looks really good! Some comments

parley_draw/src/atlas/commands.rs

parley_draw/src/colr.rs

taj-p · 2026-02-27T18:54:27Z

parley_draw/src/glyph.rs

+                    hinted: false,
+                    subpixel_x: 0,
+                    context_color: BLACK,
+                    var_coords: SmallVec::new(),


Depending on var_coords for the run, this could allocate every time. I think we should consider either:

removing var_coords from the key and relying on the 2-level map structure (which encodes the var_coords anyway); or

using the Equivalent trait to create borrowed keys (shown below) and converting them to owned keys on cache misses only. See parley/src/lru_cache.rs for an example implementation.

    #[derive(Clone, Copy)]
    pub struct GlyphCacheKeyRef<'a> {
        pub font_id: u64,
        pub font_index: u32,
        pub glyph_id: u32,
        pub size_bits: u32,
        pub hinted: bool,
        pub subpixel_x: u8,
        pub context_color_packed: u32,
        pub var_coords: &'a [NormalizedCoord],
    }
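The borrowed-key idea can be demonstrated with the standard library alone. The sketch below does not use the `Equivalent` trait itself; it hand-rolls the same contract (the borrowed key hashes identically to the owned key and can be compared against it) with a deliberately reduced key. `OwnedKey`, `KeyRef`, `Cache`, and the `owned_keys_created` counter are all hypothetical names for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Owned key stored in the cache (reduced to two fields for brevity).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct OwnedKey {
    pub glyph_id: u32,
    pub var_coords: Vec<i16>,
}

/// Borrowed lookup key. The derived `Hash` matches `OwnedKey`'s because a
/// `Vec<T>` hashes exactly like the slice it dereferences to.
#[derive(Clone, Copy, Hash)]
pub struct KeyRef<'a> {
    pub glyph_id: u32,
    pub var_coords: &'a [i16],
}

impl KeyRef<'_> {
    /// Compare against an owned key without allocating.
    pub fn equivalent(&self, key: &OwnedKey) -> bool {
        self.glyph_id == key.glyph_id && self.var_coords == key.var_coords.as_slice()
    }

    /// Allocate an owned key; intended for cache misses only.
    pub fn to_owned_key(&self) -> OwnedKey {
        OwnedKey { glyph_id: self.glyph_id, var_coords: self.var_coords.to_vec() }
    }
}

/// Toy cache keyed by the precomputed hash, with collision buckets.
pub struct Cache<V> {
    buckets: HashMap<u64, Vec<(OwnedKey, V)>>,
    /// Counts how often an owned key had to be allocated.
    pub owned_keys_created: usize,
}

impl<V> Cache<V> {
    pub fn new() -> Self {
        Self { buckets: HashMap::new(), owned_keys_created: 0 }
    }

    /// Looks up with the borrowed key; allocates an owned key only on a miss.
    pub fn get_or_insert_with(&mut self, key: KeyRef<'_>, make: impl FnOnce() -> V) -> &V {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let bucket = self.buckets.entry(hasher.finish()).or_default();
        let index = match bucket.iter().position(|(k, _)| key.equivalent(k)) {
            Some(i) => i,
            None => {
                self.owned_keys_created += 1;
                bucket.push((key.to_owned_key(), make()));
                bucket.len() - 1
            }
        };
        &bucket[index].1
    }
}
```

On a hit the lookup never copies `var_coords`, which is the allocation the review comment is trying to avoid.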

parley_draw/src/renderers/vello_cpu.rs

parley_draw/src/renderers/vello_hybrid.rs

LaurenzV

I only took a very coarse second look because it would take ages to review again carefully, but I don't think I would have much to add in addition to my first review. So LGTM if LGTM to @taj-p 😄 Awesome work!

fontique/src/backend/coretext.rs

parley_draw/src/atlas/cache.rs

taj-p

LGTM 🎉 ! Let's goooo!!! I didn't re-read all the code line-by-line again, but skimmed through the comments and their resolutions. All seems absolutely BRILLIANT!

grebmeg force-pushed the gemberg/glyph-cache branch 2 times, most recently from 61beb5f to c299b2b Compare February 25, 2026 06:07

grebmeg changed the title ~~[WIP] feat: glyph cache~~ feat: Add glyph atlas cache with LRU eviction for vello spare strips renderers Feb 25, 2026

grebmeg force-pushed the gemberg/glyph-cache branch from e7a0634 to 96e3aff Compare February 26, 2026 04:48

grebmeg marked this pull request as ready for review February 26, 2026 05:11

grebmeg requested review from LaurenzV, conor-93 and taj-p February 26, 2026 05:11

LaurenzV reviewed Feb 26, 2026

taj-p reviewed Feb 26, 2026

taj-p reviewed Feb 27, 2026

taj-p reviewed Feb 27, 2026

LaurenzV reviewed Feb 27, 2026

taj-p reviewed Feb 27, 2026

grebmeg force-pushed the gemberg/glyph-cache branch from 96e3aff to d560c82 Compare March 2, 2026 10:05

grebmeg requested review from LaurenzV and taj-p March 2, 2026 10:31

LaurenzV reviewed Mar 2, 2026

taj-p approved these changes Mar 3, 2026

grebmeg added 4 commits March 3, 2026 16:10

feat: Add glyph atlas cache with LRU eviction for vello spare strips …

2bdc847

…renderers

.

0c1e41e

.

fe3ca5a

.

12cc565

grebmeg force-pushed the gemberg/glyph-cache branch from 6c5bc19 to 12cc565 Compare March 3, 2026 05:10

grebmeg added 2 commits March 3, 2026 16:32

update snapshots

59c2a8d

disable draw_bitmap_emoji test

cd88e1b

grebmeg added this pull request to the merge queue Mar 3, 2026

Merged via the queue into main with commit b384946 Mar 3, 2026
24 checks passed

grebmeg deleted the gemberg/glyph-cache branch March 3, 2026 06:31
