Persist article storage metrics during ingestion #44
Merged
Track the number of stored files and total storage size in bytes for each article during ingestion. These fields are computed from the actual content saved (HTML, images, thumbnails, metadata) and persisted in the article.json metadata for all ingestion types (HTML, PDF, image). https://claude.ai/code/session_011h8i7BVMqrpsxkER9yKVSF
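A minimal sketch of how these two numbers could be derived at write time, assuming each saved file is available in memory as a string or a Uint8Array. The `SavedFile` type and the `computeStorageMetrics` and `byteLength` helpers are hypothetical names used for illustration, not the PR's actual code.

```typescript
// Hypothetical shape of a file written during ingestion (not taken from the PR).
interface SavedFile {
  path: string;
  content: string | Uint8Array;
}

interface StorageMetrics {
  assetCount: number; // number of stored files
  sizeBytes: number; // total size of those files in bytes
}

// Strings are measured as UTF-8 bytes, not as character counts.
function byteLength(content: string | Uint8Array): number {
  return typeof content === "string"
    ? new TextEncoder().encode(content).length
    : content.byteLength;
}

// Sum file count and byte size over everything saved for one article.
function computeStorageMetrics(files: SavedFile[]): StorageMetrics {
  return files.reduce<StorageMetrics>(
    (acc, file) => ({
      assetCount: acc.assetCount + 1,
      sizeBytes: acc.sizeBytes + byteLength(file.content),
    }),
    { assetCount: 0, sizeBytes: 0 },
  );
}
```

Measuring encoded bytes rather than `string.length` matters for HTML with non-ASCII text, where the two values differ.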
Display article size and file count from persisted fields when available, skipping the expensive calculateArticleStorageSize call. Legacy articles without these fields fall back to the on-demand calculation. https://claude.ai/code/session_011h8i7BVMqrpsxkER9yKVSF
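The fallback described here might look roughly like the sketch below. It assumes the on-demand calculation can be passed in as an async function that takes an article slug; the real signature of calculateArticleStorageSize is not shown in this PR text, so the `Calculate` type, the `ArticleMeta` shape, and the `hasPersistedMetrics` and `resolveStorageMetrics` helpers are all hypothetical.

```typescript
interface StorageMetrics {
  assetCount: number;
  sizeBytes: number;
}

// Hypothetical subset of article.json; legacy articles lack both metric fields.
interface ArticleMeta {
  slug: string;
  assetCount?: number;
  sizeBytes?: number;
}

// Assumed signature for the expensive on-demand calculation.
type Calculate = (slug: string) => Promise<StorageMetrics>;

// True only when both persisted fields are present and numeric.
function hasPersistedMetrics(
  article: ArticleMeta,
): article is ArticleMeta & StorageMetrics {
  return (
    typeof article.assetCount === "number" &&
    typeof article.sizeBytes === "number"
  );
}

// Prefer persisted values; fall back to the on-demand calculation otherwise.
async function resolveStorageMetrics(
  article: ArticleMeta,
  calculate: Calculate,
): Promise<StorageMetrics> {
  if (hasPersistedMetrics(article)) {
    return { assetCount: article.assetCount, sizeBytes: article.sizeBytes };
  }
  return calculate(article.slug);
}
```

Checking with `typeof` rather than truthiness keeps a legitimately persisted value of 0 from triggering the fallback.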
✅ Deploy Preview for savrlist ready!
✅ Deploy Preview for savrdev ready!
Verify that ingestHtml sets assetCount (4 base files for no-image articles) and sizeBytes (positive, larger than raw HTML) on the returned article, and that these fields are persisted in the saved article.json. https://claude.ai/code/session_011h8i7BVMqrpsxkER9yKVSF
Summary
This PR optimizes storage size calculation by persisting asset count and total size information during article ingestion, eliminating the need for expensive on-demand calculations in most cases.
Key Changes
- Added assetCount and sizeBytes fields to track persisted storage information
- Updated ingestHtml(), ingestPdf(), and ingestImage() to calculate and persist asset counts and total sizes during the ingestion process
- Changed downloadAndResizeImages() to return detailed metrics (successful downloads, saved bytes, file count) instead of just a count
- Skipped calculateArticleStorageSize() calls when persisted metrics are already available
Implementation Details
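As an illustration of the richer return value from downloadAndResizeImages(), the sketch below shows one way an ingestion function could fold image metrics into the article totals. The field names on `ImageDownloadResult` are assumptions modeled on the wording above (successful downloads, saved bytes, file count), and `addImageMetrics` is a hypothetical helper; the PR's actual types may differ.

```typescript
// Assumed field names; not confirmed by the PR text.
interface ImageDownloadResult {
  successfulDownloads: number; // images fetched without error
  savedBytes: number; // bytes written to storage for image files
  fileCount: number; // image files written to storage
}

interface StorageMetrics {
  assetCount: number;
  sizeBytes: number;
}

// Combine the metrics of the base files with the metrics reported for images.
function addImageMetrics(
  base: StorageMetrics,
  images: ImageDownloadResult,
): StorageMetrics {
  return {
    assetCount: base.assetCount + images.fileCount,
    sizeBytes: base.sizeBytes + images.savedBytes,
  };
}
```

Returning file count separately from the download count lets the caller count files actually written, in case the two numbers are not the same.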
https://claude.ai/code/session_011h8i7BVMqrpsxkER9yKVSF