@userhaptop userhaptop commented Feb 7, 2026

[Summary]

This PR enhances the Qlean image module with multi-distribution support, flexible custom image handling, and optimized streaming hash computation.

[Key changes]

Multi-distro support: Added Ubuntu, Fedora, and Arch image workflows via a generic ImageAction abstraction, including image download and kernel/initrd extraction.

Custom image support: Users can provide images via URL or local path with mandatory SHA256/SHA512 verification, plus optional pre-extracted boot files.

Streaming hash computation: Implemented single-pass hash calculation during file download/processing, achieving performance comparable to or better than standard shell tools.

[Validation]

All unit and integration tests pass (correctness, error handling, serialization, edge cases).

Benchmarks show streaming hash is ~5–30% faster than shell-based hashing while producing identical results.

Signed-off-by: userhaptop <1307305157@qq.com>

Copilot AI left a comment

Pull request overview

Adds initial Ubuntu cloud-image support to qlean’s image pipeline by introducing a new distro variant and a download/extract strategy that avoids libguestfs extraction tools by consuming Ubuntu’s pre-unpacked boot artifacts.

Changes:

  • Add Ubuntu to the Distro enum and extend Image / create_image to support it.
  • Implement an Ubuntu ImageAction that downloads the qcow2 plus pre-extracted kernel/initrd from Ubuntu’s cloud image repo.
  • Add an ignored, serialized integration test that validates Ubuntu image + boot artifacts are created on disk.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File | Description
src/image.rs | Adds Ubuntu distro variant and implementation; wires Ubuntu into Image and create_image; adds a small enum test.
tests/ubuntu_image.rs | Adds an ignored integration test that exercises Ubuntu image creation and validates output files exist.

Comment on lines +371 to +377
// Ubuntu noble (24.04 LTS) cloud image base URL
let base_url = "https://cloud-images.ubuntu.com/noble/current";

// Download qcow2 image
let qcow2_url = format!("{}/noble-server-cloudimg-amd64.img", base_url);
let qcow2_path = image_dir.join(format!("{}.qcow2", name));
download_file(&qcow2_url, &qcow2_path).await?;

Copilot AI Feb 7, 2026

Ubuntu::download hard-codes the remote release/arch (noble, amd64) and ignores the name argument when selecting what to download. This makes create_image(Distro::Ubuntu, name) misleading because any name will still fetch noble and then be cached under that name. Consider parsing name into (release, variant, arch) or introducing explicit parameters/config for Ubuntu images, or at minimum validate that name matches the hard-coded remote artifact to avoid accidental mismatches.

Copilot uses AI. Check for mistakes.
Comment on lines +418 to +434
async fn download_file(url: &str, dest: &PathBuf) -> Result<()> {
    debug!("Downloading {} to {}", url, dest.display());
    let response = reqwest::get(url)
        .await
        .with_context(|| format!("failed to download from {}", url))?;

    let mut file = File::create(dest)
        .await
        .with_context(|| format!("failed to create file at {}", dest.display()))?;

    let mut stream = response.bytes_stream();
    while let Some(chunk) = stream.next().await {
        let chunk = chunk.with_context(|| "failed to read chunk from stream")?;
        file.write_all(&chunk)
            .await
            .with_context(|| "failed to write to file")?;
    }

Copilot AI Feb 7, 2026

download_file() does not validate the HTTP status code before writing the body to disk. If Ubuntu returns a 404/500 HTML error page, it will be saved as the qcow2/kernel/initrd and later treated as a valid cached image (since the checksum file is generated from whatever was downloaded). Use error_for_status() (or equivalent) before streaming the body, and consider failing early if the response is not successful.

Comment on lines +371 to +395
// Ubuntu noble (24.04 LTS) cloud image base URL
let base_url = "https://cloud-images.ubuntu.com/noble/current";

// Download qcow2 image
let qcow2_url = format!("{}/noble-server-cloudimg-amd64.img", base_url);
let qcow2_path = image_dir.join(format!("{}.qcow2", name));
download_file(&qcow2_url, &qcow2_path).await?;

// Download pre-extracted kernel
let kernel_url = format!(
    "{}/unpacked/noble-server-cloudimg-amd64-vmlinuz-generic",
    base_url
);
let kernel_path = image_dir.join("vmlinuz");
download_file(&kernel_url, &kernel_path).await?;

// Download pre-extracted initrd
let initrd_url = format!(
    "{}/unpacked/noble-server-cloudimg-amd64-initrd-generic",
    base_url
);
let initrd_path = image_dir.join("initrd.img");
download_file(&initrd_url, &initrd_path).await?;

Ok(())

Copilot AI Feb 7, 2026

Ubuntu downloads are not verified against Ubuntu-published checksums/signatures. Because the project writes a local checksum file after download, a corrupted or intercepted download will be permanently “trusted” on subsequent runs. Fetch and validate against Ubuntu’s published checksum/sig files for the image (and ideally the unpacked kernel/initrd too, if checksums are available), or otherwise add an external integrity verification step before saving metadata.

Comment on lines +482 to +491
pub async fn create_image(distro: Distro, name: &str) -> Result<Image> {
    match distro {
        Distro::Debian => {
            let image = ImageMeta::<Debian>::create(name).await?;
            Ok(Image::Debian(image))
        } // Add more distros as needed
        Distro::Ubuntu => {
            let image = ImageMeta::<Ubuntu>::create(name).await?;
            Ok(Image::Ubuntu(image))
        }

Copilot AI Feb 7, 2026

PR description claims “No guestfish dependency / WSL compatible”, but the library still globally requires guestfish and virt-copy-out via prerequisite checks before running (see ensure_prerequisites usage). If Ubuntu is intended to work without these tools, consider making prerequisite checks distro-aware (only require guestfish/virt-copy-out for distros that need extraction) or adjusting the claim/documentation to match actual behavior.
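
A distro-aware prerequisite check along these lines could have each distro declare the tools it needs. The sketch below is illustrative only: `required_tools`, `missing_tools`, and the reduced `Distro` enum are hypothetical names, not qlean's actual `ensure_prerequisites` API, and it assumes Ubuntu needs no extraction tools because it downloads pre-unpacked boot artifacts.

```rust
/// Reduced stand-in for qlean's `Distro` enum (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Distro {
    Debian,
    Ubuntu,
}

/// Hypothetical helper: external tools a distro needs to obtain kernel/initrd.
pub fn required_tools(distro: Distro) -> &'static [&'static str] {
    match distro {
        // Debian extracts boot files from the downloaded disk image.
        Distro::Debian => &["guestfish", "virt-copy-out"],
        // Ubuntu downloads pre-unpacked kernel/initrd, so no extra tools are assumed.
        Distro::Ubuntu => &[],
    }
}

/// Returns the required tools that `is_available` reports as missing.
/// The availability probe is injected so the logic can be tested without a real PATH lookup.
pub fn missing_tools(distro: Distro, is_available: impl Fn(&str) -> bool) -> Vec<&'static str> {
    required_tools(distro)
        .iter()
        .copied()
        .filter(|tool| !is_available(*tool))
        .collect()
}
```

With this shape, a caller would fail only when `missing_tools` is non-empty for the distro actually requested, so an Ubuntu-only user on WSL would not be blocked by a missing `guestfish`.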

Comment on lines 577 to 582
#[test]
fn test_distro_enum_variants() {
    let variants = vec![Distro::Debian, Distro::Ubuntu];
    assert_eq!(variants.len(), 2);
}

Copilot AI Feb 7, 2026

test_distro_enum_variants only asserts the length of a manually constructed vec of variants, which doesn’t validate behavior and will create churn whenever a new distro is added. Consider removing it, or replace it with a test that validates a real invariant (e.g., that create_image handles each enum variant, or that serialization/deserialization round-trips for known variants).

Suggested change (remove the test):
#[test]
fn test_distro_enum_variants() {
    let variants = vec![Distro::Debian, Distro::Ubuntu];
    assert_eq!(variants.len(), 2);
}

Signed-off-by: userhaptop <1307305157@qq.com>
… streaming hash computation

Signed-off-by: userhaptop <1307305157@qq.com>
userhaptop changed the title from "add Ubuntu cloud image support" to "add multi-distro image support, custom image handling, and streaming hash computation" on Feb 10, 2026

genedna (Contributor) commented Feb 10, 2026

@codex review

genedna (Contributor) commented Feb 10, 2026

@claude

github-actions bot commented Feb 10, 2026

Claude finished @genedna's task in 3m 37s


Comprehensive Code Review

This PR adds substantial functionality with multi-distro support, custom images, and streaming hash computation. While the implementation is generally solid, there are several critical security issues and code quality concerns that should be addressed before merging.


🔴 Critical Security Issues

1. No HTTP Status Code Validation in download_file() and download_with_hash()

Severity: HIGH | src/image.rs:1020-1035, 162-175

The download_file() and download_with_hash() functions don't validate HTTP response status codes before writing content to disk. If a server returns a 404/500 error page (HTML), it will be saved as a qcow2/kernel/initrd file and cached as valid.

let response = reqwest::get(url).await?;
// Missing: response.error_for_status()?
let mut stream = response.bytes_stream();

Impact: Corrupted downloads will be permanently cached and treated as valid images, leading to VM boot failures and potentially security issues.

Fix: Fix download_file HTTP validation →

2. Ubuntu Downloads Lack Checksum Verification

Severity: HIGH | src/image.rs:600-625

Ubuntu image downloads (qcow2, kernel, initrd) have zero integrity verification. Unlike Debian which validates against published SHA512SUMS, Ubuntu downloads are blindly trusted and permanently cached.

async fn download(&self, name: &str) -> Result<()> {
    download_file(&qcow2_url, &qcow2_path).await?;  // No verification!
    download_file(&kernel_url, &kernel_path).await?;  // No verification!
    download_file(&initrd_url, &initrd_path).await?;  // No verification!

Impact: Man-in-the-middle attacks, corrupted downloads, or compromised mirror servers could inject malicious images that would be permanently cached.

Fix: Download and verify Ubuntu's SHA256SUMS file from https://cloud-images.ubuntu.com/noble/current/SHA256SUMS before downloading images.

Add Ubuntu checksum verification →
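
The lookup half of that fix can be sketched with the standard library alone. The example assumes the `SHA256SUMS` body has already been fetched as text and that the file's digest has been computed elsewhere (for instance by the PR's streaming hash); `expected_sha256` and `digest_matches` are hypothetical names, not functions from the PR.

```rust
/// Looks up the expected digest for `file_name` in the text of a `SHA256SUMS` file.
///
/// Each line has the form `<hex digest> <marker><file name>`, where the marker is
/// a space (text mode) or `*` (binary mode), as emitted by `sha256sum`.
pub fn expected_sha256(sums: &str, file_name: &str) -> Option<String> {
    sums.lines().find_map(|line| {
        let (digest, rest) = line.split_once(' ')?;
        // Drop the one-character mode marker in front of the file name.
        let name = rest.strip_prefix('*').or_else(|| rest.strip_prefix(' '))?;
        let is_sha256 = digest.len() == 64 && digest.chars().all(|c| c.is_ascii_hexdigit());
        (is_sha256 && name == file_name).then(|| digest.to_ascii_lowercase())
    })
}

/// Compares a computed digest with the published one, ignoring hex case.
pub fn digest_matches(computed: &str, expected: &str) -> bool {
    computed.eq_ignore_ascii_case(expected)
}
```

A download would then be accepted only if `expected_sha256` returns a digest for the remote file name and `digest_matches` holds. A file name missing from the list should be treated as a failure rather than skipped.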

3. Fedora and Arch Downloads Lack Checksum Verification

Severity: HIGH | src/image.rs:650-750, 780-850

Same issue as Ubuntu - Fedora and Arch downloads have no integrity verification against upstream checksums.

Fix: Implement checksum verification using official checksums from:

  • Fedora: Use CHECKSUM file from releases directory
  • Arch: Use checksums from https://geo.mirror.pkgbuild.com/images/latest/Arch-Linux-x86_64-cloudimg.qcow2.SHA256

4. Hard-coded Release/Architecture in Multi-Distro Implementations

Severity: MEDIUM | src/image.rs:600-625, 650-680, 780-810

All new distros hard-code their release and architecture (Ubuntu noble, Fedora 41, Arch latest) and completely ignore the name parameter. This makes create_image(Distro::Ubuntu, name) misleading:

// Any name will fetch Ubuntu Noble amd64 and cache it under that name
let image = create_image(Distro::Ubuntu, "my-custom-name").await?;

Impact: Confusing API behavior, inability to support multiple Ubuntu versions, cache key collisions.

Recommendation: Either:

  1. Parse name into (release, variant, arch) (e.g., "noble-server-amd64")
  2. Add explicit parameters to the API
  3. At minimum, validate that name matches expected format and document the limitation
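
Option 1 might look roughly like the following. This is a sketch under assumptions: `UbuntuSpec` is a hypothetical type, the `<release>-<variant>-<arch>` name format is taken from the "noble-server-amd64" example above, and the URL pattern is generalized from the single URL hard-coded in the PR, so it may not hold for every release.

```rust
/// Hypothetical description of which Ubuntu cloud image a `name` refers to.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UbuntuSpec {
    pub release: String,
    pub variant: String,
    pub arch: String,
}

impl UbuntuSpec {
    /// Parses names of the form `<release>-<variant>-<arch>`, e.g. `noble-server-amd64`.
    pub fn parse(name: &str) -> Result<Self, String> {
        let parts: Vec<&str> = name.split('-').collect();
        match parts.as_slice() {
            // Exactly three non-empty components are required.
            [release, variant, arch] if parts.iter().all(|p| !p.is_empty()) => Ok(Self {
                release: (*release).to_owned(),
                variant: (*variant).to_owned(),
                arch: (*arch).to_owned(),
            }),
            _ => Err(format!("expected <release>-<variant>-<arch>, got {:?}", name)),
        }
    }

    /// Builds the qcow2 URL following the pattern of the URL hard-coded in the PR.
    pub fn qcow2_url(&self) -> String {
        format!(
            "https://cloud-images.ubuntu.com/{r}/current/{r}-{v}-cloudimg-{a}.img",
            r = self.release,
            v = self.variant,
            a = self.arch
        )
    }
}
```

Note that a purely structural parse still accepts names such as "my-custom-name"; checking the release and architecture against a list of supported values would be needed to reject those.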

🟡 Code Quality & Design Issues

5. Missing Public API Exports

src/lib.rs:9

The new types CustomImageConfig, ImageSource, ShaType are defined but never exported from src/lib.rs, making them unusable by external consumers. The test at tests/custom_image.rs:2 suggests they should be public:

use qlean::{CustomImageConfig, ImageSource, ShaType, create_custom_image};

Fix: Add exports to src/lib.rs:

pub use image::{CustomImageConfig, ImageSource, ShaType};

6. Inconsistent Hash Type Usage in Custom Images

src/image.rs:920-960

In Custom::download(), the same image_hash_type is used for kernel and initrd verification even though users might want different hash algorithms for different files. The field naming is also confusing (image_hash_type applies to all files, not just the image).

Recommendation: Either allow per-file hash types or rename to hash_type and document that it applies to all files.

7. Weak Test Coverage

src/image.rs:1248-1258

The test_distro_enum_variants test manually constructs a vector and checks its length - this provides no value and creates maintenance burden:

#[test]
fn test_distro_enum_variants() {
    let variants = vec![
        Distro::Debian,
        Distro::Ubuntu,
        Distro::Fedora,
        Distro::Arch,
        Distro::Custom,
    ];
    assert_eq!(variants.len(), 5);
}

Remove or improve this test →
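
One possible replacement is a round-trip test over every variant. The sketch below uses `Display`/`FromStr` as a dependency-free stand-in for the project's real serialization; the reduced `Distro` enum, the `ALL` constant, and `round_trips` are illustrative and not taken from the PR.

```rust
use std::fmt;
use std::str::FromStr;

/// Reduced stand-in for qlean's `Distro` enum (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Distro {
    Debian,
    Ubuntu,
    Fedora,
    Arch,
    Custom,
}

impl Distro {
    /// Every variant, kept next to the enum so additions are made in one place.
    pub const ALL: [Distro; 5] = [
        Distro::Debian,
        Distro::Ubuntu,
        Distro::Fedora,
        Distro::Arch,
        Distro::Custom,
    ];
}

impl fmt::Display for Distro {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The exhaustive match stops compiling when a variant is added without a name.
        f.write_str(match self {
            Distro::Debian => "debian",
            Distro::Ubuntu => "ubuntu",
            Distro::Fedora => "fedora",
            Distro::Arch => "arch",
            Distro::Custom => "custom",
        })
    }
}

impl FromStr for Distro {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Distro::ALL
            .iter()
            .copied()
            .find(|d| d.to_string() == s)
            .ok_or_else(|| format!("unknown distro: {}", s))
    }
}

/// The invariant worth testing: a variant survives a text round trip.
pub fn round_trips(distro: Distro) -> bool {
    distro.to_string().parse::<Distro>() == Ok(distro)
}
```

A test then asserts `round_trips` for each entry of `Distro::ALL`, which checks behavior instead of the length of a hand-written vector.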

8. Error Messages Could Be More Helpful

src/image.rs:970-1015

The Custom image extraction provides good error messages when guestfish fails, but could be improved by detecting WSL environment and providing OS-specific guidance.

9. Duplicate Code in Distro Implementations

src/image.rs:450-850

Fedora, Arch, and Debian use identical guestfish extraction logic. This should be extracted to a shared helper function.

Recommendation:

async fn extract_with_guestfish(
    image_dir: &Path,
    file_name: &str,
    kernel_patterns: &[&str],
    initrd_patterns: &[&str],
) -> Result<(PathBuf, PathBuf)>

🟢 Positive Aspects

  1. Excellent streaming hash implementation - The single-pass download+hash computation is well-designed and benchmarked
  2. Comprehensive test coverage - Unit tests, integration tests, and benchmarks all look solid
  3. Good error handling - Most error paths use .with_context() appropriately
  4. WSL-friendly design - The Custom image pre-extracted kernel/initrd option is a thoughtful addition
  5. Clean abstraction - The ImageAction trait provides good separation of concerns
  6. Proper async/blocking separation - Using spawn_blocking for sync I/O is correct

📋 Additional Recommendations

  1. Documentation: Add rustdoc examples for create_custom_image() showing both modes (with/without pre-extracted boot files)

  2. Consider rate limiting: The concurrent downloads might overwhelm upstream servers if many images are requested

  3. Add timeout configuration: reqwest::get() calls should have timeouts to prevent hanging on slow/stalled connections

  4. Validate image format: After download, consider a quick sanity check that the qcow2 file has a valid header

  5. Security advisory: Document in README that users should verify image sources are official/trusted, especially for Custom images

  6. CI Integration: The benchmark results mentioned in the PR description should be captured in CI to prevent performance regressions
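
The format check in item 4 can be as small as comparing the first four bytes against the qcow2 magic number. This is a minimal sketch with hypothetical function names; it only detects obviously wrong files (such as a saved HTML error page) and does not replace checksum verification.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// The first four bytes of a qcow2 file: "QFI" followed by 0xfb.
pub const QCOW2_MAGIC: [u8; 4] = [b'Q', b'F', b'I', 0xfb];

/// Returns true if `header` starts with the qcow2 magic bytes.
pub fn has_qcow2_magic(header: &[u8]) -> bool {
    header.starts_with(&QCOW2_MAGIC)
}

/// Reads the first four bytes of `path` and checks them against the magic.
/// Files shorter than four bytes are reported as not qcow2 rather than as an error.
pub fn is_qcow2_file(path: &Path) -> io::Result<bool> {
    let mut header = [0u8; 4];
    match File::open(path)?.read_exact(&mut header) {
        Ok(()) => Ok(has_qcow2_magic(&header)),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

Running this check before the image metadata is written would keep a mis-downloaded file from being cached as a valid image.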


Summary

This PR introduces valuable functionality but has critical security vulnerabilities related to missing checksum verification and HTTP validation. The Ubuntu, Fedora, and Arch implementations need upstream checksum verification before this can be safely merged.

Required before merge:

  • ❌ Add HTTP status validation to all download functions
  • ❌ Implement checksum verification for Ubuntu downloads
  • ❌ Implement checksum verification for Fedora downloads
  • ❌ Implement checksum verification for Arch downloads
  • ❌ Export public API types from lib.rs
  • ❌ Address hard-coded release/arch issue or document limitation

Recommended improvements:

  • Remove weak enum test or make it useful
  • Extract shared guestfish extraction logic
  • Add timeout configuration to downloads
  • Add rustdoc examples for custom images

Overall: Strong implementation with good design, but security issues must be addressed first. The streaming hash optimization is excellent work.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: edb020810d

Comment on lines +1125 to +1127
Distro::Ubuntu => {
    let image = ImageMeta::<Ubuntu>::create(name).await?;
    Ok(Image::Ubuntu(image))

P1: Isolate image cache keys by distro

The new Distro::Ubuntu/Fedora/Arch paths all call ImageMeta::<...>::create(name), but cache lookup/loading is still keyed only by name (images/<name>.json and images/<name>/...), so creating a second distro with the same name can return the first distro’s cached image as a different variant without re-downloading. This is a silent correctness issue that can boot the wrong OS image; include distro in the cache key or persist+validate distro metadata before accepting a cache hit.
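
Both remedies can be sketched in a few lines. The layout `<root>/<distro>/<name>.json`, the reduced `Distro` enum, and the function names below are assumptions for illustration; the PR's actual cache code may be organized differently.

```rust
use std::path::{Path, PathBuf};

/// Reduced stand-in for qlean's `Distro` enum (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Distro {
    Debian,
    Ubuntu,
    Fedora,
    Arch,
}

impl Distro {
    /// Directory component used to separate caches per distro.
    fn dir_name(self) -> &'static str {
        match self {
            Distro::Debian => "debian",
            Distro::Ubuntu => "ubuntu",
            Distro::Fedora => "fedora",
            Distro::Arch => "arch",
        }
    }
}

/// Remedy 1: put the distro into the cache key, so the same `name` under two
/// distros maps to two different metadata files.
pub fn metadata_path(root: &Path, distro: Distro, name: &str) -> PathBuf {
    root.join(distro.dir_name()).join(format!("{}.json", name))
}

/// Remedy 2: keep the existing layout, persist the distro in the metadata, and
/// reject a cache hit that was recorded for a different distro.
pub fn validate_cache_hit(recorded: Distro, requested: Distro) -> Result<(), String> {
    if recorded == requested {
        Ok(())
    } else {
        Err(format!(
            "cached image was created for {:?}, but {:?} was requested",
            recorded, requested
        ))
    }
}
```

The first remedy changes where existing caches live on disk, while the second keeps paths stable but requires the stored metadata to carry the distro.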

Comment on lines +1039 to +1041
let response = reqwest::get(url)
    .await
    .with_context(|| format!("failed to download from {}", url))?;

P1: Reject non-success HTTP responses when downloading images

download_file streams any HTTP response body directly to disk without checking status codes, so 404/500 pages are treated as successful downloads. In the Ubuntu flow this can pass creation because extraction only checks that vmlinuz/initrd.img files exist, leading to invalid boot artifacts being cached and returned as a successful image. Call error_for_status() (and ideally checksum-verify these distro downloads) before writing.
