
ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain#20536

Merged
JohannesGaessler merged 2 commits into ggml-org:master from moonshadow-25:fix/hip-apu-compatibility on Mar 15, 2026

Conversation

@moonshadow-25
Contributor

Description:

Problem

On AMD APU/iGPU devices (unified memory architecture, e.g. AMD Strix Halo gfx1151), hipMemAdviseSetCoarseGrain returns
hipErrorInvalidValue because this hint is not applicable to UMA systems. The current code wraps this call in CUDA_CHECK(), which treats
it as a fatal error and crashes.

Fix

Treat hipMemAdviseSetCoarseGrain as an optional performance hint:

  • Remove CUDA_CHECK() wrapper
  • Clear any resulting error with hipGetLastError() to prevent propagation

This matches the intent of the existing comment ("fall back to cudaMalloc if not supported") and is consistent with how optional hints are
handled elsewhere.
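The resulting pattern can be sketched with a mocked-up version of HIP's sticky-error model. The `mock_*` helpers and the UMA flag below are illustrative stand-ins, not the real HIP API; in the actual code the calls are `hipMemAdvise` and `hipGetLastError`:

```cpp
#include <cassert>

// Minimal mock of HIP's sticky-error model, for illustration only.
enum hip_error { HIP_SUCCESS = 0, HIP_ERROR_INVALID_VALUE = 1 };

static hip_error g_last_error   = HIP_SUCCESS;
static bool g_is_uma_device     = true; // e.g. an APU such as Strix Halo

// On UMA devices the coarse-grain hint is rejected and the error sticks.
static hip_error mock_hipMemAdvise() {
    if (g_is_uma_device) {
        g_last_error = HIP_ERROR_INVALID_VALUE;
        return HIP_ERROR_INVALID_VALUE;
    }
    return HIP_SUCCESS;
}

// Returns and clears the sticky error, like hipGetLastError().
static hip_error mock_hipGetLastError() {
    hip_error err = g_last_error;
    g_last_error  = HIP_SUCCESS;
    return err;
}

// The PR's pattern: issue the optional hint without a fatal check,
// then clear any resulting error so it cannot trip a later check.
static void apply_optional_coarse_grain_hint() {
    mock_hipMemAdvise();           // may fail on UMA systems; that is fine
    (void)mock_hipGetLastError();  // clear instead of crashing
}

// What a later CUDA_CHECK-style assertion would observe.
static bool later_check_passes() {
    return mock_hipGetLastError() == HIP_SUCCESS;
}
```

Without the clearing call, the stale error would surface at the next checked HIP call, which is the crash observed on APU systems.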

Additional Changes

  • Add GGML_LOG_DEBUG pre-allocation memory logging to help diagnose memory issues on APU systems
  • Store totalGlobalMem in device info struct for future use

Testing

Tested on AMD Strix Halo (gfx1151), 128GB unified memory, Windows 11:

  • Before the fix: crash on hipMemAdviseSetCoarseGrain with hipErrorInvalidValue
  • After the fix: runs successfully, no error propagation

Context: ROCm APU Large BAR Bug

AMD APUs on Windows are currently limited to ~64GB hipMallocManaged allocations due to a ROCm runtime bug where largeBar_ is
unconditionally disabled for all APU devices in HIP mode. This causes the Windows GART allocator's 50%-of-RAM cap to trigger prematurely.
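The premature cap can be illustrated with the numbers from this report (a 128GB Strix Halo system); the helper below is a sketch of the described behavior, not an actual ROCm function:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t GiB = 1024ull * 1024 * 1024;

// Sketch of the Windows GART fallback described above: when large BAR
// is (incorrectly) reported as unavailable, managed allocations are
// capped at 50% of system RAM. Illustrative only, not a ROCm API.
constexpr uint64_t gart_cap_without_large_bar(uint64_t system_ram) {
    return system_ram / 2;
}
```

With 128 GiB of RAM, this cap yields the ~64GB limit that users observe.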

A fix has been submitted to ROCm upstream:
ROCm/rocm-systems#4077

Without that ROCm fix, APUs are still limited to ~64GB regardless of this change. However, this PR is independently valuable:

  1. Prevents crashes on APUs with current ROCm versions
  2. Enables full functionality once users update to a patched ROCm runtime
  3. The debug logging helps users diagnose memory allocation issues

Impact

  • ✅ APU/iGPU users: no more crashes when using GGML_CUDA_ENABLE_UNIFIED_MEMORY
  • ✅ Discrete GPU users: no change (hint is valid and still applied)
  • ✅ No performance regression

ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain

On AMD APU/iGPU devices (unified memory architecture), hipMemAdviseSetCoarseGrain
returns hipErrorInvalidValue because the hint is not applicable to UMA systems.
The previous CUDA_CHECK() call treated this as a fatal error, causing crashes on
APU systems such as AMD Strix Halo (gfx1151).

Fix: treat hipMemAdviseSetCoarseGrain as an optional performance hint - call it
without error checking and clear any resulting error with hipGetLastError().

Also add pre-allocation debug logging (GGML_LOG_DEBUG) to help diagnose memory
issues on APU systems, and store totalGlobalMem in device info.

Context: AMD APUs on Windows are affected by a ROCm runtime bug that limits
hipMallocManaged to ~64GB regardless of available system RAM. A fix has been
submitted upstream: ROCm/rocm-systems#4077

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions bot added labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Mar 14, 2026
@moonshadow-25
Contributor Author

Related to #20472 which fixes the same class of AMD APU issues on Linux.
This PR addresses the Windows-specific side:

  1. hipMemAdviseSetCoarseGrain returns hipErrorInvalidValue on APU/UMA
    systems (Windows), which was being treated as a fatal error causing crashes.

  2. The underlying >64GB allocation limit on Windows APUs is a ROCm runtime bug,
    fixed upstream at: [Windows/HIP] Fix APU large BAR detection to enable >64GB unified memory allocation ROCm/rocm-systems#4077

Together with #20472, these fixes make AMD APUs fully functional for LLM
inference on both Linux and Windows.

Relevant issues: #18159 #19818 #19764 #18650

@moonshadow-25
Contributor Author

This PR fixes the immediate crash issue (hipMemAdviseSetCoarseGrain returning
hipErrorInvalidValue on APU/UMA systems).

However, the underlying >64GB allocation limit reported in #19764 is a ROCm
runtime bug that requires an upstream fix: ROCm/rocm-systems#4077

Once both fixes are merged, Windows APU users (Strix Halo, etc.) will be able
to utilize their full system memory.

Related: #20472 fixes the Linux memory reporting side.
Also relevant: #18159 #19818

@JohannesGaessler
Contributor

Make one PR per fix please.

@moonshadow-25
Contributor Author

Make one PR per fix please.

@JohannesGaessler Thanks for reviewing! These two PRs fix different issues:

#20472 (by @hogeheer499-commits):

  • Problem: UMA detection on Linux reads /proc/meminfo instead of hipMemGetInfo(), losing ~30GB of usable memory
  • Fix: Skip UMA path for HIP builds on Linux
  • Platform: Primarily Linux

This PR (#20536):

  • Problem: hipMemAdviseSetCoarseGrain returns hipErrorInvalidValue on APU/UMA systems (Windows), causing crashes with CUDA_CHECK()
  • Fix: Treat it as an optional hint, clear the error instead of fatal exit
  • Platform: Primarily Windows (though the fix is cross-platform)

They're complementary fixes for AMD APU compatibility.

Additional context: There's also an upstream ROCm bug limiting Windows APU allocations to ~64GB (reported in #19764). I've submitted a
fix: ROCm/rocm-systems#4077

Without that ROCm fix, Windows APU users hit the 64GB limit regardless. But this PR still has value:

  1. Prevents crashes on current ROCm versions
  2. Enables full functionality once the ROCm fix is merged
  3. The debug logging helps diagnose memory issues

Would you like me to simplify this PR (remove debug logging / total_vram tracking) to focus only on the hipMemAdviseSetCoarseGrain fix?

@JohannesGaessler
Contributor

You are making unrelated changes to total_vram which, if correct, should be in a separate PR and motivated separately. I don't think the logging provides enough useful information vs. a debugger and should not be added.

Also according to the llama.cpp AI usage policy:

It is strictly prohibited to use AI to write your posts for you (bug reports, feature requests, pull request descriptions, Github discussions, responding to humans, ...).

@moonshadow-25
Contributor Author

moonshadow-25 commented Mar 15, 2026

You are making unrelated changes to total_vram which, if correct, should be in a separate PR and motivated separately. I don't think the logging provides enough useful information vs. a debugger and should not be added.

Also according to the llama.cpp AI usage policy:

It is strictly prohibited to use AI to write your posts for you (bug reports, feature requests, pull request descriptions, Github discussions, responding to humans, ...).

I have resubmitted the cleaned-up code.
Changes (only 2 lines):

Original:
    CUDA_CHECK(cudaMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device));

Changed to:
    cudaMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device);
    (void)hipGetLastError();

Test Environment
Hardware: AMD Strix Halo (gfx1151), 128GB unified memory
System: Windows 11
ROCm Version: 7.2

Test Results
After setting GGML_CUDA_ENABLE_UNIFIED_MEMORY=1:
Before fix: Program crashed, reporting hipErrorInvalidValue
After fix: Program runs normally, memory allocation succeeded

Notes
This fix resolved the crash issue. However, there is still an upstream ROCm bug on Windows APUs,
which causes hipMallocManaged to allocate a maximum of about 64GB (even if the system has 128GB).
I have already submitted a fix for that issue to ROCm upstream:
ROCm/rocm-systems#4077

[Screenshot: test output, 2026-03-15]

This test was run on Windows, where hipMallocManaged returns hipSuccess (not hipErrorNotSupported), so execution reaches the hipMemAdviseSetCoarseGrain call. In the version without the fix, it would crash there.

Also, sorry: I only used AI to translate my text, not to write the PR itself, which may be why it seemed to have an AI touch.

@JohannesGaessler
Contributor

Also, sorry: I only used AI to translate my text, not to write the PR itself, which may be why it seemed to have an AI touch.

Thank you for clarifying. We are unfortunately in a position where we had to ban it because that is the only feasible way for us to avoid having to sift through a lot of incorrect or hallucinated issues/PRs.

@JohannesGaessler JohannesGaessler merged commit 8b7d340 into ggml-org:master Mar 15, 2026
81 of 82 checks passed
