
ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain#20536

Merged
JohannesGaessler merged 2 commits into ggml-org:master from moonshadow-25:fix/hip-apu-compatibility on Mar 15, 2026

Conversation

@moonshadow-25
Contributor

Description:

Problem

On AMD APU/iGPU devices (unified memory architecture, e.g. AMD Strix Halo gfx1151), hipMemAdviseSetCoarseGrain returns
hipErrorInvalidValue because this hint is not applicable to UMA systems. The current code wraps this call in CUDA_CHECK(), which treats
it as a fatal error and crashes.

Fix

Treat hipMemAdviseSetCoarseGrain as an optional performance hint:

  • Remove CUDA_CHECK() wrapper
  • Clear any resulting error with hipGetLastError() to prevent propagation

This matches the intent of the existing comment ("fall back to cudaMalloc if not supported") and is consistent with how optional hints are
handled elsewhere.
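The resulting pattern can be sketched with a mocked-up version of HIP's sticky-error model. The `mock_*` helpers and the UMA flag below are illustrative stand-ins, not the real HIP API; in the actual code the calls are `hipMemAdvise` and `hipGetLastError`:

```cpp
#include <cassert>

// Minimal mock of HIP's sticky-error model, for illustration only.
enum hip_error { HIP_SUCCESS = 0, HIP_ERROR_INVALID_VALUE = 1 };

static hip_error g_last_error   = HIP_SUCCESS;
static bool g_is_uma_device     = true; // e.g. an APU such as Strix Halo

// On UMA devices the coarse-grain hint is rejected and the error sticks.
static hip_error mock_hipMemAdvise() {
    if (g_is_uma_device) {
        g_last_error = HIP_ERROR_INVALID_VALUE;
        return HIP_ERROR_INVALID_VALUE;
    }
    return HIP_SUCCESS;
}

// Returns and clears the sticky error, like hipGetLastError().
static hip_error mock_hipGetLastError() {
    hip_error err = g_last_error;
    g_last_error  = HIP_SUCCESS;
    return err;
}

// The PR's pattern: issue the optional hint without a fatal check,
// then clear any resulting error so it cannot trip a later check.
static void apply_optional_coarse_grain_hint() {
    mock_hipMemAdvise();           // may fail on UMA systems; that is fine
    (void)mock_hipGetLastError();  // clear instead of crashing
}

// What a later CUDA_CHECK-style assertion would observe.
static bool later_check_passes() {
    return mock_hipGetLastError() == HIP_SUCCESS;
}
```

Without the clearing call, the stale error would surface at the next checked HIP call, which is the crash observed on APU systems.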

Additional Changes

  • Add GGML_LOG_DEBUG pre-allocation memory logging to help diagnose memory issues on APU systems
  • Store totalGlobalMem in device info struct for future use

Testing

Tested on AMD Strix Halo (gfx1151), 128GB unified memory, Windows 11:

  • Before the fix: crash on hipMemAdviseSetCoarseGrain with hipErrorInvalidValue
  • After the fix: runs successfully, no error propagation

Context: ROCm APU Large BAR Bug

AMD APUs on Windows are currently limited to ~64GB hipMallocManaged allocations due to a ROCm runtime bug where largeBar_ is
unconditionally disabled for all APU devices in HIP mode. This causes the Windows GART allocator's 50%-of-RAM cap to trigger prematurely.
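The premature cap can be illustrated with the numbers from this report (a 128GB Strix Halo system); the helper below is a sketch of the described behavior, not an actual ROCm function:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t GiB = 1024ull * 1024 * 1024;

// Sketch of the Windows GART fallback described above: when large BAR
// is (incorrectly) reported as unavailable, managed allocations are
// capped at 50% of system RAM. Illustrative only, not a ROCm API.
constexpr uint64_t gart_cap_without_large_bar(uint64_t system_ram) {
    return system_ram / 2;
}
```

With 128 GiB of RAM, this cap yields the ~64GB limit that users observe.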

A fix has been submitted to ROCm upstream:
ROCm/rocm-systems#4077

Without that ROCm fix, APUs are still limited to ~64GB regardless of this change. However, this PR is independently valuable:

  1. Prevents crashes on APUs with current ROCm versions
  2. Enables full functionality once users update to a patched ROCm runtime
  3. The debug logging helps users diagnose memory allocation issues

Impact

  • ✅ APU/iGPU users: no more crashes when using GGML_CUDA_ENABLE_UNIFIED_MEMORY
  • ✅ Discrete GPU users: no change (hint is valid and still applied)
  • ✅ No performance regression

ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain

On AMD APU/iGPU devices (unified memory architecture), hipMemAdviseSetCoarseGrain
returns hipErrorInvalidValue because the hint is not applicable to UMA systems.
The previous CUDA_CHECK() call treated this as a fatal error, causing crashes on
APU systems such as AMD Strix Halo (gfx1151).

Fix: treat hipMemAdviseSetCoarseGrain as an optional performance hint - call it
without error checking and clear any resulting error with hipGetLastError().

Also add pre-allocation debug logging (GGML_LOG_DEBUG) to help diagnose memory
issues on APU systems, and store totalGlobalMem in device info.

Context: AMD APUs on Windows are affected by a ROCm runtime bug that limits
hipMallocManaged to ~64GB regardless of available system RAM. A fix has been
submitted upstream: ROCm/rocm-systems#4077

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions bot added labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Mar 14, 2026
@moonshadow-25
Contributor Author

Related to #20472 which fixes the same class of AMD APU issues on Linux.
This PR addresses the Windows-specific side:

  1. hipMemAdviseSetCoarseGrain returns hipErrorInvalidValue on APU/UMA
    systems (Windows), which was being treated as a fatal error causing crashes.

  2. The underlying >64GB allocation limit on Windows APUs is a ROCm runtime bug,
    fixed upstream at: [Windows/HIP] Fix APU large BAR detection to enable >64GB unified memory allocation ROCm/rocm-systems#4077

Together with #20472, these fixes make AMD APUs fully functional for LLM
inference on both Linux and Windows.

Relevant issues: #18159 #19818 #19764 #18650

@moonshadow-25
Contributor Author

This PR fixes the immediate crash issue (hipMemAdviseSetCoarseGrain returning
hipErrorInvalidValue on APU/UMA systems).

However, the underlying >64GB allocation limit reported in #19764 is a ROCm
runtime bug that requires an upstream fix: ROCm/rocm-systems#4077

Once both fixes are merged, Windows APU users (Strix Halo, etc.) will be able
to utilize their full system memory.

Related: #20472 fixes the Linux memory reporting side.
Also relevant: #18159 #19818

@JohannesGaessler
Contributor

Make one PR per fix please.

@moonshadow-25
Contributor Author

Make one PR per fix please.

@JohannesGaessler Thanks for reviewing! These two PRs fix different issues:

#20472 (by @hogeheer499-commits):

  • Problem: UMA detection on Linux reads /proc/meminfo instead of hipMemGetInfo(), losing ~30GB of usable memory
  • Fix: Skip UMA path for HIP builds on Linux
  • Platform: Primarily Linux

This PR (#20536):

  • Problem: hipMemAdviseSetCoarseGrain returns hipErrorInvalidValue on APU/UMA systems (Windows), causing crashes with CUDA_CHECK()
  • Fix: Treat it as an optional hint, clear the error instead of fatal exit
  • Platform: Primarily Windows (though the fix is cross-platform)

They're complementary fixes for AMD APU compatibility.

Additional context: There's also an upstream ROCm bug limiting Windows APU allocations to ~64GB (reported in #19764). I've submitted a
fix: ROCm/rocm-systems#4077

Without that ROCm fix, Windows APU users hit the 64GB limit regardless. But this PR still has value:

  1. Prevents crashes on current ROCm versions
  2. Enables full functionality once the ROCm fix is merged
  3. The debug logging helps diagnose memory issues

Would you like me to simplify this PR (remove debug logging / total_vram tracking) to focus only on the hipMemAdviseSetCoarseGrain fix?

@JohannesGaessler
Contributor

You are making unrelated changes to total_vram which, if correct, should be in a separate PR and motivated separately. I don't think the logging provides enough useful information vs. a debugger and should not be added.

Also according to the llama.cpp AI usage policy:

It is strictly prohibited to use AI to write your posts for you (bug reports, feature requests, pull request descriptions, Github discussions, responding to humans, ...).

@moonshadow-25
Contributor Author

moonshadow-25 commented Mar 15, 2026

You are making unrelated changes to total_vram which, if correct, should be in a separate PR and motivated separately. I don't think the logging provides enough useful information vs. a debugger and should not be added.

Also according to the llama.cpp AI usage policy:

It is strictly prohibited to use AI to write your posts for you (bug reports, feature requests, pull request descriptions, Github discussions, responding to humans, ...).

I have resubmitted the cleaned-up code.
Changes (only 2 lines):

Original:
    CUDA_CHECK(cudaMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device));

Changed to:
    cudaMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device);
    (void)hipGetLastError();

Test Environment
Hardware: AMD Strix Halo (gfx1151), 128GB unified memory
System: Windows 11
ROCm Version: 7.2

Test Results
After setting GGML_CUDA_ENABLE_UNIFIED_MEMORY=1:
Before fix: Program crashed, reporting hipErrorInvalidValue
After fix: Program runs normally, memory allocation succeeded

Notes
This fix resolved the crash issue. However, there is still an upstream ROCm bug on Windows APUs,
which causes hipMallocManaged to allocate a maximum of about 64GB (even if the system has 128GB).
I have already submitted a fix for that issue to ROCm upstream:
ROCm/rocm-systems#4077

[Screenshot: test output, 2026-03-15]

This test was run on Windows, where hipMallocManaged returns hipSuccess (not hipErrorNotSupported), so execution reaches the hipMemAdviseSetCoarseGrain call. In the version without the fix, it would crash there.

Also, sorry: I only used AI to translate my text, not to write the PR itself, which may be why it seemed to have an AI touch.

@JohannesGaessler
Contributor

Also, sorry: I only used AI to translate my text, not to write the PR itself, which may be why it seemed to have an AI touch.

Thank you for clarifying. We are unfortunately in a position where we had to ban it because that is the only feasible way for us to avoid having to sift through a lot of incorrect or hallucinated issues/PRs.

@JohannesGaessler JohannesGaessler merged commit 8b7d340 into ggml-org:master Mar 15, 2026
81 of 82 checks passed
