While my machine was under memory pressure, the OOM killer terminated a number of sssd services along with other processes. sssd restarted them at first, but at some point failed to do so. It then terminated itself with exit code 1, which does not count as abnormal, so systemd never restarted it. As a result I could no longer connect to this machine via ssh and needed IT to regain access. (See the sssd log below.)
While the other sssd services have `Restart=on-failure`, the main one uses `Restart=on-abnormal`, which in this case left the system inaccessible for me. I would expect a system to recover from this, which should happen with this change.

Also see the man pages about the different systemd service `Restart=` settings.

This PR is about the systemd service which should restart sssd.
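To make the difference between the two settings concrete, here is a minimal sketch of the restart decision table from `systemd.service(5)`. The names `RESTART_TABLE` and `would_restart` are hypothetical and exist only for illustration. The sketch is a simplification: it ignores options such as `SuccessExitStatus=` and `RestartPreventExitStatus=`, which can change how an exit is classified.

```python
# Which Restart= values trigger a restart for each exit cause,
# following the table in systemd.service(5). Simplified: options that
# reclassify exit statuses (e.g. SuccessExitStatus=) are not modelled.
RESTART_TABLE = {
    "clean-exit": {"always", "on-success"},
    "unclean-exit-code": {"always", "on-failure"},
    "unclean-signal": {"always", "on-failure", "on-abnormal", "on-abort"},
    "timeout": {"always", "on-failure", "on-abnormal"},
    "watchdog": {"always", "on-failure", "on-abnormal", "on-watchdog"},
}


def would_restart(restart_setting: str, exit_cause: str) -> bool:
    """Return True if systemd would restart the unit for this exit cause."""
    return restart_setting in RESTART_TABLE[exit_cause]


# sssd exiting on its own with exit code 1 is an unclean exit code.
print(would_restart("on-abnormal", "unclean-exit-code"))
print(would_restart("on-failure", "unclean-exit-code"))
```

Under these assumptions the first call reports no restart and the second reports a restart, which matches the situation described above: a process killed by a signal is covered by both settings, but a process that exits by itself with a non-zero code is only covered by `on-failure`.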
There are other issues about sssd ceasing to work, but those concern sssd itself. I think they are related to this PR but distinct from it: they are about root causes that do not apply here, since my problem was the OOM killer and memory pressure, not sssd being unstable on its own. This change should also mitigate those issues, but it won't fix their underlying causes, so they should stay open.
As a short-term workaround, this systemd service override was added:
An `OOMScoreAdjust` setting added to this service is also propagated to the child services it starts. If this is something you would also like for the service, I can create another PR for it. Personally I think `Restart=on-failure` is a fix, which is different from an addition, so that belongs in a separate discussion.

Existing restart conditions in this repo before this PR. Note that this is the only `on-abnormal` service.

Log of the situation in the journal (some things are censored with `###`).

System reboot on Jan 24, OOM killer going around on Jan 26, sssd manually restarted on Jan 27.