
Fix deadlock in PJSIP channel creation with endpoint variables #181

Draft
jirkah wants to merge 4 commits into master from claude/fix-asterisk-deadlock-XAfCX


jirkah commented Mar 5, 2026

Summary

This patch fixes a deadlock that occurs during PJSIP channel creation when endpoint channel variables are set while the channel lock is held.

Key Changes

  • In chan_pjsip.c, moved the endpoint channel variable assignment in chan_pjsip_new() so that it runs after ast_channel_unlock(chan) instead of before it
  • The channel lock is therefore no longer held while the variables are set, which prevents the deadlock

Implementation Details

The deadlock occurred because setting endpoint channel variables can invoke dialplan functions (such as PJSIP_HEADER) that block until a PJSIP serializer task completes. If that happens while the channel lock is held and the serializer is itself waiting for the same channel lock, neither thread can make progress.

By deferring the variable assignment until after the channel is unlocked, the channel lock is no longer held when these potentially-blocking operations execute, eliminating the deadlock while maintaining the same functional behavior.
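
The reordering can be illustrated with a toy model. This is a minimal sketch, not the actual chan_pjsip.c code: `toy_channel`, `toy_lock`, `toy_unlock`, `toy_setvar` and both `toy_new_*` functions are hypothetical names, and a boolean flag stands in for the real channel lock. It only counts how many variable writes happen while the lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a channel; lock_held models the channel lock. */
struct toy_channel {
    bool lock_held;
    int vars_set;            /* total variable writes */
    int vars_set_under_lock; /* writes performed while the lock was held */
};

static void toy_lock(struct toy_channel *c)
{
    assert(!c->lock_held);
    c->lock_held = true;
}

static void toy_unlock(struct toy_channel *c)
{
    assert(c->lock_held);
    c->lock_held = false;
}

/* Models a variable write that might call a blocking dialplan function. */
static void toy_setvar(struct toy_channel *c)
{
    c->vars_set++;
    if (c->lock_held) {
        c->vars_set_under_lock++;
    }
}

/* Old ordering: variables are written inside the locked region. */
static void toy_new_before_fix(struct toy_channel *c, size_t nvars)
{
    toy_lock(c);
    for (size_t i = 0; i < nvars; i++) {
        toy_setvar(c);
    }
    toy_unlock(c);
}

/* New ordering: unlock first, then write the variables. */
static void toy_new_after_fix(struct toy_channel *c, size_t nvars)
{
    toy_lock(c);
    /* ... setup that genuinely needs the lock would go here ... */
    toy_unlock(c);
    for (size_t i = 0; i < nvars; i++) {
        toy_setvar(c);
    }
}
```

Both orderings perform the same number of writes; only the second keeps every potentially blocking write outside the locked region.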

https://claude.ai/code/session_017gxrJFgjWNAVm25sYc76DP

…nnel var

Move endpoint channel_vars loop after ast_channel_unlock() in
chan_pjsip_new() to prevent deadlock when variables invoke dialplan
functions (e.g., PJSIP_HEADER) that block on PJSIP serializer tasks
while the channel lock is held.

Deadlock cycle with ao2_legacy (default) storage backend:
1. Thread A (chan_pjsip_new): holds channel_lock → calls
   pbx_builtin_setvar_helper("PJSIP_HEADER(add,...)") → dispatches to
   func_write_header() → ast_sip_push_task_wait_serializer() → blocks
   waiting for serializer to complete the task
2. Serializer thread: already processing a prior task that iterates
   channels → ao2_callback on channels container → acquires whole-
   container lock → by_name_cb tries ast_channel_lock on the channel
   held by Thread A → BLOCKED
3. Thread C (channel hangup): needs container lock for ao2_unlink →
   BLOCKED (serializer holds it)

This is effectively an ABBA lock inversion: Thread A holds channel_lock
and waits on the serializer, whose current task holds container_lock;
that task in turn needs the channel_lock held by Thread A.
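
The cycle described above can be modelled as a small wait-for graph. The sketch below is an illustration only and uses no Asterisk APIs: the party names, the `before_fix` and `after_fix` tables, and `waits_forever()` are hypothetical. Each party waits on at most one other party, so a chain of waits that has not ended after as many hops as there are parties must have entered a cycle.

```c
#include <assert.h>
#include <stdbool.h>

enum { THREAD_A, SERIALIZER, THREAD_C, NUM_PARTIES };
#define NOT_WAITING (-1)

/* Before the fix: A waits for the serializer task, the serializer waits
 * for the channel lock held by A, and C waits for the container lock
 * held by the serializer. */
static const int before_fix[NUM_PARTIES] = {
    [THREAD_A]   = SERIALIZER,
    [SERIALIZER] = THREAD_A,
    [THREAD_C]   = SERIALIZER,
};

/* After the fix: A no longer holds the channel lock while it waits,
 * so the serializer is not blocked on A. */
static const int after_fix[NUM_PARTIES] = {
    [THREAD_A]   = SERIALIZER,
    [SERIALIZER] = NOT_WAITING,
    [THREAD_C]   = SERIALIZER,
};

/* Follow wait-for edges from `start`. With n parties and at most one
 * outgoing edge each, surviving n hops means the chain hit a cycle. */
static bool waits_forever(const int waits_for[], int n, int start)
{
    assert(start >= 0 && start < n);
    int cur = start;
    for (int hops = 0; hops < n; hops++) {
        cur = waits_for[cur];
        if (cur == NOT_WAITING) {
            return false;
        }
    }
    return true;
}
```

In this model Thread C is not part of the cycle itself, but it still waits forever because the party it depends on never becomes runnable.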

The race window widened in 22.6.0 due to added channel locking in
res_musiconhold.c (moh_files_alloc, local_ast_moh_start) and new CEL
event publishing, increasing contention enough to trigger reliably
under high call volume.

Moving the loop after unlock is safe because pbx_builtin_setvar_helper()
acquires its own channel lock internally.

https://claude.ai/code/session_017gxrJFgjWNAVm25sYc76DP

The patch was hand-written with spaces instead of tabs, causing
dpkg-source to reject it as malformed. Regenerated from a real
unified diff of the source file to preserve exact whitespace.

https://claude.ai/code/session_017gxrJFgjWNAVm25sYc76DP

and into header; to minimize patch changes and avoid upstream conflicts