Break out of epoll_wait when adding loop callbacks during event loop#2559
Break out of epoll_wait when adding loop callbacks during event loop#2559benoit-nexthop wants to merge 1 commit intofacebook:mainfrom
epoll_wait when adding loop callbacks during event loop#2559Conversation
When `runInLoop()` is called while the EventBase loop is running (but not inside `runLoopCallbacks`), the callback is added to `loopCallbacks_` but the loop may be blocked in `epoll_wait` with a long timeout. This causes significant latency as the callback won't be processed until the timeout expires or another event occurs. Fix this by calling `eb_event_base_loopbreak()` when adding callbacks to `loopCallbacks_` while the loop is running. This breaks out of `epoll_wait` so the callback can be processed promptly in the next loop iteration. This is particularly important for Thrift clients where responses arrive on a different thread and are queued via `runInLoop()`. Without this fix, each RPC call could be delayed by up to the epoll timeout (e.g., 30s). In OSS FBOSS this fixes the 5-7s delay seen on every Thrift call. Those can add up quickly, for example FBOSS CLI end-to-end tests now take 1.4s instead of ~155s without this fix.
|
I previously attempted to work around this issue in FBOSS with facebook/fboss#628, which contained a commit facebook/fboss@b66e380 that used |
16b4ef2 to
958530f
Compare
|
I amended my commit just above to change the condition from A typical call stack looks like this: So I'm undoing this amendment for now. |
958530f to
16b4ef2
Compare
Summary: This diff just added the experimental thrift_latency_test.cpp from facebook/folly#2559 With the oss build, this simple thrift call can take 6418ms ``` [root@0eb3f90dd235 fboss]# ./thrift_latency_test Creating Thrift server... I0116 22:30:33.090906 42865 ThriftServer.cpp:2060] Using randomly generated TLS ticket keys. I0116 22:30:33.091259 42865 ThriftServer.cpp:988] Using resource pools on address/port [::1]:0: thrift flag: true, enable gflag: false, disable gflag: false, runtime actions: I0116 22:30:33.091567 42865 ThriftServer.cpp:1538] Resource pools configured: 2 I0116 22:30:33.092444 42865 ThriftServer.cpp:746] ThriftServer listening on address/port: [::1]:37999 Server listening on port 37999 Making RPC call... RPC call completed in 6418 ms Result: lastAppliedInMs = 12345 ERROR: RPC took too long! Expected < 100ms ``` While in our internal build, the same test without using any FBOSS internal logic only take 16ms ``` ❯❯❯ buck2 run fbcode//fboss/cli/fboss2/test:thrift_latency_test Buck UI: https://www.internalfb.com/buck2/fc447728-e42e-45da-8e9b-a29ae80abeb5 Network: Up: 0B Down: 0B Command: run. Time elapsed: 0.0s BUILD SUCCEEDED - starting your binary Creating Thrift server... I0116 14:35:48.140404 328824 ThriftServer.cpp:2060] Using randomly generated TLS ticket keys. I0116 14:35:48.142755 328824 ThriftServer.cpp:988] Using resource pools on address/port [::1]:0: thrift flag: true, enable gflag: false, disable gflag: false, runtime actions: I0116 14:35:48.145723 328824 ThriftServer.cpp:1538] Resource pools configured: 2 I0116 14:35:48.281738 328824 ThriftServer.cpp:746] ThriftServer listening on address/port: [::1]:40581 Server listening on port 40581 Making RPC call... RPC call completed in 16 ms Result: lastAppliedInMs = 12345 OK: RPC latency is acceptable ``` This is a huge difference b.w OSS build and internal build: **6418ms vs 16ms** As you can see in thrift_latency_test.cpp, we used `ScopedServerInterfaceThread` to spawn a simple fboss agent thrift server, then use `FbossCtrlAsyncClient` with `RocketClientChannel` to call a thrift function. No FBOSS internal logic is used here. I am suspecting there's performance discrepancy b/w our internal build and OSS build for fbthrift. Based on the new comments from https://fb.workplace.com/groups/560979627394613/permalink/3541014286057784/ I asked Metamate to generate a similar test but use EpollBackend and IoUring to create a thrift server Reviewed By: kevin645 Differential Revision: D90894498 fbshipit-source-id: c8d04d6434c7aca326bc33133d40f514ea8252bb
|
Related FBOSS change for context: facebook/fboss@43100ad — switching away from libevent to the epoll based backend. |
Thrift RPC Latency Bug
Summary
Thrift RPC calls experience multi-second latency (5-7 seconds) when they should complete in milliseconds. The root cause is that
EventBase::runInLoop()adds callbacks toloopCallbacks_without waking up theEventBaseif it's blocked ineb_event_base_loop().Environment
Reproducer
Here is minimal test case to be built in the FBOSS OSS tree demonstrates the issue:
Save this file to
fboss/cli/fboss2/test/thrift_latency_test.cppand add this at the end ofcmake/CliFboss2Test.cmake:Before the fix this test binary takes 5-7 seconds to execute, after the fix it takes a few milliseconds.
Root Cause Analysis
The Problem
When a Thrift RPC response is ready to be sent, the
WriteBatchercallsrunInLoop(this, true)to schedule the write. ThethisIteration=trueparameter indicates the callback should run in the current loop iteration.However, the callback is only added to
runOnceCallbacks_if we're currently insiderunLoopCallbacks(). Otherwise, it's added toloopCallbacks_for the next iteration.The issue occurs when:
EventBaseis insideeb_event_base_loop()(blocked inepoll_wait)runInLoop()runOnceCallbacks_is null (we're not inrunLoopCallbacks())loopCallbacks_eb_event_base_loop()continues blockingloopCallbacks_untilepoll_waittimes out (~5-6 seconds typically)Code Flow (Before Fix)
Instrumentation
I sprinkled a ton of debug logging in folly and fbthrift to debug the issue:
EventBase.cpp
eb_event_base_loop(blocking vs non-blocking)runInLoopadds toloopCallbacks_vsrunOnceCallbacks_runLoopCallbacksstarts/endsWriteBatcher-inl.h
enqueueWritecallsrunInLoopReproduction (Without Fix)
Debug Output
Excerpt of the output from the reproduction binary shared above with the extra debug logging I added:
(See below for captured
straceoutput showing the 6-secondepoll_wait)The Fix
Add a call to
eb_event_base_loopbreak()when adding toloopCallbacks_while the loop is running:This causes
eb_event_base_loop()to return immediately, allowing the loop to process the pending callbacks. Thisif/elsesequence was duplicated so rather than fixing this in the same way in two different places, I've factored this out in a private helper.Results
Based on
thrift_latency_test.cppabove, running on AMD EPYC 9655P:Captured Traces
Before Fix (Issue Reproduction)
straceoutputkey events for thread 1910218, the server IO thread:
Debug output
Observations
WriteBatcher calls
runInLoop(this, true): ThethisIteration=1parameter indicatesthe callback should run immediately in the current loop iteration.
Callback goes to
loopCallbacks_instead ofrunOnceCallbacks_: The log showsrunOnceCallbacks_=(nil)because we're NOT insiderunLoopCallbacks()at this point.We're inside a libevent callback (epoll event handler), so the callback is queued.
libevent goes back into
epoll_waitafter processing the event: This is the bug.The flow is:
eb_event_base_loop(EVLOOP_ONCE)callsepoll_waitepoll_waitreturnsWriteBatcher)WriteBatcheradds callbacks to folly'sloopCallbacks_epoll_waitbecause it doesn't know about folly'sloopCallbacks_!eb_event_base_loopfinally returnsrunLoopCallbacks()processes the 2 queued callbacksThe ~5 second delay is a single
epoll_waittimeout: The timeout of 5042ms is set bylibevent based on the next timer event. The callbacks sit in
loopCallbacks_for theentire duration because libevent has no way to know they're waiting.
After Fix
strace output
Debug output
Observations
Same callback path: The callback still goes to
loopCallbacks_(sincerunOnceCallbacks_is still null), but now we also call
eb_event_base_loopbreak().Immediate wakeup: The
eb_event_base_loopbreak()call causesepoll_waitto returnimmediately, allowing the
EventBaseto process the pending callback.RPC latency reduced from ~6 seconds to ~1-8ms: this helps all OSS FBOSS users keep their sanity.
No timeout waiting: The
epoll_waitno longer blocks for the full timeout period.The loop breaks immediately and processes callbacks.
Conclusion
The fix adds a single call to
eb_event_base_loopbreak()when adding callbacks toloopCallbacks_while the loop is running. This ensures theEventBasewakes up promptlyto process the callback instead of waiting for the next event or timeout.
This is a fix for any code path that uses
runInLoop()withthisIteration=truewhile inside an epoll callback handler, as the callback will otherwise be delayed until
the next event loop timeout (typically 5-7 seconds).
It has been reported that this issue has not been observed inside Meta, and I cannot explain why.