build(deps): bump pip from 23.2.1 to 23.3 in /drivers/gpu/drm/ci/xfails#2
Open
dependabot[bot] wants to merge 1 commit intomasterfrom
Open
build(deps): bump pip from 23.2.1 to 23.3 in /drivers/gpu/drm/ci/xfails#2dependabot[bot] wants to merge 1 commit intomasterfrom
dependabot[bot] wants to merge 1 commit intomasterfrom
Conversation
Bumps [pip](https://github.com/pypa/pip) from 23.2.1 to 23.3. - [Changelog](https://github.com/pypa/pip/blob/main/NEWS.rst) - [Commits](pypa/pip@23.2.1...23.3) --- updated-dependencies: - dependency-name: pip dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>
woodsts
pushed a commit
that referenced
this pull request
Nov 13, 2023
Chuyi Zhou says: ==================== Relax allowlist for open-coded css_task iter Hi, The patchset aims to relax the allowlist for open-coded css_task iter suggested by Alexei[1]. Please see individual patches for more details. And comments are always welcome. Patch summary: * Patch #1: Relax the allowlist and let css_task iter can be used in bpf iters and any sleepable progs. * Patch #2: Add a test in cgroup_iters.c which demonstrates how css_task iters can be combined with cgroup iter. * Patch #3: Add a test to prove css_task iter can be used in normal * sleepable progs. link[1]:https://lore.kernel.org/lkml/CAADnVQKafk_junRyE=-FVAik4hjTRDtThymYGEL8hGTuYoOGpA@mail.gmail.com/ --- Changes in v2: * Fix the incorrect logic in check_css_task_iter_allowlist. Use expected_attach_type to check whether we are using bpf_iters. * Link to v1:https://lore.kernel.org/bpf/20231022154527.229117-1-zhouchuyi@bytedance.com/T/#m946f9cde86b44a13265d9a44c5738a711eb578fd Changes in v3: * Add a testcase to prove css_task can be used in fentry.s * Link to v2:https://lore.kernel.org/bpf/20231024024240.42790-1-zhouchuyi@bytedance.com/T/#m14a97041ff56c2df21bc0149449abd275b73f6a3 Changes in v4: * Add Yonghong's ack for patch #1 and patch #2. * Solve Yonghong's comments for patch #2 * Move prog 'iter_css_task_for_each_sleep' from iters_task_failure.c to iters_css_task.c. Use RUN_TESTS to prove we can load this prog. * Link to v3:https://lore.kernel.org/bpf/20231025075914.30979-1-zhouchuyi@bytedance.com/T/#m3200d8ad29af4ffab97588e297361d0a45d7585d --- ==================== Link: https://lore.kernel.org/r/20231031050438.93297-1-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 13, 2023
When LAN9303 is MDIO-connected two callchains exist into
mdio->bus->write():
1. switch ports 1&2 ("physical" PHYs):
virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})->
lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested
2. LAN9303 virtual PHY:
virtual MDIO bus (lan9303_phy_{read|write}) ->
lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write}
If the latter functions just take
mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP
false-positive splat. It's false-positive because the first
mdio_lock in the second callchain above belongs to virtual MDIO bus, the
second mdio_lock belongs to physical MDIO bus.
Consequent annotation in lan9303_mdio_{read|write} as nested lock
(similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus)
prevents the following splat:
WARNING: possible circular locking dependency detected
5.15.71 #1 Not tainted
------------------------------------------------------
kworker/u4:3/609 is trying to acquire lock:
ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex
but task is already holding lock:
ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&bus->mdio_lock){+.+.}-{3:3}:
lock_acquire
__mutex_lock
mutex_lock_nested
lan9303_mdio_read
_regmap_read
regmap_read
lan9303_probe
lan9303_mdio_probe
mdio_probe
really_probe
__driver_probe_device
driver_probe_device
__device_attach_driver
bus_for_each_drv
__device_attach
device_initial_probe
bus_probe_device
deferred_probe_work_func
process_one_work
worker_thread
kthread
ret_from_fork
-> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}:
__lock_acquire
lock_acquire.part.0
lock_acquire
__mutex_lock
mutex_lock_nested
regmap_lock_mutex
regmap_read
lan9303_phy_read
dsa_slave_phy_read
__mdiobus_read
mdiobus_read
get_phy_device
mdiobus_scan
__mdiobus_register
dsa_register_switch
lan9303_probe
lan9303_mdio_probe
mdio_probe
really_probe
__driver_probe_device
driver_probe_device
__device_attach_driver
bus_for_each_drv
__device_attach
device_initial_probe
bus_probe_device
deferred_probe_work_func
process_one_work
worker_thread
kthread
ret_from_fork
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&bus->mdio_lock);
lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
lock(&bus->mdio_lock);
lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock);
*** DEADLOCK ***
5 locks held by kworker/u4:3/609:
#0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work
#1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work
#2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach
#3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch
#4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read
stack backtrace:
CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1
Workqueue: events_unbound deferred_probe_work_func
Call trace:
dump_backtrace
show_stack
dump_stack_lvl
dump_stack
print_circular_bug
check_noncircular
__lock_acquire
lock_acquire.part.0
lock_acquire
__mutex_lock
mutex_lock_nested
regmap_lock_mutex
regmap_read
lan9303_phy_read
dsa_slave_phy_read
__mdiobus_read
mdiobus_read
get_phy_device
mdiobus_scan
__mdiobus_register
dsa_register_switch
lan9303_probe
lan9303_mdio_probe
...
Cc: stable@vger.kernel.org
Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
woodsts
pushed a commit
that referenced
this pull request
Nov 13, 2023
…pf_iter_reg' Chuyi Zhou says: ==================== The patchset aims to let the BPF verivier consider bpf_iter__cgroup->cgroup and bpf_iter__task->task is trusted suggested by Alexei[1]. Please see individual patches for more details. And comments are always welcome. Link[1]:https://lore.kernel.org/bpf/20231022154527.229117-1-zhouchuyi@bytedance.com/T/#mb57725edc8ccdd50a1b165765c7619b4d65ed1b0 v2->v1: * Patch #1: Add Yonghong's ack and add description of similar case in log. * Patch #2: Add Yonghong's ack ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 13, 2023
We must check the return value of find_first_bit() before using the return value as an index array since it happens to overflow the array and then panic: [ 107.318430] Kernel BUG [#1] [ 107.319434] CPU: 3 PID: 1238 Comm: kill Tainted: G E 6.6.0-rc6ubuntu-defconfig #2 [ 107.319465] Hardware name: riscv-virtio,qemu (DT) [ 107.319551] epc : pmu_sbi_ovf_handler+0x3a4/0x3ae [ 107.319840] ra : pmu_sbi_ovf_handler+0x52/0x3ae [ 107.319868] epc : ffffffff80a0a77c ra : ffffffff80a0a42a sp : ffffaf83fecda350 [ 107.319884] gp : ffffffff823961a8 tp : ffffaf8083db1dc0 t0 : ffffaf83fecda480 [ 107.319899] t1 : ffffffff80cafe62 t2 : 000000000000ff00 s0 : ffffaf83fecda520 [ 107.319921] s1 : ffffaf83fecda380 a0 : 00000018fca29df0 a1 : ffffffffffffffff [ 107.319936] a2 : 0000000001073734 a3 : 0000000000000004 a4 : 0000000000000000 [ 107.319951] a5 : 0000000000000040 a6 : 000000001d1c8774 a7 : 0000000000504d55 [ 107.319965] s2 : ffffffff82451f10 s3 : ffffffff82724e70 s4 : 000000000000003f [ 107.319980] s5 : 0000000000000011 s6 : ffffaf8083db27c0 s7 : 0000000000000000 [ 107.319995] s8 : 0000000000000001 s9 : 00007fffb45d6558 s10: 00007fffb45d81a0 [ 107.320009] s11: ffffaf7ffff60000 t3 : 0000000000000004 t4 : 0000000000000000 [ 107.320023] t5 : ffffaf7f80000000 t6 : ffffaf8000000000 [ 107.320037] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 107.320081] [<ffffffff80a0a77c>] pmu_sbi_ovf_handler+0x3a4/0x3ae [ 107.320112] [<ffffffff800b42d0>] handle_percpu_devid_irq+0x9e/0x1a0 [ 107.320131] [<ffffffff800ad92c>] generic_handle_domain_irq+0x28/0x36 [ 107.320148] [<ffffffff8065f9f8>] riscv_intc_irq+0x36/0x4e [ 107.320166] [<ffffffff80caf4a0>] handle_riscv_irq+0x54/0x86 [ 107.320189] [<ffffffff80cb0036>] do_irq+0x64/0x96 [ 107.320271] Code: 85a6 855e b097 ff7f 80e7 9220 b709 9002 4501 bbd9 (9002) 6097 [ 107.320585] ---[ end trace 0000000000000000 ]--- [ 107.320704] Kernel panic - not syncing: Fatal exception in interrupt [ 107.320775] SMP: stopping secondary CPUs [ 107.321219] Kernel Offset: 0x0 from 0xffffffff80000000 [ 107.333051] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 4905ec2 ("RISC-V: Add sscofpmf extension support") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231109082128.40777-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
This allows it to break the following circular locking dependency.
Aug 10 07:01:29 dg1test kernel: ======================================================
Aug 10 07:01:29 dg1test kernel: WARNING: possible circular locking dependency detected
Aug 10 07:01:29 dg1test kernel: 6.4.0-rc7+ #10 Not tainted
Aug 10 07:01:29 dg1test kernel: ------------------------------------------------------
Aug 10 07:01:29 dg1test kernel: wireplumber/2236 is trying to acquire lock:
Aug 10 07:01:29 dg1test kernel: ffff8fca5320da18 (&fctx->lock){-...}-{2:2}, at: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel:
but task is already holding lock:
Aug 10 07:01:29 dg1test kernel: ffff8fca41208610 (&event->list_lock#2){-...}-{2:2}, at: nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:
which lock already depends on the new lock.
Aug 10 07:01:29 dg1test kernel:
the existing dependency chain (in reverse order) is:
Aug 10 07:01:29 dg1test kernel:
-> #3 (&event->list_lock#2){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel: ga100_fifo_nonstall_intr+0x24/0x30 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel: __handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel: handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel: handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel: __common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel: common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel: asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel:
-> #2 (&device->intr.lock){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: nvkm_inth_allow+0x2c/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_state+0x181/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_allow+0x63/0xd0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_uevent_mthd+0x4d/0x70 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_ioctl+0x10b/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvif_object_mthd+0xa8/0x1f0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvif_event_allow+0x2a/0xa0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_fence_enable_signaling+0x78/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel: __dma_fence_enable_signaling+0x5e/0x100
Aug 10 07:01:29 dg1test kernel: dma_fence_add_callback+0x4b/0xd0
Aug 10 07:01:29 dg1test kernel: nouveau_cli_work_queue+0xae/0x110 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_gem_object_close+0x1d1/0x2a0 [nouveau]
Aug 10 07:01:29 dg1test kernel: drm_gem_handle_delete+0x70/0xe0 [drm]
Aug 10 07:01:29 dg1test kernel: drm_ioctl_kernel+0xa5/0x150 [drm]
Aug 10 07:01:29 dg1test kernel: drm_ioctl+0x256/0x490 [drm]
Aug 10 07:01:29 dg1test kernel: nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
Aug 10 07:01:29 dg1test kernel: __x64_sys_ioctl+0x91/0xd0
Aug 10 07:01:29 dg1test kernel: do_syscall_64+0x3c/0x90
Aug 10 07:01:29 dg1test kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
Aug 10 07:01:29 dg1test kernel:
-> #1 (&event->refs_lock#4){....}-{2:2}:
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_state+0x37/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_allow+0x63/0xd0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_uevent_mthd+0x4d/0x70 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_ioctl+0x10b/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvif_object_mthd+0xa8/0x1f0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvif_event_allow+0x2a/0xa0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_fence_enable_signaling+0x78/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel: __dma_fence_enable_signaling+0x5e/0x100
Aug 10 07:01:29 dg1test kernel: dma_fence_add_callback+0x4b/0xd0
Aug 10 07:01:29 dg1test kernel: nouveau_cli_work_queue+0xae/0x110 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_gem_object_close+0x1d1/0x2a0 [nouveau]
Aug 10 07:01:29 dg1test kernel: drm_gem_handle_delete+0x70/0xe0 [drm]
Aug 10 07:01:29 dg1test kernel: drm_ioctl_kernel+0xa5/0x150 [drm]
Aug 10 07:01:29 dg1test kernel: drm_ioctl+0x256/0x490 [drm]
Aug 10 07:01:29 dg1test kernel: nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
Aug 10 07:01:29 dg1test kernel: __x64_sys_ioctl+0x91/0xd0
Aug 10 07:01:29 dg1test kernel: do_syscall_64+0x3c/0x90
Aug 10 07:01:29 dg1test kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
Aug 10 07:01:29 dg1test kernel:
-> #0 (&fctx->lock){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel: __lock_acquire+0x14e3/0x2240
Aug 10 07:01:29 dg1test kernel: lock_acquire+0xc8/0x2a0
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_client_event+0xf/0x20 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy+0x9b/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel: ga100_fifo_nonstall_intr+0x24/0x30 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel: __handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel: handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel: handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel: __common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel: common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel: asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel:
other info that might help us debug this:
Aug 10 07:01:29 dg1test kernel: Chain exists of:
&fctx->lock --> &device->intr.lock --> &event->list_lock#2
Aug 10 07:01:29 dg1test kernel: Possible unsafe locking scenario:
Aug 10 07:01:29 dg1test kernel: CPU0 CPU1
Aug 10 07:01:29 dg1test kernel: ---- ----
Aug 10 07:01:29 dg1test kernel: lock(&event->list_lock#2);
Aug 10 07:01:29 dg1test kernel: lock(&device->intr.lock);
Aug 10 07:01:29 dg1test kernel: lock(&event->list_lock#2);
Aug 10 07:01:29 dg1test kernel: lock(&fctx->lock);
Aug 10 07:01:29 dg1test kernel:
*** DEADLOCK ***
Aug 10 07:01:29 dg1test kernel: 2 locks held by wireplumber/2236:
Aug 10 07:01:29 dg1test kernel: #0: ffff8fca53177bf8 (&device->intr.lock){-...}-{2:2}, at: nvkm_intr+0x29/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel: #1: ffff8fca41208610 (&event->list_lock#2){-...}-{2:2}, at: nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:
stack backtrace:
Aug 10 07:01:29 dg1test kernel: CPU: 6 PID: 2236 Comm: wireplumber Not tainted 6.4.0-rc7+ #10
Aug 10 07:01:29 dg1test kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
Aug 10 07:01:29 dg1test kernel: Call Trace:
Aug 10 07:01:29 dg1test kernel: <TASK>
Aug 10 07:01:29 dg1test kernel: dump_stack_lvl+0x5b/0x90
Aug 10 07:01:29 dg1test kernel: check_noncircular+0xe2/0x110
Aug 10 07:01:29 dg1test kernel: __lock_acquire+0x14e3/0x2240
Aug 10 07:01:29 dg1test kernel: lock_acquire+0xc8/0x2a0
Aug 10 07:01:29 dg1test kernel: ? nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel: ? lock_acquire+0xc8/0x2a0
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: ? nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_client_event+0xf/0x20 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy+0x9b/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel: ga100_fifo_nonstall_intr+0x24/0x30 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel: __handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel: handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel: handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel: __common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel: common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel: asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel: RIP: 0033:0x7fb66174d700
Aug 10 07:01:29 dg1test kernel: Code: c1 e2 05 29 ca 8d 0c 10 0f be 07 84 c0 75 eb 89 c8 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa e9 d7 0f fc ff 0f 1f 80 00 00 00 00 <f3> 0f 1e fa e9 c7 0f fc>
Aug 10 07:01:29 dg1test kernel: RSP: 002b:00007ffdd3c48438 EFLAGS: 00000206
Aug 10 07:01:29 dg1test kernel: RAX: 000055bb758763c0 RBX: 000055bb758752c0 RCX: 00000000000028b0
Aug 10 07:01:29 dg1test kernel: RDX: 000055bb758752c0 RSI: 000055bb75887490 RDI: 000055bb75862950
Aug 10 07:01:29 dg1test kernel: RBP: 00007ffdd3c48490 R08: 000055bb75873b10 R09: 0000000000000001
Aug 10 07:01:29 dg1test kernel: R10: 0000000000000004 R11: 000055bb7587f000 R12: 000055bb75887490
Aug 10 07:01:29 dg1test kernel: R13: 000055bb757f6280 R14: 000055bb758875c0 R15: 000055bb757f6280
Aug 10 07:01:29 dg1test kernel: </TASK>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tested-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231107053255.2257079-1-airlied@gmail.com
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
[ Upstream commit 4428399 ] The lt8912b driver, in its bridge detach function, calls drm_connector_unregister() and drm_connector_cleanup(). drm_connector_unregister() should be called only for connectors explicitly registered with drm_connector_register(), which is not the case in lt8912b. The driver's drm_connector_funcs.destroy hook is set to drm_connector_cleanup(). Thus the driver should not call either drm_connector_unregister() nor drm_connector_cleanup() in its lt8912_bridge_detach(), as they cause a crash on bridge detach: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000000858f3000 [0000000000000000] pgd=0800000085918003, p4d=0800000085918003, pud=0800000085431003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: tidss(-) display_connector lontium_lt8912b tc358768 panel_lvds panel_simple drm_dma_helper drm_kms_helper drm drm_panel_orientation_quirks CPU: 3 PID: 462 Comm: rmmod Tainted: G W 6.5.0-rc2+ #2 Hardware name: Toradex Verdin AM62 on Verdin Development Board (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : drm_connector_cleanup+0x78/0x2d4 [drm] lr : lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] sp : ffff800082ed3a90 x29: ffff800082ed3a90 x28: ffff0000040c1940 x27: 0000000000000000 x26: 0000000000000000 x25: dead000000000122 x24: dead000000000122 x23: dead000000000100 x22: ffff000003fb6388 x21: 0000000000000000 x20: 0000000000000000 x19: ffff000003fb6260 x18: fffffffffffe56e8 x17: 0000000000000000 x16: 0010000000000000 x15: 0000000000000038 x14: 0000000000000000 x13: ffff800081914b48 x12: 000000000000040e x11: 000000000000015a x10: ffff80008196ebb8 x9 : ffff800081914b48 x8 : 00000000ffffefff x7 : ffff0000040c1940 x6 : ffff80007aa649d0 x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff80008159e008 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: drm_connector_cleanup+0x78/0x2d4 [drm] lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] drm_bridge_detach+0x44/0x84 [drm] drm_encoder_cleanup+0x40/0xb8 [drm] drmm_encoder_alloc_release+0x1c/0x30 [drm] drm_managed_release+0xac/0x148 [drm] drm_dev_put.part.0+0x88/0xb8 [drm] devm_drm_dev_init_release+0x14/0x24 [drm] devm_action_release+0x14/0x20 release_nodes+0x5c/0x90 devres_release_all+0x8c/0xe0 device_unbind_cleanup+0x18/0x68 device_release_driver_internal+0x208/0x23c driver_detach+0x4c/0x94 bus_remove_driver+0x70/0xf4 driver_unregister+0x30/0x60 platform_driver_unregister+0x14/0x20 tidss_platform_driver_exit+0x18/0xb2c [tidss] __arm64_sys_delete_module+0x1a0/0x2b4 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x60/0x10c do_el0_svc_compat+0x1c/0x40 el0_svc_compat+0x40/0xac el0t_32_sync_handler+0xb0/0x138 el0t_32_sync+0x194/0x198 Code: 9104a276 f2fbd5b7 aa0203e1 91008af8 (f85c0420) Fixes: 30e2ae9 ("drm/bridge: Introduce LT8912B DSI to HDMI bridge") Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Reviewed-by: Robert Foss <rfoss@kernel.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230804-lt8912b-v1-2-c542692c6a2f@ideasonboard.com Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
[ Upstream commit 4428399 ] The lt8912b driver, in its bridge detach function, calls drm_connector_unregister() and drm_connector_cleanup(). drm_connector_unregister() should be called only for connectors explicitly registered with drm_connector_register(), which is not the case in lt8912b. The driver's drm_connector_funcs.destroy hook is set to drm_connector_cleanup(). Thus the driver should not call either drm_connector_unregister() nor drm_connector_cleanup() in its lt8912_bridge_detach(), as they cause a crash on bridge detach: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000000858f3000 [0000000000000000] pgd=0800000085918003, p4d=0800000085918003, pud=0800000085431003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: tidss(-) display_connector lontium_lt8912b tc358768 panel_lvds panel_simple drm_dma_helper drm_kms_helper drm drm_panel_orientation_quirks CPU: 3 PID: 462 Comm: rmmod Tainted: G W 6.5.0-rc2+ #2 Hardware name: Toradex Verdin AM62 on Verdin Development Board (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : drm_connector_cleanup+0x78/0x2d4 [drm] lr : lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] sp : ffff800082ed3a90 x29: ffff800082ed3a90 x28: ffff0000040c1940 x27: 0000000000000000 x26: 0000000000000000 x25: dead000000000122 x24: dead000000000122 x23: dead000000000100 x22: ffff000003fb6388 x21: 0000000000000000 x20: 0000000000000000 x19: ffff000003fb6260 x18: fffffffffffe56e8 x17: 0000000000000000 x16: 0010000000000000 x15: 0000000000000038 x14: 0000000000000000 x13: ffff800081914b48 x12: 000000000000040e x11: 000000000000015a x10: ffff80008196ebb8 x9 : ffff800081914b48 x8 : 00000000ffffefff x7 : ffff0000040c1940 x6 : ffff80007aa649d0 x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff80008159e008 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: drm_connector_cleanup+0x78/0x2d4 [drm] lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] drm_bridge_detach+0x44/0x84 [drm] drm_encoder_cleanup+0x40/0xb8 [drm] drmm_encoder_alloc_release+0x1c/0x30 [drm] drm_managed_release+0xac/0x148 [drm] drm_dev_put.part.0+0x88/0xb8 [drm] devm_drm_dev_init_release+0x14/0x24 [drm] devm_action_release+0x14/0x20 release_nodes+0x5c/0x90 devres_release_all+0x8c/0xe0 device_unbind_cleanup+0x18/0x68 device_release_driver_internal+0x208/0x23c driver_detach+0x4c/0x94 bus_remove_driver+0x70/0xf4 driver_unregister+0x30/0x60 platform_driver_unregister+0x14/0x20 tidss_platform_driver_exit+0x18/0xb2c [tidss] __arm64_sys_delete_module+0x1a0/0x2b4 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x60/0x10c do_el0_svc_compat+0x1c/0x40 el0_svc_compat+0x40/0xac el0t_32_sync_handler+0xb0/0x138 el0t_32_sync+0x194/0x198 Code: 9104a276 f2fbd5b7 aa0203e1 91008af8 (f85c0420) Fixes: 30e2ae9 ("drm/bridge: Introduce LT8912B DSI to HDMI bridge") Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Reviewed-by: Robert Foss <rfoss@kernel.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230804-lt8912b-v1-2-c542692c6a2f@ideasonboard.com Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
[ Upstream commit a84fbf2 ] Generating metrics llc_code_read_mpi_demand_plus_prefetch, llc_data_read_mpi_demand_plus_prefetch, llc_miss_local_memory_bandwidth_read, llc_miss_local_memory_bandwidth_write, nllc_miss_remote_memory_bandwidth_read, memory_bandwidth_read, memory_bandwidth_write, uncore_frequency, upi_data_transmit_bw, C2_Pkg_Residency, C3_Core_Residency, C3_Pkg_Residency, C6_Core_Residency, C6_Pkg_Residency, C7_Core_Residency, C7_Pkg_Residency, UNCORE_FREQ and tma_info_system_socket_clks would trigger an address sanitizer heap-buffer-overflows on a SkylakeX. ``` ==2567752==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020003ed098 at pc 0x5621a816654e bp 0x7fffb55d4da0 sp 0x7fffb55d4d98 READ of size 4 at 0x5020003eee78 thread T0 #0 0x558265d6654d in aggr_cpu_id__is_empty tools/perf/util/cpumap.c:694:12 #1 0x558265c914da in perf_stat__get_aggr tools/perf/builtin-stat.c:1490:6 #2 0x558265c914da in perf_stat__get_global_cached tools/perf/builtin-stat.c:1530:9 #3 0x558265e53290 in should_skip_zero_counter tools/perf/util/stat-display.c:947:31 #4 0x558265e53290 in print_counter_aggrdata tools/perf/util/stat-display.c:985:18 #5 0x558265e51931 in print_counter tools/perf/util/stat-display.c:1110:3 #6 0x558265e51931 in evlist__print_counters tools/perf/util/stat-display.c:1571:5 #7 0x558265c8ec87 in print_counters tools/perf/builtin-stat.c:981:2 #8 0x558265c8cc71 in cmd_stat tools/perf/builtin-stat.c:2837:3 #9 0x558265bb9bd4 in run_builtin tools/perf/perf.c:323:11 #10 0x558265bb98eb in handle_internal_command tools/perf/perf.c:377:8 #11 0x558265bb9389 in run_argv tools/perf/perf.c:421:2 #12 0x558265bb9389 in main tools/perf/perf.c:537:3 ``` The issue was the use of testing a cpumap with NULL rather than using empty, as a map containing the dummy value isn't NULL and the -1 results in an empty aggr map being allocated which legitimately overflows when any member is accessed. Fixes: 8a96f45 ("perf stat: Avoid SEGV if core.cpus isn't set") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230906003912.3317462-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
[ Upstream commit ede72dc ] Fuzzing found that an invalid tracepoint name would create a memory leak with an address sanitizer build: ``` $ perf stat -e '*:o/' true event syntax error: '*:o/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events ================================================================= ==59380==ERROR: LeakSanitizer: detected memory leaks Direct leak of 4 byte(s) in 2 object(s) allocated from: #0 0x7f38ac07077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 #1 0x55f2f41be73b in str util/parse-events.l:49 #2 0x55f2f41d08e8 in parse_events_lex util/parse-events.l:338 #3 0x55f2f41dc3b1 in parse_events_parse util/parse-events-bison.c:1464 #4 0x55f2f410b8b3 in parse_events__scanner util/parse-events.c:1822 #5 0x55f2f410d1b9 in __parse_events util/parse-events.c:2094 #6 0x55f2f410e57f in parse_events_option util/parse-events.c:2279 #7 0x55f2f4427b56 in get_value tools/lib/subcmd/parse-options.c:251 #8 0x55f2f4428d98 in parse_short_opt tools/lib/subcmd/parse-options.c:351 #9 0x55f2f4429d80 in parse_options_step tools/lib/subcmd/parse-options.c:539 #10 0x55f2f442acb9 in parse_options_subcommand tools/lib/subcmd/parse-options.c:654 #11 0x55f2f3ec99fc in cmd_stat tools/perf/builtin-stat.c:2501 #12 0x55f2f4093289 in run_builtin tools/perf/perf.c:322 #13 0x55f2f40937f5 in handle_internal_command tools/perf/perf.c:375 #14 0x55f2f4093bbd in run_argv tools/perf/perf.c:419 #15 0x55f2f409412b in main tools/perf/perf.c:535 SUMMARY: AddressSanitizer: 4 byte(s) leaked in 2 allocation(s). ``` Fix by adding the missing destructor. Fixes: 865582c ("perf tools: Adds the tracepoint name parsing support") Signed-off-by: Ian Rogers <irogers@google.com> Cc: He Kuang <hekuang@huawei.com> Link: https://lore.kernel.org/r/20230914164028.363220-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
[ Upstream commit 4428399 ] The lt8912b driver, in its bridge detach function, calls drm_connector_unregister() and drm_connector_cleanup(). drm_connector_unregister() should be called only for connectors explicitly registered with drm_connector_register(), which is not the case in lt8912b. The driver's drm_connector_funcs.destroy hook is set to drm_connector_cleanup(). Thus the driver should not call either drm_connector_unregister() nor drm_connector_cleanup() in its lt8912_bridge_detach(), as they cause a crash on bridge detach: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000000858f3000 [0000000000000000] pgd=0800000085918003, p4d=0800000085918003, pud=0800000085431003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: tidss(-) display_connector lontium_lt8912b tc358768 panel_lvds panel_simple drm_dma_helper drm_kms_helper drm drm_panel_orientation_quirks CPU: 3 PID: 462 Comm: rmmod Tainted: G W 6.5.0-rc2+ #2 Hardware name: Toradex Verdin AM62 on Verdin Development Board (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : drm_connector_cleanup+0x78/0x2d4 [drm] lr : lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] sp : ffff800082ed3a90 x29: ffff800082ed3a90 x28: ffff0000040c1940 x27: 0000000000000000 x26: 0000000000000000 x25: dead000000000122 x24: dead000000000122 x23: dead000000000100 x22: ffff000003fb6388 x21: 0000000000000000 x20: 0000000000000000 x19: ffff000003fb6260 x18: fffffffffffe56e8 x17: 0000000000000000 x16: 0010000000000000 x15: 0000000000000038 x14: 0000000000000000 x13: ffff800081914b48 x12: 000000000000040e x11: 000000000000015a x10: ffff80008196ebb8 x9 : ffff800081914b48 x8 : 00000000ffffefff x7 : ffff0000040c1940 x6 : ffff80007aa649d0 x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff80008159e008 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: drm_connector_cleanup+0x78/0x2d4 [drm] lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] drm_bridge_detach+0x44/0x84 [drm] drm_encoder_cleanup+0x40/0xb8 [drm] drmm_encoder_alloc_release+0x1c/0x30 [drm] drm_managed_release+0xac/0x148 [drm] drm_dev_put.part.0+0x88/0xb8 [drm] devm_drm_dev_init_release+0x14/0x24 [drm] devm_action_release+0x14/0x20 release_nodes+0x5c/0x90 devres_release_all+0x8c/0xe0 device_unbind_cleanup+0x18/0x68 device_release_driver_internal+0x208/0x23c driver_detach+0x4c/0x94 bus_remove_driver+0x70/0xf4 driver_unregister+0x30/0x60 platform_driver_unregister+0x14/0x20 tidss_platform_driver_exit+0x18/0xb2c [tidss] __arm64_sys_delete_module+0x1a0/0x2b4 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x60/0x10c do_el0_svc_compat+0x1c/0x40 el0_svc_compat+0x40/0xac el0t_32_sync_handler+0xb0/0x138 el0t_32_sync+0x194/0x198 Code: 9104a276 f2fbd5b7 aa0203e1 91008af8 (f85c0420) Fixes: 30e2ae9 ("drm/bridge: Introduce LT8912B DSI to HDMI bridge") Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Reviewed-by: Robert Foss <rfoss@kernel.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230804-lt8912b-v1-2-c542692c6a2f@ideasonboard.com Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
[ Upstream commit a84fbf2 ] Generating metrics llc_code_read_mpi_demand_plus_prefetch, llc_data_read_mpi_demand_plus_prefetch, llc_miss_local_memory_bandwidth_read, llc_miss_local_memory_bandwidth_write, nllc_miss_remote_memory_bandwidth_read, memory_bandwidth_read, memory_bandwidth_write, uncore_frequency, upi_data_transmit_bw, C2_Pkg_Residency, C3_Core_Residency, C3_Pkg_Residency, C6_Core_Residency, C6_Pkg_Residency, C7_Core_Residency, C7_Pkg_Residency, UNCORE_FREQ and tma_info_system_socket_clks would trigger an address sanitizer heap-buffer-overflows on a SkylakeX. ``` ==2567752==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020003ed098 at pc 0x5621a816654e bp 0x7fffb55d4da0 sp 0x7fffb55d4d98 READ of size 4 at 0x5020003eee78 thread T0 #0 0x558265d6654d in aggr_cpu_id__is_empty tools/perf/util/cpumap.c:694:12 #1 0x558265c914da in perf_stat__get_aggr tools/perf/builtin-stat.c:1490:6 #2 0x558265c914da in perf_stat__get_global_cached tools/perf/builtin-stat.c:1530:9 #3 0x558265e53290 in should_skip_zero_counter tools/perf/util/stat-display.c:947:31 #4 0x558265e53290 in print_counter_aggrdata tools/perf/util/stat-display.c:985:18 #5 0x558265e51931 in print_counter tools/perf/util/stat-display.c:1110:3 #6 0x558265e51931 in evlist__print_counters tools/perf/util/stat-display.c:1571:5 #7 0x558265c8ec87 in print_counters tools/perf/builtin-stat.c:981:2 #8 0x558265c8cc71 in cmd_stat tools/perf/builtin-stat.c:2837:3 #9 0x558265bb9bd4 in run_builtin tools/perf/perf.c:323:11 #10 0x558265bb98eb in handle_internal_command tools/perf/perf.c:377:8 #11 0x558265bb9389 in run_argv tools/perf/perf.c:421:2 #12 0x558265bb9389 in main tools/perf/perf.c:537:3 ``` The issue was the use of testing a cpumap with NULL rather than using empty, as a map containing the dummy value isn't NULL and the -1 results in an empty aggr map being allocated which legitimately overflows when any member is accessed. Fixes: 8a96f45 ("perf stat: Avoid SEGV if core.cpus isn't set") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230906003912.3317462-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
[ Upstream commit 4428399 ] The lt8912b driver, in its bridge detach function, calls drm_connector_unregister() and drm_connector_cleanup(). drm_connector_unregister() should be called only for connectors explicitly registered with drm_connector_register(), which is not the case in lt8912b. The driver's drm_connector_funcs.destroy hook is set to drm_connector_cleanup(). Thus the driver should not call either drm_connector_unregister() nor drm_connector_cleanup() in its lt8912_bridge_detach(), as they cause a crash on bridge detach: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000000858f3000 [0000000000000000] pgd=0800000085918003, p4d=0800000085918003, pud=0800000085431003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: tidss(-) display_connector lontium_lt8912b tc358768 panel_lvds panel_simple drm_dma_helper drm_kms_helper drm drm_panel_orientation_quirks CPU: 3 PID: 462 Comm: rmmod Tainted: G W 6.5.0-rc2+ #2 Hardware name: Toradex Verdin AM62 on Verdin Development Board (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : drm_connector_cleanup+0x78/0x2d4 [drm] lr : lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] sp : ffff800082ed3a90 x29: ffff800082ed3a90 x28: ffff0000040c1940 x27: 0000000000000000 x26: 0000000000000000 x25: dead000000000122 x24: dead000000000122 x23: dead000000000100 x22: ffff000003fb6388 x21: 0000000000000000 x20: 0000000000000000 x19: ffff000003fb6260 x18: fffffffffffe56e8 x17: 0000000000000000 x16: 0010000000000000 x15: 0000000000000038 x14: 0000000000000000 x13: ffff800081914b48 x12: 000000000000040e x11: 000000000000015a x10: ffff80008196ebb8 x9 : ffff800081914b48 x8 : 00000000ffffefff x7 : ffff0000040c1940 x6 : ffff80007aa649d0 x5 : 0000000000000000 x4 : 0000000000000001 x3 : ffff80008159e008 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: drm_connector_cleanup+0x78/0x2d4 [drm] lt8912_bridge_detach+0x54/0x6c [lontium_lt8912b] drm_bridge_detach+0x44/0x84 [drm] drm_encoder_cleanup+0x40/0xb8 [drm] drmm_encoder_alloc_release+0x1c/0x30 [drm] drm_managed_release+0xac/0x148 [drm] drm_dev_put.part.0+0x88/0xb8 [drm] devm_drm_dev_init_release+0x14/0x24 [drm] devm_action_release+0x14/0x20 release_nodes+0x5c/0x90 devres_release_all+0x8c/0xe0 device_unbind_cleanup+0x18/0x68 device_release_driver_internal+0x208/0x23c driver_detach+0x4c/0x94 bus_remove_driver+0x70/0xf4 driver_unregister+0x30/0x60 platform_driver_unregister+0x14/0x20 tidss_platform_driver_exit+0x18/0xb2c [tidss] __arm64_sys_delete_module+0x1a0/0x2b4 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x60/0x10c do_el0_svc_compat+0x1c/0x40 el0_svc_compat+0x40/0xac el0t_32_sync_handler+0xb0/0x138 el0t_32_sync+0x194/0x198 Code: 9104a276 f2fbd5b7 aa0203e1 91008af8 (f85c0420) Fixes: 30e2ae9 ("drm/bridge: Introduce LT8912B DSI to HDMI bridge") Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Reviewed-by: Robert Foss <rfoss@kernel.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230804-lt8912b-v1-2-c542692c6a2f@ideasonboard.com Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
[ Upstream commit a84fbf2 ] Generating metrics llc_code_read_mpi_demand_plus_prefetch, llc_data_read_mpi_demand_plus_prefetch, llc_miss_local_memory_bandwidth_read, llc_miss_local_memory_bandwidth_write, nllc_miss_remote_memory_bandwidth_read, memory_bandwidth_read, memory_bandwidth_write, uncore_frequency, upi_data_transmit_bw, C2_Pkg_Residency, C3_Core_Residency, C3_Pkg_Residency, C6_Core_Residency, C6_Pkg_Residency, C7_Core_Residency, C7_Pkg_Residency, UNCORE_FREQ and tma_info_system_socket_clks would trigger an address sanitizer heap-buffer-overflows on a SkylakeX. ``` ==2567752==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020003ed098 at pc 0x5621a816654e bp 0x7fffb55d4da0 sp 0x7fffb55d4d98 READ of size 4 at 0x5020003eee78 thread T0 #0 0x558265d6654d in aggr_cpu_id__is_empty tools/perf/util/cpumap.c:694:12 #1 0x558265c914da in perf_stat__get_aggr tools/perf/builtin-stat.c:1490:6 #2 0x558265c914da in perf_stat__get_global_cached tools/perf/builtin-stat.c:1530:9 #3 0x558265e53290 in should_skip_zero_counter tools/perf/util/stat-display.c:947:31 #4 0x558265e53290 in print_counter_aggrdata tools/perf/util/stat-display.c:985:18 #5 0x558265e51931 in print_counter tools/perf/util/stat-display.c:1110:3 #6 0x558265e51931 in evlist__print_counters tools/perf/util/stat-display.c:1571:5 #7 0x558265c8ec87 in print_counters tools/perf/builtin-stat.c:981:2 #8 0x558265c8cc71 in cmd_stat tools/perf/builtin-stat.c:2837:3 #9 0x558265bb9bd4 in run_builtin tools/perf/perf.c:323:11 #10 0x558265bb98eb in handle_internal_command tools/perf/perf.c:377:8 #11 0x558265bb9389 in run_argv tools/perf/perf.c:421:2 #12 0x558265bb9389 in main tools/perf/perf.c:537:3 ``` The issue was the use of testing a cpumap with NULL rather than using empty, as a map containing the dummy value isn't NULL and the -1 results in an empty aggr map being allocated which legitimately overflows when any member is accessed. Fixes: 8a96f45 ("perf stat: Avoid SEGV if core.cpus isn't set") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230906003912.3317462-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 20, 2023
[ Upstream commit ede72dc ] Fuzzing found that an invalid tracepoint name would create a memory leak with an address sanitizer build: ``` $ perf stat -e '*:o/' true event syntax error: '*:o/' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events ================================================================= ==59380==ERROR: LeakSanitizer: detected memory leaks Direct leak of 4 byte(s) in 2 object(s) allocated from: #0 0x7f38ac07077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 #1 0x55f2f41be73b in str util/parse-events.l:49 #2 0x55f2f41d08e8 in parse_events_lex util/parse-events.l:338 #3 0x55f2f41dc3b1 in parse_events_parse util/parse-events-bison.c:1464 #4 0x55f2f410b8b3 in parse_events__scanner util/parse-events.c:1822 #5 0x55f2f410d1b9 in __parse_events util/parse-events.c:2094 #6 0x55f2f410e57f in parse_events_option util/parse-events.c:2279 #7 0x55f2f4427b56 in get_value tools/lib/subcmd/parse-options.c:251 #8 0x55f2f4428d98 in parse_short_opt tools/lib/subcmd/parse-options.c:351 #9 0x55f2f4429d80 in parse_options_step tools/lib/subcmd/parse-options.c:539 #10 0x55f2f442acb9 in parse_options_subcommand tools/lib/subcmd/parse-options.c:654 #11 0x55f2f3ec99fc in cmd_stat tools/perf/builtin-stat.c:2501 #12 0x55f2f4093289 in run_builtin tools/perf/perf.c:322 #13 0x55f2f40937f5 in handle_internal_command tools/perf/perf.c:375 #14 0x55f2f4093bbd in run_argv tools/perf/perf.c:419 #15 0x55f2f409412b in main tools/perf/perf.c:535 SUMMARY: AddressSanitizer: 4 byte(s) leaked in 2 allocation(s). ``` Fix by adding the missing destructor. Fixes: 865582c ("perf tools: Adds the tracepoint name parsing support") Signed-off-by: Ian Rogers <irogers@google.com> Cc: He Kuang <hekuang@huawei.com> Link: https://lore.kernel.org/r/20230914164028.363220-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit 5a22fbc upstream. When LAN9303 is MDIO-connected two callchains exist into mdio->bus->write(): 1. switch ports 1&2 ("physical" PHYs): virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})-> lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested 2. LAN9303 virtual PHY: virtual MDIO bus (lan9303_phy_{read|write}) -> lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write} If the latter functions just take mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP false-positive splat. It's false-positive because the first mdio_lock in the second callchain above belongs to virtual MDIO bus, the second mdio_lock belongs to physical MDIO bus. Consequent annotation in lan9303_mdio_{read|write} as nested lock (similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus) prevents the following splat: WARNING: possible circular locking dependency detected 5.15.71 #1 Not tainted ------------------------------------------------------ kworker/u4:3/609 is trying to acquire lock: ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex but task is already holding lock: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{3:3}: lock_acquire __mutex_lock mutex_lock_nested lan9303_mdio_read _regmap_read regmap_read lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork -> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}: __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u4:3/609: #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read stack backtrace: CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace show_stack dump_stack_lvl dump_stack print_circular_bug check_noncircular __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe ... Cc: stable@vger.kernel.org Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit 5a22fbc upstream. When LAN9303 is MDIO-connected two callchains exist into mdio->bus->write(): 1. switch ports 1&2 ("physical" PHYs): virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})-> lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested 2. LAN9303 virtual PHY: virtual MDIO bus (lan9303_phy_{read|write}) -> lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write} If the latter functions just take mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP false-positive splat. It's false-positive because the first mdio_lock in the second callchain above belongs to virtual MDIO bus, the second mdio_lock belongs to physical MDIO bus. Consequent annotation in lan9303_mdio_{read|write} as nested lock (similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus) prevents the following splat: WARNING: possible circular locking dependency detected 5.15.71 #1 Not tainted ------------------------------------------------------ kworker/u4:3/609 is trying to acquire lock: ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex but task is already holding lock: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{3:3}: lock_acquire __mutex_lock mutex_lock_nested lan9303_mdio_read _regmap_read regmap_read lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork -> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}: __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u4:3/609: #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read stack backtrace: CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace show_stack dump_stack_lvl dump_stack print_circular_bug check_noncircular __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe ... Cc: stable@vger.kernel.org Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit 5a22fbc upstream. When LAN9303 is MDIO-connected two callchains exist into mdio->bus->write(): 1. switch ports 1&2 ("physical" PHYs): virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})-> lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested 2. LAN9303 virtual PHY: virtual MDIO bus (lan9303_phy_{read|write}) -> lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write} If the latter functions just take mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP false-positive splat. It's false-positive because the first mdio_lock in the second callchain above belongs to virtual MDIO bus, the second mdio_lock belongs to physical MDIO bus. Consequent annotation in lan9303_mdio_{read|write} as nested lock (similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus) prevents the following splat: WARNING: possible circular locking dependency detected 5.15.71 #1 Not tainted ------------------------------------------------------ kworker/u4:3/609 is trying to acquire lock: ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex but task is already holding lock: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{3:3}: lock_acquire __mutex_lock mutex_lock_nested lan9303_mdio_read _regmap_read regmap_read lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork -> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}: __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u4:3/609: #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read stack backtrace: CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace show_stack dump_stack_lvl dump_stack print_circular_bug check_noncircular __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe ... Cc: stable@vger.kernel.org Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit 5a22fbc upstream. When LAN9303 is MDIO-connected two callchains exist into mdio->bus->write(): 1. switch ports 1&2 ("physical" PHYs): virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})-> lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested 2. LAN9303 virtual PHY: virtual MDIO bus (lan9303_phy_{read|write}) -> lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write} If the latter functions just take mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP false-positive splat. It's false-positive because the first mdio_lock in the second callchain above belongs to virtual MDIO bus, the second mdio_lock belongs to physical MDIO bus. Consequent annotation in lan9303_mdio_{read|write} as nested lock (similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus) prevents the following splat: WARNING: possible circular locking dependency detected 5.15.71 #1 Not tainted ------------------------------------------------------ kworker/u4:3/609 is trying to acquire lock: ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex but task is already holding lock: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{3:3}: lock_acquire __mutex_lock mutex_lock_nested lan9303_mdio_read _regmap_read regmap_read lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork -> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}: __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u4:3/609: #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read stack backtrace: CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace show_stack dump_stack_lvl dump_stack print_circular_bug check_noncircular __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe ... Cc: stable@vger.kernel.org Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
[ Upstream commit 265f3ed ] All callers of work_on_cpu() share the same lock class key for all the functions queued. As a result the workqueue related locking scenario for a function A may be spuriously accounted as an inversion against the locking scenario of function B such as in the following model: long A(void *arg) { mutex_lock(&mutex); mutex_unlock(&mutex); } long B(void *arg) { } void launchA(void) { work_on_cpu(0, A, NULL); } void launchB(void) { mutex_lock(&mutex); work_on_cpu(1, B, NULL); mutex_unlock(&mutex); } launchA and launchB running concurrently have no chance to deadlock. However the above can be reported by lockdep as a possible locking inversion because the works containing A() and B() are treated as belonging to the same locking class. The following shows an existing example of such a spurious lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.6.0-rc1-00065-g934ebd6e5359 #35409 Not tainted ------------------------------------------------------ kworker/0:1/9 is trying to acquire lock: ffffffff9bc72f30 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x57/0x2b0 but task is already holding lock: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}: __flush_work+0x83/0x4e0 work_on_cpu+0x97/0xc0 rcu_nocb_cpu_offload+0x62/0xb0 rcu_nocb_toggle+0xd0/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #1 (rcu_state.barrier_mutex){+.+.}-{3:3}: __mutex_lock+0x81/0xc80 rcu_nocb_cpu_deoffload+0x38/0xb0 rcu_nocb_toggle+0x144/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 percpu_down_write+0x31/0x200 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock --> rcu_state.barrier_mutex --> (work_completion)(&wfc.work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&wfc.work)); lock(rcu_state.barrier_mutex); lock((work_completion)(&wfc.work)); lock(cpu_hotplug_lock); *** DEADLOCK *** 2 locks held by kworker/0:1/9: #0: ffff900481068b38 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x212/0x500 #1: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 stack backtrace: CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1-00065-g934ebd6e5359 #35409 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events work_for_cpu_fn Call Trace: rcu-torture: rcu_torture_read_exit: Start of episode <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x132/0x150 __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 ? _cpu_down+0x57/0x2b0 percpu_down_write+0x31/0x200 ? _cpu_down+0x57/0x2b0 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 ? __pfx_worker_thread+0x10/0x10 kthread+0xe6/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK Fix this with providing one lock class key per work_on_cpu() caller. Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit 5a22fbc upstream. When LAN9303 is MDIO-connected two callchains exist into mdio->bus->write(): 1. switch ports 1&2 ("physical" PHYs): virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})-> lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested 2. LAN9303 virtual PHY: virtual MDIO bus (lan9303_phy_{read|write}) -> lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write} If the latter functions just take mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP false-positive splat. It's false-positive because the first mdio_lock in the second callchain above belongs to virtual MDIO bus, the second mdio_lock belongs to physical MDIO bus. Consequent annotation in lan9303_mdio_{read|write} as nested lock (similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus) prevents the following splat: WARNING: possible circular locking dependency detected 5.15.71 #1 Not tainted ------------------------------------------------------ kworker/u4:3/609 is trying to acquire lock: ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex but task is already holding lock: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{3:3}: lock_acquire __mutex_lock mutex_lock_nested lan9303_mdio_read _regmap_read regmap_read lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork -> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}: __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u4:3/609: #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read stack backtrace: CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace show_stack dump_stack_lvl dump_stack print_circular_bug check_noncircular __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe ... Cc: stable@vger.kernel.org Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
[ Upstream commit 265f3ed ] All callers of work_on_cpu() share the same lock class key for all the functions queued. As a result the workqueue related locking scenario for a function A may be spuriously accounted as an inversion against the locking scenario of function B such as in the following model: long A(void *arg) { mutex_lock(&mutex); mutex_unlock(&mutex); } long B(void *arg) { } void launchA(void) { work_on_cpu(0, A, NULL); } void launchB(void) { mutex_lock(&mutex); work_on_cpu(1, B, NULL); mutex_unlock(&mutex); } launchA and launchB running concurrently have no chance to deadlock. However the above can be reported by lockdep as a possible locking inversion because the works containing A() and B() are treated as belonging to the same locking class. The following shows an existing example of such a spurious lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.6.0-rc1-00065-g934ebd6e5359 #35409 Not tainted ------------------------------------------------------ kworker/0:1/9 is trying to acquire lock: ffffffff9bc72f30 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x57/0x2b0 but task is already holding lock: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}: __flush_work+0x83/0x4e0 work_on_cpu+0x97/0xc0 rcu_nocb_cpu_offload+0x62/0xb0 rcu_nocb_toggle+0xd0/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #1 (rcu_state.barrier_mutex){+.+.}-{3:3}: __mutex_lock+0x81/0xc80 rcu_nocb_cpu_deoffload+0x38/0xb0 rcu_nocb_toggle+0x144/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 percpu_down_write+0x31/0x200 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock --> rcu_state.barrier_mutex --> (work_completion)(&wfc.work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&wfc.work)); lock(rcu_state.barrier_mutex); lock((work_completion)(&wfc.work)); lock(cpu_hotplug_lock); *** DEADLOCK *** 2 locks held by kworker/0:1/9: #0: ffff900481068b38 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x212/0x500 #1: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 stack backtrace: CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1-00065-g934ebd6e5359 #35409 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events work_for_cpu_fn Call Trace: rcu-torture: rcu_torture_read_exit: Start of episode <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x132/0x150 __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 ? _cpu_down+0x57/0x2b0 percpu_down_write+0x31/0x200 ? _cpu_down+0x57/0x2b0 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 ? __pfx_worker_thread+0x10/0x10 kthread+0xe6/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK Fix this with providing one lock class key per work_on_cpu() caller. Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit 5a22fbc upstream. When LAN9303 is MDIO-connected two callchains exist into mdio->bus->write(): 1. switch ports 1&2 ("physical" PHYs): virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})-> lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested 2. LAN9303 virtual PHY: virtual MDIO bus (lan9303_phy_{read|write}) -> lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write} If the latter functions just take mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP false-positive splat. It's false-positive because the first mdio_lock in the second callchain above belongs to virtual MDIO bus, the second mdio_lock belongs to physical MDIO bus. Consequent annotation in lan9303_mdio_{read|write} as nested lock (similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus) prevents the following splat: WARNING: possible circular locking dependency detected 5.15.71 #1 Not tainted ------------------------------------------------------ kworker/u4:3/609 is trying to acquire lock: ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex but task is already holding lock: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{3:3}: lock_acquire __mutex_lock mutex_lock_nested lan9303_mdio_read _regmap_read regmap_read lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork -> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}: __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u4:3/609: #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read stack backtrace: CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace show_stack dump_stack_lvl dump_stack print_circular_bug check_noncircular __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe ... Cc: stable@vger.kernel.org Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
[ Upstream commit 265f3ed ] All callers of work_on_cpu() share the same lock class key for all the functions queued. As a result the workqueue related locking scenario for a function A may be spuriously accounted as an inversion against the locking scenario of function B such as in the following model: long A(void *arg) { mutex_lock(&mutex); mutex_unlock(&mutex); } long B(void *arg) { } void launchA(void) { work_on_cpu(0, A, NULL); } void launchB(void) { mutex_lock(&mutex); work_on_cpu(1, B, NULL); mutex_unlock(&mutex); } launchA and launchB running concurrently have no chance to deadlock. However the above can be reported by lockdep as a possible locking inversion because the works containing A() and B() are treated as belonging to the same locking class. The following shows an existing example of such a spurious lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.6.0-rc1-00065-g934ebd6e5359 #35409 Not tainted ------------------------------------------------------ kworker/0:1/9 is trying to acquire lock: ffffffff9bc72f30 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x57/0x2b0 but task is already holding lock: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}: __flush_work+0x83/0x4e0 work_on_cpu+0x97/0xc0 rcu_nocb_cpu_offload+0x62/0xb0 rcu_nocb_toggle+0xd0/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #1 (rcu_state.barrier_mutex){+.+.}-{3:3}: __mutex_lock+0x81/0xc80 rcu_nocb_cpu_deoffload+0x38/0xb0 rcu_nocb_toggle+0x144/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 percpu_down_write+0x31/0x200 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock --> rcu_state.barrier_mutex --> (work_completion)(&wfc.work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&wfc.work)); lock(rcu_state.barrier_mutex); lock((work_completion)(&wfc.work)); lock(cpu_hotplug_lock); *** DEADLOCK *** 2 locks held by kworker/0:1/9: #0: ffff900481068b38 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x212/0x500 #1: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 stack backtrace: CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1-00065-g934ebd6e5359 #35409 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events work_for_cpu_fn Call Trace: rcu-torture: rcu_torture_read_exit: Start of episode <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x132/0x150 __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 ? _cpu_down+0x57/0x2b0 percpu_down_write+0x31/0x200 ? _cpu_down+0x57/0x2b0 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 ? __pfx_worker_thread+0x10/0x10 kthread+0xe6/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK Fix this with providing one lock class key per work_on_cpu() caller. Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit c6e316a upstream. We must check the return value of find_first_bit() before using the return value as an index array since it happens to overflow the array and then panic: [ 107.318430] Kernel BUG [#1] [ 107.319434] CPU: 3 PID: 1238 Comm: kill Tainted: G E 6.6.0-rc6ubuntu-defconfig #2 [ 107.319465] Hardware name: riscv-virtio,qemu (DT) [ 107.319551] epc : pmu_sbi_ovf_handler+0x3a4/0x3ae [ 107.319840] ra : pmu_sbi_ovf_handler+0x52/0x3ae [ 107.319868] epc : ffffffff80a0a77c ra : ffffffff80a0a42a sp : ffffaf83fecda350 [ 107.319884] gp : ffffffff823961a8 tp : ffffaf8083db1dc0 t0 : ffffaf83fecda480 [ 107.319899] t1 : ffffffff80cafe62 t2 : 000000000000ff00 s0 : ffffaf83fecda520 [ 107.319921] s1 : ffffaf83fecda380 a0 : 00000018fca29df0 a1 : ffffffffffffffff [ 107.319936] a2 : 0000000001073734 a3 : 0000000000000004 a4 : 0000000000000000 [ 107.319951] a5 : 0000000000000040 a6 : 000000001d1c8774 a7 : 0000000000504d55 [ 107.319965] s2 : ffffffff82451f10 s3 : ffffffff82724e70 s4 : 000000000000003f [ 107.319980] s5 : 0000000000000011 s6 : ffffaf8083db27c0 s7 : 0000000000000000 [ 107.319995] s8 : 0000000000000001 s9 : 00007fffb45d6558 s10: 00007fffb45d81a0 [ 107.320009] s11: ffffaf7ffff60000 t3 : 0000000000000004 t4 : 0000000000000000 [ 107.320023] t5 : ffffaf7f80000000 t6 : ffffaf8000000000 [ 107.320037] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 107.320081] [<ffffffff80a0a77c>] pmu_sbi_ovf_handler+0x3a4/0x3ae [ 107.320112] [<ffffffff800b42d0>] handle_percpu_devid_irq+0x9e/0x1a0 [ 107.320131] [<ffffffff800ad92c>] generic_handle_domain_irq+0x28/0x36 [ 107.320148] [<ffffffff8065f9f8>] riscv_intc_irq+0x36/0x4e [ 107.320166] [<ffffffff80caf4a0>] handle_riscv_irq+0x54/0x86 [ 107.320189] [<ffffffff80cb0036>] do_irq+0x64/0x96 [ 107.320271] Code: 85a6 855e b097 ff7f 80e7 9220 b709 9002 4501 bbd9 (9002) 6097 [ 107.320585] ---[ end trace 0000000000000000 ]--- [ 107.320704] Kernel panic - not syncing: Fatal exception in interrupt [ 107.320775] SMP: stopping secondary CPUs [ 107.321219] Kernel Offset: 0x0 from 0xffffffff80000000 [ 107.333051] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 4905ec2 ("RISC-V: Add sscofpmf extension support") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231109082128.40777-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit 807252f upstream. Running smb2.rename test from Samba smbtorture suite against a kernel built with lockdep triggers a "possible recursive locking detected" warning. This is because mnt_want_write() is called twice with no mnt_drop_write() in between: -> ksmbd_vfs_mkdir() -> ksmbd_vfs_kern_path_create() -> kern_path_create() -> filename_create() -> mnt_want_write() -> mnt_want_write() Fix this by removing the mnt_want_write/mnt_drop_write calls from vfs helpers that call kern_path_create(). Full lockdep trace below: ============================================ WARNING: possible recursive locking detected 6.6.0-rc5 #775 Not tainted -------------------------------------------- kworker/1:1/32 is trying to acquire lock: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: ksmbd_vfs_mkdir+0xe1/0x410 but task is already holding lock: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: filename_create+0xb6/0x260 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sb_writers#5); lock(sb_writers#5); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/1:1/32: #0: ffff8880064e4138 ((wq_completion)ksmbd-io){+.+.}-{0:0}, at: process_one_work+0x40e/0x980 #1: ffff888005b0fdd0 ((work_completion)(&work->work)){+.+.}-{0:0}, at: process_one_work+0x40e/0x980 #2: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: filename_create+0xb6/0x260 #3: ffff8880057ce760 (&type->i_mutex_dir_key#3/1){+.+.}-{3:3}, at: filename_create+0x123/0x260 Cc: stable@vger.kernel.org Fixes: 40b268d ("ksmbd: add mnt_want_write to ksmbd vfs functions") Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit 5a22fbc upstream. When LAN9303 is MDIO-connected two callchains exist into mdio->bus->write(): 1. switch ports 1&2 ("physical" PHYs): virtual (switch-internal) MDIO bus (lan9303_switch_ops->phy_{read|write})-> lan9303_mdio_phy_{read|write} -> mdiobus_{read|write}_nested 2. LAN9303 virtual PHY: virtual MDIO bus (lan9303_phy_{read|write}) -> lan9303_virt_phy_reg_{read|write} -> regmap -> lan9303_mdio_{read|write} If the latter functions just take mutex_lock(&sw_dev->device->bus->mdio_lock) it triggers a LOCKDEP false-positive splat. It's false-positive because the first mdio_lock in the second callchain above belongs to virtual MDIO bus, the second mdio_lock belongs to physical MDIO bus. Consequent annotation in lan9303_mdio_{read|write} as nested lock (similar to lan9303_mdio_phy_{read|write}, it's the same physical MDIO bus) prevents the following splat: WARNING: possible circular locking dependency detected 5.15.71 #1 Not tainted ------------------------------------------------------ kworker/u4:3/609 is trying to acquire lock: ffff000011531c68 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}, at: regmap_lock_mutex but task is already holding lock: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&bus->mdio_lock){+.+.}-{3:3}: lock_acquire __mutex_lock mutex_lock_nested lan9303_mdio_read _regmap_read regmap_read lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork -> #0 (lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock){+.+.}-{3:3}: __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe mdio_probe really_probe __driver_probe_device driver_probe_device __device_attach_driver bus_for_each_drv __device_attach device_initial_probe bus_probe_device deferred_probe_work_func process_one_work worker_thread kthread ret_from_fork other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); lock(&bus->mdio_lock); lock(lan9303_mdio:131:(&lan9303_mdio_regmap_config)->lock); *** DEADLOCK *** 5 locks held by kworker/u4:3/609: #0: ffff000002842938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work #1: ffff80000bacbd60 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work #2: ffff000007645178 (&dev->mutex){....}-{3:3}, at: __device_attach #3: ffff8000096e6e78 (dsa2_mutex){+.+.}-{3:3}, at: dsa_register_switch #4: ffff0000114c44d8 (&bus->mdio_lock){+.+.}-{3:3}, at: mdiobus_read stack backtrace: CPU: 1 PID: 609 Comm: kworker/u4:3 Not tainted 5.15.71 #1 Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace show_stack dump_stack_lvl dump_stack print_circular_bug check_noncircular __lock_acquire lock_acquire.part.0 lock_acquire __mutex_lock mutex_lock_nested regmap_lock_mutex regmap_read lan9303_phy_read dsa_slave_phy_read __mdiobus_read mdiobus_read get_phy_device mdiobus_scan __mdiobus_register dsa_register_switch lan9303_probe lan9303_mdio_probe ... Cc: stable@vger.kernel.org Fixes: dc70058 ("net: dsa: LAN9303: add MDIO managed mode support") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231027065741.534971-1-alexander.sverdlin@siemens.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
[ Upstream commit 265f3ed ] All callers of work_on_cpu() share the same lock class key for all the functions queued. As a result the workqueue related locking scenario for a function A may be spuriously accounted as an inversion against the locking scenario of function B such as in the following model: long A(void *arg) { mutex_lock(&mutex); mutex_unlock(&mutex); } long B(void *arg) { } void launchA(void) { work_on_cpu(0, A, NULL); } void launchB(void) { mutex_lock(&mutex); work_on_cpu(1, B, NULL); mutex_unlock(&mutex); } launchA and launchB running concurrently have no chance to deadlock. However the above can be reported by lockdep as a possible locking inversion because the works containing A() and B() are treated as belonging to the same locking class. The following shows an existing example of such a spurious lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.6.0-rc1-00065-g934ebd6e5359 #35409 Not tainted ------------------------------------------------------ kworker/0:1/9 is trying to acquire lock: ffffffff9bc72f30 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x57/0x2b0 but task is already holding lock: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((work_completion)(&wfc.work)){+.+.}-{0:0}: __flush_work+0x83/0x4e0 work_on_cpu+0x97/0xc0 rcu_nocb_cpu_offload+0x62/0xb0 rcu_nocb_toggle+0xd0/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #1 (rcu_state.barrier_mutex){+.+.}-{3:3}: __mutex_lock+0x81/0xc80 rcu_nocb_cpu_deoffload+0x38/0xb0 rcu_nocb_toggle+0x144/0x1d0 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 -> #0 (cpu_hotplug_lock){++++}-{0:0}: __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 percpu_down_write+0x31/0x200 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 kthread+0xe6/0x120 ret_from_fork+0x2f/0x40 ret_from_fork_asm+0x1b/0x30 other info that might help us debug this: Chain exists of: cpu_hotplug_lock --> rcu_state.barrier_mutex --> (work_completion)(&wfc.work) Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&wfc.work)); lock(rcu_state.barrier_mutex); lock((work_completion)(&wfc.work)); lock(cpu_hotplug_lock); *** DEADLOCK *** 2 locks held by kworker/0:1/9: #0: ffff900481068b38 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x212/0x500 #1: ffff9e3bc0057e60 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_scheduled_works+0x216/0x500 stack backtrace: CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.0-rc1-00065-g934ebd6e5359 #35409 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Workqueue: events work_for_cpu_fn Call Trace: rcu-torture: rcu_torture_read_exit: Start of episode <TASK> dump_stack_lvl+0x4a/0x80 check_noncircular+0x132/0x150 __lock_acquire+0x1538/0x2500 lock_acquire+0xbf/0x2a0 ? _cpu_down+0x57/0x2b0 percpu_down_write+0x31/0x200 ? _cpu_down+0x57/0x2b0 _cpu_down+0x57/0x2b0 __cpu_down_maps_locked+0x10/0x20 work_for_cpu_fn+0x15/0x20 process_scheduled_works+0x2a7/0x500 worker_thread+0x173/0x330 ? __pfx_worker_thread+0x10/0x10 kthread+0xe6/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK Fix this with providing one lock class key per work_on_cpu() caller. Reported-and-tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Nov 28, 2023
commit c6e316a upstream. We must check the return value of find_first_bit() before using the return value as an index array since it happens to overflow the array and then panic: [ 107.318430] Kernel BUG [#1] [ 107.319434] CPU: 3 PID: 1238 Comm: kill Tainted: G E 6.6.0-rc6ubuntu-defconfig #2 [ 107.319465] Hardware name: riscv-virtio,qemu (DT) [ 107.319551] epc : pmu_sbi_ovf_handler+0x3a4/0x3ae [ 107.319840] ra : pmu_sbi_ovf_handler+0x52/0x3ae [ 107.319868] epc : ffffffff80a0a77c ra : ffffffff80a0a42a sp : ffffaf83fecda350 [ 107.319884] gp : ffffffff823961a8 tp : ffffaf8083db1dc0 t0 : ffffaf83fecda480 [ 107.319899] t1 : ffffffff80cafe62 t2 : 000000000000ff00 s0 : ffffaf83fecda520 [ 107.319921] s1 : ffffaf83fecda380 a0 : 00000018fca29df0 a1 : ffffffffffffffff [ 107.319936] a2 : 0000000001073734 a3 : 0000000000000004 a4 : 0000000000000000 [ 107.319951] a5 : 0000000000000040 a6 : 000000001d1c8774 a7 : 0000000000504d55 [ 107.319965] s2 : ffffffff82451f10 s3 : ffffffff82724e70 s4 : 000000000000003f [ 107.319980] s5 : 0000000000000011 s6 : ffffaf8083db27c0 s7 : 0000000000000000 [ 107.319995] s8 : 0000000000000001 s9 : 00007fffb45d6558 s10: 00007fffb45d81a0 [ 107.320009] s11: ffffaf7ffff60000 t3 : 0000000000000004 t4 : 0000000000000000 [ 107.320023] t5 : ffffaf7f80000000 t6 : ffffaf8000000000 [ 107.320037] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 107.320081] [<ffffffff80a0a77c>] pmu_sbi_ovf_handler+0x3a4/0x3ae [ 107.320112] [<ffffffff800b42d0>] handle_percpu_devid_irq+0x9e/0x1a0 [ 107.320131] [<ffffffff800ad92c>] generic_handle_domain_irq+0x28/0x36 [ 107.320148] [<ffffffff8065f9f8>] riscv_intc_irq+0x36/0x4e [ 107.320166] [<ffffffff80caf4a0>] handle_riscv_irq+0x54/0x86 [ 107.320189] [<ffffffff80cb0036>] do_irq+0x64/0x96 [ 107.320271] Code: 85a6 855e b097 ff7f 80e7 9220 b709 9002 4501 bbd9 (9002) 6097 [ 107.320585] ---[ end trace 0000000000000000 ]--- [ 107.320704] Kernel panic - not syncing: Fatal exception in interrupt [ 107.320775] SMP: stopping secondary CPUs [ 107.321219] Kernel Offset: 0x0 from 0xffffffff80000000 [ 107.333051] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 4905ec2 ("RISC-V: Add sscofpmf extension support") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231109082128.40777-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 13, 2023
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 13, 2023
[ Upstream commit a2e36cd ] This allows it to break the following circular locking dependency. Aug 10 07:01:29 dg1test kernel: ====================================================== Aug 10 07:01:29 dg1test kernel: WARNING: possible circular locking dependency detected Aug 10 07:01:29 dg1test kernel: 6.4.0-rc7+ #10 Not tainted Aug 10 07:01:29 dg1test kernel: ------------------------------------------------------ Aug 10 07:01:29 dg1test kernel: wireplumber/2236 is trying to acquire lock: Aug 10 07:01:29 dg1test kernel: ffff8fca5320da18 (&fctx->lock){-...}-{2:2}, at: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau] Aug 10 07:01:29 dg1test kernel: but task is already holding lock: Aug 10 07:01:29 dg1test kernel: ffff8fca41208610 (&event->list_lock#2){-...}-{2:2}, at: nvkm_event_ntfy+0x50/0xf0 [nouveau] Aug 10 07:01:29 dg1test kernel: which lock already depends on the new lock. Aug 10 07:01:29 dg1test kernel: the existing dependency chain (in reverse order) is: Aug 10 07:01:29 dg1test kernel: -> #3 (&event->list_lock#2){-...}-{2:2}: Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70 Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy+0x50/0xf0 [nouveau] Aug 10 07:01:29 dg1test kernel: ga100_fifo_nonstall_intr+0x24/0x30 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_intr+0x12c/0x240 [nouveau] Aug 10 07:01:29 dg1test kernel: __handle_irq_event_percpu+0x88/0x240 Aug 10 07:01:29 dg1test kernel: handle_irq_event+0x38/0x80 Aug 10 07:01:29 dg1test kernel: handle_edge_irq+0xa3/0x240 Aug 10 07:01:29 dg1test kernel: __common_interrupt+0x72/0x160 Aug 10 07:01:29 dg1test kernel: common_interrupt+0x60/0xe0 Aug 10 07:01:29 dg1test kernel: asm_common_interrupt+0x26/0x40 Aug 10 07:01:29 dg1test kernel: -> #2 (&device->intr.lock){-...}-{2:2}: Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70 Aug 10 07:01:29 dg1test kernel: nvkm_inth_allow+0x2c/0x80 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_state+0x181/0x250 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_allow+0x63/0xd0 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_uevent_mthd+0x4d/0x70 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_ioctl+0x10b/0x250 [nouveau] Aug 10 07:01:29 dg1test kernel: nvif_object_mthd+0xa8/0x1f0 [nouveau] Aug 10 07:01:29 dg1test kernel: nvif_event_allow+0x2a/0xa0 [nouveau] Aug 10 07:01:29 dg1test kernel: nouveau_fence_enable_signaling+0x78/0x80 [nouveau] Aug 10 07:01:29 dg1test kernel: __dma_fence_enable_signaling+0x5e/0x100 Aug 10 07:01:29 dg1test kernel: dma_fence_add_callback+0x4b/0xd0 Aug 10 07:01:29 dg1test kernel: nouveau_cli_work_queue+0xae/0x110 [nouveau] Aug 10 07:01:29 dg1test kernel: nouveau_gem_object_close+0x1d1/0x2a0 [nouveau] Aug 10 07:01:29 dg1test kernel: drm_gem_handle_delete+0x70/0xe0 [drm] Aug 10 07:01:29 dg1test kernel: drm_ioctl_kernel+0xa5/0x150 [drm] Aug 10 07:01:29 dg1test kernel: drm_ioctl+0x256/0x490 [drm] Aug 10 07:01:29 dg1test kernel: nouveau_drm_ioctl+0x5a/0xb0 [nouveau] Aug 10 07:01:29 dg1test kernel: __x64_sys_ioctl+0x91/0xd0 Aug 10 07:01:29 dg1test kernel: do_syscall_64+0x3c/0x90 Aug 10 07:01:29 dg1test kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc Aug 10 07:01:29 dg1test kernel: -> #1 (&event->refs_lock#4){....}-{2:2}: Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70 Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_state+0x37/0x250 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_allow+0x63/0xd0 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_uevent_mthd+0x4d/0x70 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_ioctl+0x10b/0x250 [nouveau] Aug 10 07:01:29 dg1test kernel: nvif_object_mthd+0xa8/0x1f0 [nouveau] Aug 10 07:01:29 dg1test kernel: nvif_event_allow+0x2a/0xa0 [nouveau] Aug 10 07:01:29 dg1test kernel: nouveau_fence_enable_signaling+0x78/0x80 [nouveau] Aug 10 07:01:29 dg1test kernel: __dma_fence_enable_signaling+0x5e/0x100 Aug 10 07:01:29 dg1test kernel: dma_fence_add_callback+0x4b/0xd0 Aug 10 07:01:29 dg1test kernel: nouveau_cli_work_queue+0xae/0x110 [nouveau] Aug 10 07:01:29 dg1test kernel: nouveau_gem_object_close+0x1d1/0x2a0 [nouveau] Aug 10 07:01:29 dg1test kernel: drm_gem_handle_delete+0x70/0xe0 [drm] Aug 10 07:01:29 dg1test kernel: drm_ioctl_kernel+0xa5/0x150 [drm] Aug 10 07:01:29 dg1test kernel: drm_ioctl+0x256/0x490 [drm] Aug 10 07:01:29 dg1test kernel: nouveau_drm_ioctl+0x5a/0xb0 [nouveau] Aug 10 07:01:29 dg1test kernel: __x64_sys_ioctl+0x91/0xd0 Aug 10 07:01:29 dg1test kernel: do_syscall_64+0x3c/0x90 Aug 10 07:01:29 dg1test kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc Aug 10 07:01:29 dg1test kernel: -> #0 (&fctx->lock){-...}-{2:2}: Aug 10 07:01:29 dg1test kernel: __lock_acquire+0x14e3/0x2240 Aug 10 07:01:29 dg1test kernel: lock_acquire+0xc8/0x2a0 Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70 Aug 10 07:01:29 dg1test kernel: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_client_event+0xf/0x20 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy+0x9b/0xf0 [nouveau] Aug 10 07:01:29 dg1test kernel: ga100_fifo_nonstall_intr+0x24/0x30 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_intr+0x12c/0x240 [nouveau] Aug 10 07:01:29 dg1test kernel: __handle_irq_event_percpu+0x88/0x240 Aug 10 07:01:29 dg1test kernel: handle_irq_event+0x38/0x80 Aug 10 07:01:29 dg1test kernel: handle_edge_irq+0xa3/0x240 Aug 10 07:01:29 dg1test kernel: __common_interrupt+0x72/0x160 Aug 10 07:01:29 dg1test kernel: common_interrupt+0x60/0xe0 Aug 10 07:01:29 dg1test kernel: asm_common_interrupt+0x26/0x40 Aug 10 07:01:29 dg1test kernel: other info that might help us debug this: Aug 10 07:01:29 dg1test kernel: Chain exists of: &fctx->lock --> &device->intr.lock --> &event->list_lock#2 Aug 10 07:01:29 dg1test kernel: Possible unsafe locking scenario: Aug 10 07:01:29 dg1test kernel: CPU0 CPU1 Aug 10 07:01:29 dg1test kernel: ---- ---- Aug 10 07:01:29 dg1test kernel: lock(&event->list_lock#2); Aug 10 07:01:29 dg1test kernel: lock(&device->intr.lock); Aug 10 07:01:29 dg1test kernel: lock(&event->list_lock#2); Aug 10 07:01:29 dg1test kernel: lock(&fctx->lock); Aug 10 07:01:29 dg1test kernel: *** DEADLOCK *** Aug 10 07:01:29 dg1test kernel: 2 locks held by wireplumber/2236: Aug 10 07:01:29 dg1test kernel: #0: ffff8fca53177bf8 (&device->intr.lock){-...}-{2:2}, at: nvkm_intr+0x29/0x240 [nouveau] Aug 10 07:01:29 dg1test kernel: #1: ffff8fca41208610 (&event->list_lock#2){-...}-{2:2}, at: nvkm_event_ntfy+0x50/0xf0 [nouveau] Aug 10 07:01:29 dg1test kernel: stack backtrace: Aug 10 07:01:29 dg1test kernel: CPU: 6 PID: 2236 Comm: wireplumber Not tainted 6.4.0-rc7+ #10 Aug 10 07:01:29 dg1test kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021 Aug 10 07:01:29 dg1test kernel: Call Trace: Aug 10 07:01:29 dg1test kernel: <TASK> Aug 10 07:01:29 dg1test kernel: dump_stack_lvl+0x5b/0x90 Aug 10 07:01:29 dg1test kernel: check_noncircular+0xe2/0x110 Aug 10 07:01:29 dg1test kernel: __lock_acquire+0x14e3/0x2240 Aug 10 07:01:29 dg1test kernel: lock_acquire+0xc8/0x2a0 Aug 10 07:01:29 dg1test kernel: ? nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau] Aug 10 07:01:29 dg1test kernel: ? lock_acquire+0xc8/0x2a0 Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70 Aug 10 07:01:29 dg1test kernel: ? nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau] Aug 10 07:01:29 dg1test kernel: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_client_event+0xf/0x20 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy+0x9b/0xf0 [nouveau] Aug 10 07:01:29 dg1test kernel: ga100_fifo_nonstall_intr+0x24/0x30 [nouveau] Aug 10 07:01:29 dg1test kernel: nvkm_intr+0x12c/0x240 [nouveau] Aug 10 07:01:29 dg1test kernel: __handle_irq_event_percpu+0x88/0x240 Aug 10 07:01:29 dg1test kernel: handle_irq_event+0x38/0x80 Aug 10 07:01:29 dg1test kernel: handle_edge_irq+0xa3/0x240 Aug 10 07:01:29 dg1test kernel: __common_interrupt+0x72/0x160 Aug 10 07:01:29 dg1test kernel: common_interrupt+0x60/0xe0 Aug 10 07:01:29 dg1test kernel: asm_common_interrupt+0x26/0x40 Aug 10 07:01:29 dg1test kernel: RIP: 0033:0x7fb66174d700 Aug 10 07:01:29 dg1test kernel: Code: c1 e2 05 29 ca 8d 0c 10 0f be 07 84 c0 75 eb 89 c8 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa e9 d7 0f fc ff 0f 1f 80 00 00 00 00 <f3> 0f 1e fa e9 c7 0f fc> Aug 10 07:01:29 dg1test kernel: RSP: 002b:00007ffdd3c48438 EFLAGS: 00000206 Aug 10 07:01:29 dg1test kernel: RAX: 000055bb758763c0 RBX: 000055bb758752c0 RCX: 00000000000028b0 Aug 10 07:01:29 dg1test kernel: RDX: 000055bb758752c0 RSI: 000055bb75887490 RDI: 000055bb75862950 Aug 10 07:01:29 dg1test kernel: RBP: 00007ffdd3c48490 R08: 000055bb75873b10 R09: 0000000000000001 Aug 10 07:01:29 dg1test kernel: R10: 0000000000000004 R11: 000055bb7587f000 R12: 000055bb75887490 Aug 10 07:01:29 dg1test kernel: R13: 000055bb757f6280 R14: 000055bb758875c0 R15: 000055bb757f6280 Aug 10 07:01:29 dg1test kernel: </TASK> Signed-off-by: Dave Airlie <airlied@redhat.com> Tested-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231107053255.2257079-1-airlied@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 13, 2023
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 13, 2023
commit fe2b122 upstream. When working on LED support for r8169 I got the following lockdep warning. Easiest way to prevent this scenario seems to be to take the RTNL lock before the trigger_data lock in set_device_name(). ====================================================== WARNING: possible circular locking dependency detected 6.7.0-rc2-next-20231124+ #2 Not tainted ------------------------------------------------------ bash/383 is trying to acquire lock: ffff888103aa1c68 (&trigger_data->lock){+.+.}-{3:3}, at: netdev_trig_notify+0xec/0x190 [ledtrig_netdev] but task is already holding lock: ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}-{3:3}: __mutex_lock+0x9b/0xb50 mutex_lock_nested+0x16/0x20 rtnl_lock+0x12/0x20 set_device_name+0xa9/0x120 [ledtrig_netdev] netdev_trig_activate+0x1a1/0x230 [ledtrig_netdev] led_trigger_set+0x172/0x2c0 led_trigger_write+0xf1/0x140 sysfs_kf_bin_write+0x5d/0x80 kernfs_fop_write_iter+0x15d/0x210 vfs_write+0x1f0/0x510 ksys_write+0x6c/0xf0 __x64_sys_write+0x14/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 -> #0 (&trigger_data->lock){+.+.}-{3:3}: __lock_acquire+0x1459/0x25a0 lock_acquire+0xc8/0x2d0 __mutex_lock+0x9b/0xb50 mutex_lock_nested+0x16/0x20 netdev_trig_notify+0xec/0x190 [ledtrig_netdev] call_netdevice_register_net_notifiers+0x5a/0x100 register_netdevice_notifier+0x85/0x120 netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev] led_trigger_set+0x172/0x2c0 led_trigger_write+0xf1/0x140 sysfs_kf_bin_write+0x5d/0x80 kernfs_fop_write_iter+0x15d/0x210 vfs_write+0x1f0/0x510 ksys_write+0x6c/0xf0 __x64_sys_write+0x14/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(&trigger_data->lock); lock(rtnl_mutex); lock(&trigger_data->lock); *** DEADLOCK *** 8 locks held by bash/383: #0: ffff888103ff33f0 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x6c/0xf0 #1: ffff888103aa1e88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x114/0x210 #2: ffff8881036f1890 (kn->active#82){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x11d/0x210 #3: ffff888108e2c358 (&led_cdev->led_access){+.+.}-{3:3}, at: led_trigger_write+0x30/0x140 #4: ffffffff8cdd9e10 (triggers_list_lock){++++}-{3:3}, at: led_trigger_write+0x75/0x140 #5: ffff888108e2c270 (&led_cdev->trigger_lock){++++}-{3:3}, at: led_trigger_write+0xe3/0x140 #6: ffffffff8cdde3d0 (pernet_ops_rwsem){++++}-{3:3}, at: register_netdevice_notifier+0x1c/0x120 #7: ffffffff8cddf808 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x12/0x20 stack backtrace: CPU: 0 PID: 383 Comm: bash Not tainted 6.7.0-rc2-next-20231124+ #2 Hardware name: Default string Default string/Default string, BIOS ADLN.M6.SODIMM.ZB.CY.015 08/08/2023 Call Trace: <TASK> dump_stack_lvl+0x5c/0xd0 dump_stack+0x10/0x20 print_circular_bug+0x2dd/0x410 check_noncircular+0x131/0x150 __lock_acquire+0x1459/0x25a0 lock_acquire+0xc8/0x2d0 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev] __mutex_lock+0x9b/0xb50 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev] ? __this_cpu_preempt_check+0x13/0x20 ? netdev_trig_notify+0xec/0x190 [ledtrig_netdev] ? __cancel_work_timer+0x11c/0x1b0 ? __mutex_lock+0x123/0xb50 mutex_lock_nested+0x16/0x20 ? mutex_lock_nested+0x16/0x20 netdev_trig_notify+0xec/0x190 [ledtrig_netdev] call_netdevice_register_net_notifiers+0x5a/0x100 register_netdevice_notifier+0x85/0x120 netdev_trig_activate+0x1d4/0x230 [ledtrig_netdev] led_trigger_set+0x172/0x2c0 ? preempt_count_add+0x49/0xc0 led_trigger_write+0xf1/0x140 sysfs_kf_bin_write+0x5d/0x80 kernfs_fop_write_iter+0x15d/0x210 vfs_write+0x1f0/0x510 ksys_write+0x6c/0xf0 __x64_sys_write+0x14/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6c/0x74 RIP: 0033:0x7f269055d034 Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 35 c3 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48 RSP: 002b:00007ffddb7ef748 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f269055d034 RDX: 0000000000000007 RSI: 000055bf5f4af3c0 RDI: 0000000000000001 RBP: 000055bf5f4af3c0 R08: 0000000000000073 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000007 R13: 00007f26906325c0 R14: 00007f269062ff20 R15: 0000000000000000 </TASK> Fixes: d5e0126 ("leds: trigger: netdev: add additional specific link speed mode") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/fb5c8294-2a10-4bf5-8f10-3d2b77d2757e@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 17, 2023
Due to the cited patch, devlink health commands take devlink lock and this may result in deadlock for mlx5e_tx_reporter as it takes local state_lock before calling devlink health report and on the other hand devlink health commands such as diagnose for same reporter take local state_lock after taking devlink lock (see kernel log below). To fix it, remove local state_lock from mlx5e_tx_timeout_work() before calling devlink_health_report() and take care to cancel the work before any call to close channels, which may free the SQs that should be handled by the work. Before cancel_work_sync(), use current_work() to check we are not calling it from within the work, as mlx5e_tx_timeout_work() itself may close the channels and reopen as part of recovery flow. While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock() and a flag to avoid deadlock by calling cancel_work_sync() before closing the channels while holding rtnl_lock too. Kernel log: ====================================================== WARNING: possible circular locking dependency detected 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted ------------------------------------------------------ kworker/u16:2/65 is trying to acquire lock: ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0 but task is already holding lock: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&priv->state_lock){+.+.}-{3:3}: __mutex_lock+0x12c/0x14b0 mlx5e_rx_reporter_diagnose+0x71/0x700 [mlx5_core] devlink_nl_cmd_health_reporter_diagnose_doit+0x212/0xa50 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x2e9/0x530 netlink_rcv_skb+0x11d/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x438/0x710 netlink_sendmsg+0x788/0xc40 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x1c1/0x290 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #0 (&devlink->lock_key#2){+.+.}-{3:3}: __lock_acquire+0x2c8a/0x6200 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->state_lock); lock(&devlink->lock_key#2); lock(&priv->state_lock); lock(&devlink->lock_key#2); *** DEADLOCK *** 4 locks held by kworker/u16:2/65: #0: ffff88811a55b138 ((wq_completion)mlx5e#2){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888101de7db8 ((work_completion)(&priv->tx_timeout_work)){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffffffff84ce8328 (rtnl_mutex){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x53/0x280 [mlx5_core] #3: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] stack backtrace: CPU: 1 PID: 65 Comm: kworker/u16:2 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? find_held_lock+0x2d/0x110 ? __stack_depot_save+0x24c/0x520 ? alloc_chain_hlocks+0x228/0x700 __lock_acquire+0x2c8a/0x6200 ? register_lock_class+0x1860/0x1860 ? kasan_save_stack+0x1e/0x40 ? kasan_set_free_info+0x20/0x30 ? ____kasan_slab_free+0x11d/0x1b0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? devlink_health_report+0x3d5/0x7e0 lock_acquire+0x1c1/0x550 ? devlink_health_report+0x2f1/0x7e0 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? find_held_lock+0x2d/0x110 __mutex_lock+0x12c/0x14b0 ? devlink_health_report+0x2f1/0x7e0 ? devlink_health_report+0x2f1/0x7e0 ? mutex_lock_io_nested+0x1320/0x1320 ? trace_hardirqs_on+0x2d/0x100 ? bit_wait_io_timeout+0x170/0x170 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? mlx5e_reporter_tx_err_cqe+0x1b0/0x1b0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x70/0x70 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x320/0x320 [mlx5_core] ? mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] ? mutex_lock_io_nested+0x1320/0x1320 ? process_one_work+0x70f/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? lock_downgrade+0x6e0/0x6e0 mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: c90005b ("devlink: Hold the instance lock in health callbacks") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
woodsts
pushed a commit
that referenced
this pull request
Dec 17, 2023
As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
woodsts
pushed a commit
that referenced
this pull request
Dec 17, 2023
If server replied SMB2_NEGOTIATE with a zero SecurityBufferOffset, smb2_get_data_area() sets @len to non-zero but return NULL, so decode_negTokeninit() ends up being called with a NULL @security_blob: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 871 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:asn1_ber_decoder+0x173/0xc80 Code: 01 4c 39 2c 24 75 09 45 84 c9 0f 85 2f 03 00 00 48 8b 14 24 4c 29 ea 48 83 fa 01 0f 86 1e 07 00 00 48 8b 74 24 28 4d 8d 5d 01 <42> 0f b6 3c 2e 89 fa 40 88 7c 24 5c f7 d2 83 e2 1f 0f 84 3d 07 00 RSP: 0018:ffffc9000063f950 EFLAGS: 00010202 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000004a RDX: 000000000000004a RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 000000000000004d R15: 0000000000000000 FS: 00007fce52b0fbc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001ae64000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? __stack_depot_save+0x1e6/0x480 ? exc_page_fault+0x6f/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? asn1_ber_decoder+0x173/0xc80 ? check_object+0x40/0x340 decode_negTokenInit+0x1e/0x30 [cifs] SMB2_negotiate+0xc99/0x17c0 [cifs] ? smb2_negotiate+0x46/0x60 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 smb2_negotiate+0x46/0x60 [cifs] cifs_negotiate_protocol+0xae/0x130 [cifs] cifs_get_smb_ses+0x517/0x1040 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? queue_delayed_work_on+0x5d/0x90 cifs_mount_get_session+0x78/0x200 [cifs] dfs_mount_share+0x13a/0x9f0 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_acquire+0xbf/0x2b0 ? find_nls+0x16/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 cifs_mount+0x7e/0x350 [cifs] cifs_smb3_do_mount+0x128/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7fce52c2ab1e Fix this by setting @len to zero when @off == 0 so callers won't attempt to dereference non-existing data areas. Reported-by: Robert Morris <rtm@csail.mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
woodsts
pushed a commit
that referenced
this pull request
Dec 17, 2023
Validate @ioctl_rsp->OutputOffset and @ioctl_rsp->OutputCount so that their sum does not wrap to a number that is smaller than @reparse_buf and we end up with a wild pointer as follows: BUG: unable to handle page fault for address: ffff88809c5cd45f #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 4a01067 P4D 4a01067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 1260 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:smb2_query_reparse_point+0x3e0/0x4c0 [cifs] Code: ff ff e8 f3 51 fe ff 41 89 c6 58 5a 45 85 f6 0f 85 14 fe ff ff 49 8b 57 48 8b 42 60 44 8b 42 64 42 8d 0c 00 49 39 4f 50 72 40 <8b> 04 02 48 8b 9d f0 fe ff ff 49 8b 57 50 89 03 48 8b 9d e8 fe ff RSP: 0018:ffffc90000347a90 EFLAGS: 00010212 RAX: 000000008000001f RBX: ffff88800ae11000 RCX: 00000000000000ec RDX: ffff88801c5cd440 RSI: 0000000000000000 RDI: ffffffff82004aa4 RBP: ffffc90000347bb0 R08: 00000000800000cd R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000024 R12: ffff8880114d4100 R13: ffff8880114d4198 R14: 0000000000000000 R15: ffff8880114d4000 FS: 00007f02c07babc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88809c5cd45f CR3: 0000000011750000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? search_module_extables+0x19/0x60 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x1b6/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? _raw_spin_unlock_irqrestore+0x44/0x60 ? smb2_query_reparse_point+0x3e0/0x4c0 [cifs] cifs_get_fattr+0x16e/0xa50 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_acquire+0xbf/0x2b0 cifs_root_iget+0x163/0x5f0 [cifs] cifs_smb3_do_mount+0x5bd/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f02c08d5b1e Fixes: 2e4564b ("smb3: add support for stat of WSL reparse points for special file types") Cc: stable@vger.kernel.org Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit ec51fbd ] Subclassing the generic USB device driver to override the default configuration selection regardless of matching interface drivers. The r815x family devices expose a vendor specific function which the r8152 interface driver wants to handle. This is the preferred device mode. Additionally one or more USB class functions are usually supported for hosts lacking a vendor specific driver. The choice is USB configuration based, with one alternate function per configuration. Example device with both NCM and ECM alternate cfgs: T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 4 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 3 P: Vendor=0bda ProdID=8156 Rev=31.00 S: Manufacturer=Realtek S: Product=USB 10/100/1G/2.5G LAN S: SerialNumber=001000001 C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=256mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=00 Driver=r8152 E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=83(I) Atr=03(Int.) MxPS= 2 Ivl=128ms C: #Ifs= 2 Cfg#= 2 Atr=a0 MxPwr=256mA I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0d Prot=00 Driver= E: Ad=83(I) Atr=03(Int.) MxPS= 16 Ivl=128ms I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=01 Driver= I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=01 Driver= E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms C: #Ifs= 2 Cfg#= 3 Atr=a0 MxPwr=256mA I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver= E: Ad=83(I) Atr=03(Int.) MxPS= 16 Ivl=128ms I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver= I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver= E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms A problem with this is that Linux will prefer class functions over vendor specific functions. Using the above example, Linux defaults to cfg #2, running the device in a sub-optimal NCM mode. Previously we've attempted to work around the problem by blacklisting the devices in the ECM class driver "cdc_ether", and matching on the ECM class function in the vendor specific interface driver. The latter has been used to switch back to the vendor specific configuration when the driver is probed for a class function. This workaround has several issues; - class driver blacklists is additional maintanence cruft in an unrelated driver - class driver blacklists prevents users from optionally running the devices in class mode - each device needs double match entries in the vendor driver - the initial probing as a class function slows down device discovery Now these issues have become even worse with the introduction of firmware supporting both NCM and ECM, where NCM ends up as the default mode in Linux. To use the same workaround, we now have to blacklist the devices in to two different class drivers and add yet another match entry to the vendor specific driver. This patch implements an alternative workaround strategy - independent of the interface drivers. It avoids adding a blacklist to the cdc_ncm driver and will let us remove the existing blacklist from the cdc_ether driver. As an additional bonus, removing the blacklists allow users to select one of the other device modes if wanted. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit 15319a4 ] As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit 15319a4 ] As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit 15319a4 ] As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit 15319a4 ] As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit 15319a4 ] As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit eab0da3 ] Due to the cited patch, devlink health commands take devlink lock and this may result in deadlock for mlx5e_tx_reporter as it takes local state_lock before calling devlink health report and on the other hand devlink health commands such as diagnose for same reporter take local state_lock after taking devlink lock (see kernel log below). To fix it, remove local state_lock from mlx5e_tx_timeout_work() before calling devlink_health_report() and take care to cancel the work before any call to close channels, which may free the SQs that should be handled by the work. Before cancel_work_sync(), use current_work() to check we are not calling it from within the work, as mlx5e_tx_timeout_work() itself may close the channels and reopen as part of recovery flow. While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock() and a flag to avoid deadlock by calling cancel_work_sync() before closing the channels while holding rtnl_lock too. Kernel log: ====================================================== WARNING: possible circular locking dependency detected 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted ------------------------------------------------------ kworker/u16:2/65 is trying to acquire lock: ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0 but task is already holding lock: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&priv->state_lock){+.+.}-{3:3}: __mutex_lock+0x12c/0x14b0 mlx5e_rx_reporter_diagnose+0x71/0x700 [mlx5_core] devlink_nl_cmd_health_reporter_diagnose_doit+0x212/0xa50 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x2e9/0x530 netlink_rcv_skb+0x11d/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x438/0x710 netlink_sendmsg+0x788/0xc40 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x1c1/0x290 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #0 (&devlink->lock_key#2){+.+.}-{3:3}: __lock_acquire+0x2c8a/0x6200 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->state_lock); lock(&devlink->lock_key#2); lock(&priv->state_lock); lock(&devlink->lock_key#2); *** DEADLOCK *** 4 locks held by kworker/u16:2/65: #0: ffff88811a55b138 ((wq_completion)mlx5e#2){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888101de7db8 ((work_completion)(&priv->tx_timeout_work)){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffffffff84ce8328 (rtnl_mutex){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x53/0x280 [mlx5_core] #3: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] stack backtrace: CPU: 1 PID: 65 Comm: kworker/u16:2 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? find_held_lock+0x2d/0x110 ? __stack_depot_save+0x24c/0x520 ? alloc_chain_hlocks+0x228/0x700 __lock_acquire+0x2c8a/0x6200 ? register_lock_class+0x1860/0x1860 ? kasan_save_stack+0x1e/0x40 ? kasan_set_free_info+0x20/0x30 ? ____kasan_slab_free+0x11d/0x1b0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? devlink_health_report+0x3d5/0x7e0 lock_acquire+0x1c1/0x550 ? devlink_health_report+0x2f1/0x7e0 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? find_held_lock+0x2d/0x110 __mutex_lock+0x12c/0x14b0 ? devlink_health_report+0x2f1/0x7e0 ? devlink_health_report+0x2f1/0x7e0 ? mutex_lock_io_nested+0x1320/0x1320 ? trace_hardirqs_on+0x2d/0x100 ? bit_wait_io_timeout+0x170/0x170 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? mlx5e_reporter_tx_err_cqe+0x1b0/0x1b0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x70/0x70 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x320/0x320 [mlx5_core] ? mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] ? mutex_lock_io_nested+0x1320/0x1320 ? process_one_work+0x70f/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? lock_downgrade+0x6e0/0x6e0 mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: c90005b ("devlink: Hold the instance lock in health callbacks") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit 15319a4 ] As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
commit 90d025c upstream. If server replied SMB2_NEGOTIATE with a zero SecurityBufferOffset, smb2_get_data_area() sets @len to non-zero but return NULL, so decode_negTokeninit() ends up being called with a NULL @security_blob: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 871 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:asn1_ber_decoder+0x173/0xc80 Code: 01 4c 39 2c 24 75 09 45 84 c9 0f 85 2f 03 00 00 48 8b 14 24 4c 29 ea 48 83 fa 01 0f 86 1e 07 00 00 48 8b 74 24 28 4d 8d 5d 01 <42> 0f b6 3c 2e 89 fa 40 88 7c 24 5c f7 d2 83 e2 1f 0f 84 3d 07 00 RSP: 0018:ffffc9000063f950 EFLAGS: 00010202 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000004a RDX: 000000000000004a RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 000000000000004d R15: 0000000000000000 FS: 00007fce52b0fbc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001ae64000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? __stack_depot_save+0x1e6/0x480 ? exc_page_fault+0x6f/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? asn1_ber_decoder+0x173/0xc80 ? check_object+0x40/0x340 decode_negTokenInit+0x1e/0x30 [cifs] SMB2_negotiate+0xc99/0x17c0 [cifs] ? smb2_negotiate+0x46/0x60 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 smb2_negotiate+0x46/0x60 [cifs] cifs_negotiate_protocol+0xae/0x130 [cifs] cifs_get_smb_ses+0x517/0x1040 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? queue_delayed_work_on+0x5d/0x90 cifs_mount_get_session+0x78/0x200 [cifs] dfs_mount_share+0x13a/0x9f0 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_acquire+0xbf/0x2b0 ? find_nls+0x16/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 cifs_mount+0x7e/0x350 [cifs] cifs_smb3_do_mount+0x128/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7fce52c2ab1e Fix this by setting @len to zero when @off == 0 so callers won't attempt to dereference non-existing data areas. Reported-by: Robert Morris <rtm@csail.mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
commit 3a42709 upstream. Validate @ioctl_rsp->OutputOffset and @ioctl_rsp->OutputCount so that their sum does not wrap to a number that is smaller than @reparse_buf and we end up with a wild pointer as follows: BUG: unable to handle page fault for address: ffff88809c5cd45f #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 4a01067 P4D 4a01067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 1260 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:smb2_query_reparse_point+0x3e0/0x4c0 [cifs] Code: ff ff e8 f3 51 fe ff 41 89 c6 58 5a 45 85 f6 0f 85 14 fe ff ff 49 8b 57 48 8b 42 60 44 8b 42 64 42 8d 0c 00 49 39 4f 50 72 40 <8b> 04 02 48 8b 9d f0 fe ff ff 49 8b 57 50 89 03 48 8b 9d e8 fe ff RSP: 0018:ffffc90000347a90 EFLAGS: 00010212 RAX: 000000008000001f RBX: ffff88800ae11000 RCX: 00000000000000ec RDX: ffff88801c5cd440 RSI: 0000000000000000 RDI: ffffffff82004aa4 RBP: ffffc90000347bb0 R08: 00000000800000cd R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000024 R12: ffff8880114d4100 R13: ffff8880114d4198 R14: 0000000000000000 R15: ffff8880114d4000 FS: 00007f02c07babc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88809c5cd45f CR3: 0000000011750000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? search_module_extables+0x19/0x60 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x1b6/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? _raw_spin_unlock_irqrestore+0x44/0x60 ? smb2_query_reparse_point+0x3e0/0x4c0 [cifs] cifs_get_fattr+0x16e/0xa50 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_acquire+0xbf/0x2b0 cifs_root_iget+0x163/0x5f0 [cifs] cifs_smb3_do_mount+0x5bd/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f02c08d5b1e Fixes: 2e4564b ("smb3: add support for stat of WSL reparse points for special file types") Cc: stable@vger.kernel.org Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit ec51fbd ] Subclassing the generic USB device driver to override the default configuration selection regardless of matching interface drivers. The r815x family devices expose a vendor specific function which the r8152 interface driver wants to handle. This is the preferred device mode. Additionally one or more USB class functions are usually supported for hosts lacking a vendor specific driver. The choice is USB configuration based, with one alternate function per configuration. Example device with both NCM and ECM alternate cfgs: T: Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 4 Spd=5000 MxCh= 0 D: Ver= 3.20 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs= 3 P: Vendor=0bda ProdID=8156 Rev=31.00 S: Manufacturer=Realtek S: Product=USB 10/100/1G/2.5G LAN S: SerialNumber=001000001 C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=256mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=00 Driver=r8152 E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=83(I) Atr=03(Int.) MxPS= 2 Ivl=128ms C: #Ifs= 2 Cfg#= 2 Atr=a0 MxPwr=256mA I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0d Prot=00 Driver= E: Ad=83(I) Atr=03(Int.) MxPS= 16 Ivl=128ms I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=01 Driver= I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=01 Driver= E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms C: #Ifs= 2 Cfg#= 3 Atr=a0 MxPwr=256mA I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver= E: Ad=83(I) Atr=03(Int.) MxPS= 16 Ivl=128ms I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver= I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver= E: Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms A problem with this is that Linux will prefer class functions over vendor specific functions. Using the above example, Linux defaults to cfg #2, running the device in a sub-optimal NCM mode. Previously we've attempted to work around the problem by blacklisting the devices in the ECM class driver "cdc_ether", and matching on the ECM class function in the vendor specific interface driver. The latter has been used to switch back to the vendor specific configuration when the driver is probed for a class function. This workaround has several issues; - class driver blacklists is additional maintanence cruft in an unrelated driver - class driver blacklists prevents users from optionally running the devices in class mode - each device needs double match entries in the vendor driver - the initial probing as a class function slows down device discovery Now these issues have become even worse with the introduction of firmware supporting both NCM and ECM, where NCM ends up as the default mode in Linux. To use the same workaround, we now have to blacklist the devices in to two different class drivers and add yet another match entry to the vendor specific driver. This patch implements an alternative workaround strategy - independent of the interface drivers. It avoids adding a blacklist to the cdc_ncm driver and will let us remove the existing blacklist from the cdc_ether driver. As an additional bonus, removing the blacklists allow users to select one of the other device modes if wanted. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit eab0da3 ] Due to the cited patch, devlink health commands take devlink lock and this may result in deadlock for mlx5e_tx_reporter as it takes local state_lock before calling devlink health report and on the other hand devlink health commands such as diagnose for same reporter take local state_lock after taking devlink lock (see kernel log below). To fix it, remove local state_lock from mlx5e_tx_timeout_work() before calling devlink_health_report() and take care to cancel the work before any call to close channels, which may free the SQs that should be handled by the work. Before cancel_work_sync(), use current_work() to check we are not calling it from within the work, as mlx5e_tx_timeout_work() itself may close the channels and reopen as part of recovery flow. While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock() and a flag to avoid deadlock by calling cancel_work_sync() before closing the channels while holding rtnl_lock too. Kernel log: ====================================================== WARNING: possible circular locking dependency detected 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted ------------------------------------------------------ kworker/u16:2/65 is trying to acquire lock: ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0 but task is already holding lock: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&priv->state_lock){+.+.}-{3:3}: __mutex_lock+0x12c/0x14b0 mlx5e_rx_reporter_diagnose+0x71/0x700 [mlx5_core] devlink_nl_cmd_health_reporter_diagnose_doit+0x212/0xa50 genl_family_rcv_msg_doit+0x1e9/0x2f0 genl_rcv_msg+0x2e9/0x530 netlink_rcv_skb+0x11d/0x340 genl_rcv+0x24/0x40 netlink_unicast+0x438/0x710 netlink_sendmsg+0x788/0xc40 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x1c1/0x290 __x64_sys_sendto+0xdd/0x1b0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 -> #0 (&devlink->lock_key#2){+.+.}-{3:3}: __lock_acquire+0x2c8a/0x6200 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&priv->state_lock); lock(&devlink->lock_key#2); lock(&priv->state_lock); lock(&devlink->lock_key#2); *** DEADLOCK *** 4 locks held by kworker/u16:2/65: #0: ffff88811a55b138 ((wq_completion)mlx5e#2){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888101de7db8 ((work_completion)(&priv->tx_timeout_work)){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffffffff84ce8328 (rtnl_mutex){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x53/0x280 [mlx5_core] #3: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] stack backtrace: CPU: 1 PID: 65 Comm: kworker/u16:2 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? find_held_lock+0x2d/0x110 ? __stack_depot_save+0x24c/0x520 ? alloc_chain_hlocks+0x228/0x700 __lock_acquire+0x2c8a/0x6200 ? register_lock_class+0x1860/0x1860 ? kasan_save_stack+0x1e/0x40 ? kasan_set_free_info+0x20/0x30 ? ____kasan_slab_free+0x11d/0x1b0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? devlink_health_report+0x3d5/0x7e0 lock_acquire+0x1c1/0x550 ? devlink_health_report+0x2f1/0x7e0 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? find_held_lock+0x2d/0x110 __mutex_lock+0x12c/0x14b0 ? devlink_health_report+0x2f1/0x7e0 ? devlink_health_report+0x2f1/0x7e0 ? mutex_lock_io_nested+0x1320/0x1320 ? trace_hardirqs_on+0x2d/0x100 ? bit_wait_io_timeout+0x170/0x170 ? devlink_health_do_dump.part.0+0x171/0x3a0 ? kfree+0x1ba/0x520 ? devlink_health_do_dump.part.0+0x171/0x3a0 devlink_health_report+0x2f1/0x7e0 mlx5e_health_report+0xc9/0xd7 [mlx5_core] mlx5e_reporter_tx_timeout+0x2ab/0x3d0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? mlx5e_reporter_tx_err_cqe+0x1b0/0x1b0 [mlx5_core] ? mlx5e_tx_reporter_timeout_dump+0x70/0x70 [mlx5_core] ? mlx5e_tx_reporter_dump_sq+0x320/0x320 [mlx5_core] ? mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core] ? mutex_lock_io_nested+0x1320/0x1320 ? process_one_work+0x70f/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? lock_downgrade+0x6e0/0x6e0 mlx5e_tx_timeout_work+0x1c1/0x280 [mlx5_core] process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: c90005b ("devlink: Hold the instance lock in health callbacks") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
[ Upstream commit 15319a4 ] As &card->tx_queue_lock is acquired under softirq context along the following call chain from solos_bh(), other acquisition of the same lock inside process context should disable at least bh to avoid double lock. <deadlock #2> pclose() --> spin_lock(&card->tx_queue_lock) <interrupt> --> solos_bh() --> fpga_tx() --> spin_lock(&card->tx_queue_lock) This flaw was found by an experimental static analysis tool I am developing for irq-related deadlock. To prevent the potential deadlock, the patch uses spin_lock_bh() on &card->tx_queue_lock under process context code consistently to prevent the possible deadlock scenario. Fixes: 213e85d ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
commit 90d025c upstream. If server replied SMB2_NEGOTIATE with a zero SecurityBufferOffset, smb2_get_data_area() sets @len to non-zero but return NULL, so decode_negTokeninit() ends up being called with a NULL @security_blob: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 871 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:asn1_ber_decoder+0x173/0xc80 Code: 01 4c 39 2c 24 75 09 45 84 c9 0f 85 2f 03 00 00 48 8b 14 24 4c 29 ea 48 83 fa 01 0f 86 1e 07 00 00 48 8b 74 24 28 4d 8d 5d 01 <42> 0f b6 3c 2e 89 fa 40 88 7c 24 5c f7 d2 83 e2 1f 0f 84 3d 07 00 RSP: 0018:ffffc9000063f950 EFLAGS: 00010202 RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000004a RDX: 000000000000004a RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 000000000000004d R15: 0000000000000000 FS: 00007fce52b0fbc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001ae64000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? __stack_depot_save+0x1e6/0x480 ? exc_page_fault+0x6f/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? asn1_ber_decoder+0x173/0xc80 ? check_object+0x40/0x340 decode_negTokenInit+0x1e/0x30 [cifs] SMB2_negotiate+0xc99/0x17c0 [cifs] ? smb2_negotiate+0x46/0x60 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 smb2_negotiate+0x46/0x60 [cifs] cifs_negotiate_protocol+0xae/0x130 [cifs] cifs_get_smb_ses+0x517/0x1040 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? queue_delayed_work_on+0x5d/0x90 cifs_mount_get_session+0x78/0x200 [cifs] dfs_mount_share+0x13a/0x9f0 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_acquire+0xbf/0x2b0 ? find_nls+0x16/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 cifs_mount+0x7e/0x350 [cifs] cifs_smb3_do_mount+0x128/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7fce52c2ab1e Fix this by setting @len to zero when @off == 0 so callers won't attempt to dereference non-existing data areas. Reported-by: Robert Morris <rtm@csail.mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 20, 2023
commit 3a42709 upstream. Validate @ioctl_rsp->OutputOffset and @ioctl_rsp->OutputCount so that their sum does not wrap to a number that is smaller than @reparse_buf and we end up with a wild pointer as follows: BUG: unable to handle page fault for address: ffff88809c5cd45f #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 4a01067 P4D 4a01067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 1260 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:smb2_query_reparse_point+0x3e0/0x4c0 [cifs] Code: ff ff e8 f3 51 fe ff 41 89 c6 58 5a 45 85 f6 0f 85 14 fe ff ff 49 8b 57 48 8b 42 60 44 8b 42 64 42 8d 0c 00 49 39 4f 50 72 40 <8b> 04 02 48 8b 9d f0 fe ff ff 49 8b 57 50 89 03 48 8b 9d e8 fe ff RSP: 0018:ffffc90000347a90 EFLAGS: 00010212 RAX: 000000008000001f RBX: ffff88800ae11000 RCX: 00000000000000ec RDX: ffff88801c5cd440 RSI: 0000000000000000 RDI: ffffffff82004aa4 RBP: ffffc90000347bb0 R08: 00000000800000cd R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000024 R12: ffff8880114d4100 R13: ffff8880114d4198 R14: 0000000000000000 R15: ffff8880114d4000 FS: 00007f02c07babc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88809c5cd45f CR3: 0000000011750000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? search_module_extables+0x19/0x60 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x1b6/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? _raw_spin_unlock_irqrestore+0x44/0x60 ? smb2_query_reparse_point+0x3e0/0x4c0 [cifs] cifs_get_fattr+0x16e/0xa50 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lock_acquire+0xbf/0x2b0 cifs_root_iget+0x163/0x5f0 [cifs] cifs_smb3_do_mount+0x5bd/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f02c08d5b1e Fixes: 2e4564b ("smb3: add support for stat of WSL reparse points for special file types") Cc: stable@vger.kernel.org Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 23, 2023
[ Upstream commit 807252f ] Running smb2.rename test from Samba smbtorture suite against a kernel built with lockdep triggers a "possible recursive locking detected" warning. This is because mnt_want_write() is called twice with no mnt_drop_write() in between: -> ksmbd_vfs_mkdir() -> ksmbd_vfs_kern_path_create() -> kern_path_create() -> filename_create() -> mnt_want_write() -> mnt_want_write() Fix this by removing the mnt_want_write/mnt_drop_write calls from vfs helpers that call kern_path_create(). Full lockdep trace below: ============================================ WARNING: possible recursive locking detected 6.6.0-rc5 #775 Not tainted -------------------------------------------- kworker/1:1/32 is trying to acquire lock: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: ksmbd_vfs_mkdir+0xe1/0x410 but task is already holding lock: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: filename_create+0xb6/0x260 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sb_writers#5); lock(sb_writers#5); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/1:1/32: #0: ffff8880064e4138 ((wq_completion)ksmbd-io){+.+.}-{0:0}, at: process_one_work+0x40e/0x980 #1: ffff888005b0fdd0 ((work_completion)(&work->work)){+.+.}-{0:0}, at: process_one_work+0x40e/0x980 #2: ffff888005ac83f8 (sb_writers#5){.+.+}-{0:0}, at: filename_create+0xb6/0x260 #3: ffff8880057ce760 (&type->i_mutex_dir_key#3/1){+.+.}-{3:3}, at: filename_create+0x123/0x260 Cc: stable@vger.kernel.org Fixes: 40b268d ("ksmbd: add mnt_want_write to ksmbd vfs functions") Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 28, 2023
With the current bandwidth allocation we end up reserving too much for the USB 3.x and PCIe tunnels that leads to reduced capabilities for the second DisplayPort tunnel. Fix this by decreasing the USB 3.x allocation to 900 Mb/s which then allows both tunnels to get the maximum HBR2 bandwidth. This way, the reserved bandwidth for USB 3.x and PCIe, would be 1350 Mb/s (taking weights of USB 3.x and PCIe into account). So bandwidth allocations on a link are: USB 3.x + PCIe tunnels => 1350 Mb/s DisplayPort tunnel #1 => 17280 Mb/s DisplayPort tunnel #2 => 17280 Mb/s Total consumed bandwidth is 35910 Mb/s. So that all the above can be tunneled on a Gen 3 link (which allows maximum of 36000 Mb/s). Fixes: 582e70b ("thunderbolt: Change bandwidth reservations to comply USB4 v2") Signed-off-by: Gil Fine <gil.fine@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
woodsts
pushed a commit
that referenced
this pull request
Dec 28, 2023
Coverity Scan reports the following issue. But it's impossible that mlx5_get_dev_index returns 7 for PF, even if the index is calculated from PCI FUNC ID. So add the checking to make coverity slience. CID 610894 (#2 of 2): Out-of-bounds write (OVERRUN) Overrunning array esw->fdb_table.offloads.peer_miss_rules of 4 8-byte elements at element index 7 (byte offset 63) using index mlx5_get_dev_index(peer_dev) (which evaluates to 7). Fixes: 9bee385 ("net/mlx5: E-switch, refactor FDB miss rule add/remove") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
woodsts
pushed a commit
that referenced
this pull request
Dec 28, 2023
syzbot found a potential circular dependency leading to a deadlock:
-> #3 (&hdev->req_lock){+.+.}-{3:3}:
__mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
__mutex_lock kernel/locking/mutex.c:732 [inline]
mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
hci_dev_do_close+0x3f/0x9f net/bluetooth/hci_core.c:551
hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
vfs_write+0x277/0xcf5 fs/read_write.c:594
ksys_write+0x19b/0x2bd fs/read_write.c:650
do_syscall_x64 arch/x86/entry/common.c:55 [inline]
do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
entry_SYSCALL_64_after_hwframe+0x61/0xcb
-> #2 (rfkill_global_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
__mutex_lock kernel/locking/mutex.c:732 [inline]
mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
rfkill_register+0x30/0x7e3 net/rfkill/core.c:1045
hci_register_dev+0x48f/0x96d net/bluetooth/hci_core.c:2622
__vhci_create_device drivers/bluetooth/hci_vhci.c:341 [inline]
vhci_create_device+0x3ad/0x68f drivers/bluetooth/hci_vhci.c:374
vhci_get_user drivers/bluetooth/hci_vhci.c:431 [inline]
vhci_write+0x37b/0x429 drivers/bluetooth/hci_vhci.c:511
call_write_iter include/linux/fs.h:2109 [inline]
new_sync_write fs/read_write.c:509 [inline]
vfs_write+0xaa8/0xcf5 fs/read_write.c:596
ksys_write+0x19b/0x2bd fs/read_write.c:650
do_syscall_x64 arch/x86/entry/common.c:55 [inline]
do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
entry_SYSCALL_64_after_hwframe+0x61/0xcb
-> #1 (&data->open_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
__mutex_lock kernel/locking/mutex.c:732 [inline]
mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
vhci_send_frame+0x68/0x9c drivers/bluetooth/hci_vhci.c:75
hci_send_frame+0x1cc/0x2ff net/bluetooth/hci_core.c:2989
hci_sched_acl_pkt net/bluetooth/hci_core.c:3498 [inline]
hci_sched_acl net/bluetooth/hci_core.c:3583 [inline]
hci_tx_work+0xb94/0x1a60 net/bluetooth/hci_core.c:3654
process_one_work+0x901/0xfb8 kernel/workqueue.c:2310
worker_thread+0xa67/0x1003 kernel/workqueue.c:2457
kthread+0x36a/0x430 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
-> #0 ((work_completion)(&hdev->tx_work)){+.+.}-{0:0}:
check_prev_add kernel/locking/lockdep.c:3053 [inline]
check_prevs_add kernel/locking/lockdep.c:3172 [inline]
validate_chain kernel/locking/lockdep.c:3787 [inline]
__lock_acquire+0x2d32/0x77fa kernel/locking/lockdep.c:5011
lock_acquire+0x273/0x4d5 kernel/locking/lockdep.c:5622
__flush_work+0xee/0x19f kernel/workqueue.c:3090
hci_dev_close_sync+0x32f/0x1113 net/bluetooth/hci_sync.c:4352
hci_dev_do_close+0x47/0x9f net/bluetooth/hci_core.c:553
hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
vfs_write+0x277/0xcf5 fs/read_write.c:594
ksys_write+0x19b/0x2bd fs/read_write.c:650
do_syscall_x64 arch/x86/entry/common.c:55 [inline]
do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
entry_SYSCALL_64_after_hwframe+0x61/0xcb
This change removes the need for acquiring the open_mutex in
vhci_send_frame, thus eliminating the potential deadlock while
maintaining the required packet ordering.
Fixes: 92d4abd ("Bluetooth: vhci: Fix race when opening vhci device")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
woodsts
pushed a commit
that referenced
this pull request
Dec 28, 2023
Calling led_trigger_register() when attaching a PHY located on an SFP
module potentially (and practically) leads into a deadlock.
Fix this by not calling led_trigger_register() for PHYs localted on SFP
modules as such modules actually never got any LEDs.
======================================================
WARNING: possible circular locking dependency detected
6.7.0-rc4-next-20231208+ #0 Tainted: G O
------------------------------------------------------
kworker/u8:2/43 is trying to acquire lock:
ffffffc08108c4e8 (triggers_list_lock){++++}-{3:3}, at: led_trigger_register+0x4c/0x1a8
but task is already holding lock:
ffffff80c5c6f318 (&sfp->sm_mutex){+.+.}-{3:3}, at: cleanup_module+0x2ba8/0x3120 [sfp]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&sfp->sm_mutex){+.+.}-{3:3}:
__mutex_lock+0x88/0x7a0
mutex_lock_nested+0x20/0x28
cleanup_module+0x2ae0/0x3120 [sfp]
sfp_register_bus+0x5c/0x9c
sfp_register_socket+0x48/0xd4
cleanup_module+0x271c/0x3120 [sfp]
platform_probe+0x64/0xb8
really_probe+0x17c/0x3c0
__driver_probe_device+0x78/0x164
driver_probe_device+0x3c/0xd4
__driver_attach+0xec/0x1f0
bus_for_each_dev+0x60/0xa0
driver_attach+0x20/0x28
bus_add_driver+0x108/0x208
driver_register+0x5c/0x118
__platform_driver_register+0x24/0x2c
init_module+0x28/0xa7c [sfp]
do_one_initcall+0x70/0x2ec
do_init_module+0x54/0x1e4
load_module+0x1b78/0x1c8c
__do_sys_init_module+0x1bc/0x2cc
__arm64_sys_init_module+0x18/0x20
invoke_syscall.constprop.0+0x4c/0xdc
do_el0_svc+0x3c/0xbc
el0_svc+0x34/0x80
el0t_64_sync_handler+0xf8/0x124
el0t_64_sync+0x150/0x154
-> #2 (rtnl_mutex){+.+.}-{3:3}:
__mutex_lock+0x88/0x7a0
mutex_lock_nested+0x20/0x28
rtnl_lock+0x18/0x20
set_device_name+0x30/0x130
netdev_trig_activate+0x13c/0x1ac
led_trigger_set+0x118/0x234
led_trigger_write+0x104/0x17c
sysfs_kf_bin_write+0x64/0x80
kernfs_fop_write_iter+0x128/0x1b4
vfs_write+0x178/0x2a4
ksys_write+0x58/0xd4
__arm64_sys_write+0x18/0x20
invoke_syscall.constprop.0+0x4c/0xdc
do_el0_svc+0x3c/0xbc
el0_svc+0x34/0x80
el0t_64_sync_handler+0xf8/0x124
el0t_64_sync+0x150/0x154
-> #1 (&led_cdev->trigger_lock){++++}-{3:3}:
down_write+0x4c/0x13c
led_trigger_write+0xf8/0x17c
sysfs_kf_bin_write+0x64/0x80
kernfs_fop_write_iter+0x128/0x1b4
vfs_write+0x178/0x2a4
ksys_write+0x58/0xd4
__arm64_sys_write+0x18/0x20
invoke_syscall.constprop.0+0x4c/0xdc
do_el0_svc+0x3c/0xbc
el0_svc+0x34/0x80
el0t_64_sync_handler+0xf8/0x124
el0t_64_sync+0x150/0x154
-> #0 (triggers_list_lock){++++}-{3:3}:
__lock_acquire+0x12a0/0x2014
lock_acquire+0x100/0x2ac
down_write+0x4c/0x13c
led_trigger_register+0x4c/0x1a8
phy_led_triggers_register+0x9c/0x214
phy_attach_direct+0x154/0x36c
phylink_attach_phy+0x30/0x60
phylink_sfp_connect_phy+0x140/0x510
sfp_add_phy+0x34/0x50
init_module+0x15c/0xa7c [sfp]
cleanup_module+0x1d94/0x3120 [sfp]
cleanup_module+0x2bb4/0x3120 [sfp]
process_one_work+0x1f8/0x4ec
worker_thread+0x1e8/0x3d8
kthread+0x104/0x110
ret_from_fork+0x10/0x20
other info that might help us debug this:
Chain exists of:
triggers_list_lock --> rtnl_mutex --> &sfp->sm_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sfp->sm_mutex);
lock(rtnl_mutex);
lock(&sfp->sm_mutex);
lock(triggers_list_lock);
*** DEADLOCK ***
4 locks held by kworker/u8:2/43:
#0: ffffff80c000f938 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x150/0x4ec
#1: ffffffc08214bde8 ((work_completion)(&(&sfp->timeout)->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x4ec
#2: ffffffc0810902f8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x18/0x20
#3: ffffff80c5c6f318 (&sfp->sm_mutex){+.+.}-{3:3}, at: cleanup_module+0x2ba8/0x3120 [sfp]
stack backtrace:
CPU: 0 PID: 43 Comm: kworker/u8:2 Tainted: G O 6.7.0-rc4-next-20231208+ #0
Hardware name: Bananapi BPI-R4 (DT)
Workqueue: events_power_efficient cleanup_module [sfp]
Call trace:
dump_backtrace+0xa8/0x10c
show_stack+0x14/0x1c
dump_stack_lvl+0x5c/0xa0
dump_stack+0x14/0x1c
print_circular_bug+0x328/0x430
check_noncircular+0x124/0x134
__lock_acquire+0x12a0/0x2014
lock_acquire+0x100/0x2ac
down_write+0x4c/0x13c
led_trigger_register+0x4c/0x1a8
phy_led_triggers_register+0x9c/0x214
phy_attach_direct+0x154/0x36c
phylink_attach_phy+0x30/0x60
phylink_sfp_connect_phy+0x140/0x510
sfp_add_phy+0x34/0x50
init_module+0x15c/0xa7c [sfp]
cleanup_module+0x1d94/0x3120 [sfp]
cleanup_module+0x2bb4/0x3120 [sfp]
process_one_work+0x1f8/0x4ec
worker_thread+0x1e8/0x3d8
kthread+0x104/0x110
ret_from_fork+0x10/0x20
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Fixes: 01e5b72 ("net: phy: Add a binding for PHY LEDs")
Link: https://lore.kernel.org/r/102a9dce38bdf00215735d04cd4704458273ad9c.1702339354.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 28, 2023
Trying to suspend to RAM on SAMA5D27 EVK leads to the following lockdep
warning:
============================================
WARNING: possible recursive locking detected
6.7.0-rc5-wt+ #532 Not tainted
--------------------------------------------
sh/92 is trying to acquire lock:
c3cf306c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xe8/0x100
but task is already holding lock:
c3d7c46c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xe8/0x100
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&irq_desc_lock_class);
lock(&irq_desc_lock_class);
*** DEADLOCK ***
May be due to missing lock nesting notation
6 locks held by sh/92:
#0: c3aa0258 (sb_writers#6){.+.+}-{0:0}, at: ksys_write+0xd8/0x178
#1: c4c2df44 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x138/0x284
#2: c32684a0 (kn->active){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x148/0x284
#3: c232b6d4 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x13c/0x4e8
#4: c387b088 (&dev->mutex){....}-{3:3}, at: __device_suspend+0x1e8/0x91c
#5: c3d7c46c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xe8/0x100
stack backtrace:
CPU: 0 PID: 92 Comm: sh Not tainted 6.7.0-rc5-wt+ #532
Hardware name: Atmel SAMA5
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x34/0x48
dump_stack_lvl from __lock_acquire+0x19ec/0x3a0c
__lock_acquire from lock_acquire.part.0+0x124/0x2d0
lock_acquire.part.0 from _raw_spin_lock_irqsave+0x5c/0x78
_raw_spin_lock_irqsave from __irq_get_desc_lock+0xe8/0x100
__irq_get_desc_lock from irq_set_irq_wake+0xa8/0x204
irq_set_irq_wake from atmel_gpio_irq_set_wake+0x58/0xb4
atmel_gpio_irq_set_wake from irq_set_irq_wake+0x100/0x204
irq_set_irq_wake from gpio_keys_suspend+0xec/0x2b8
gpio_keys_suspend from dpm_run_callback+0xe4/0x248
dpm_run_callback from __device_suspend+0x234/0x91c
__device_suspend from dpm_suspend+0x224/0x43c
dpm_suspend from dpm_suspend_start+0x9c/0xa8
dpm_suspend_start from suspend_devices_and_enter+0x1e0/0xa84
suspend_devices_and_enter from pm_suspend+0x460/0x4e8
pm_suspend from state_store+0x78/0xe4
state_store from kernfs_fop_write_iter+0x1a0/0x284
kernfs_fop_write_iter from vfs_write+0x38c/0x6f4
vfs_write from ksys_write+0xd8/0x178
ksys_write from ret_fast_syscall+0x0/0x1c
Exception stack(0xc52b3fa8 to 0xc52b3ff0)
3fa0: 00000004 005a0ae8 00000001 005a0ae8 00000004 00000001
3fc0: 00000004 005a0ae8 00000001 00000004 00000004 b6c616c0 00000020 0059d190
3fe0: 00000004 b6c61678 aec5a041 aebf1a26
This warning is raised because pinctrl-at91-pio4 uses chained IRQ. Whenever
a wake up source configures an IRQ through irq_set_irq_wake, it will
lock the corresponding IRQ desc, and then call irq_set_irq_wake on "parent"
IRQ which will do the same on its own IRQ desc, but since those two locks
share the same class, lockdep reports this as an issue.
Fix lockdep false positive by setting a different class for parent and
children IRQ
Fixes: 7761808 ("pinctrl: introduce driver for Atmel PIO4 controller")
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20231215-lockdep_warning-v1-1-8137b2510ed5@bootlin.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
woodsts
pushed a commit
that referenced
this pull request
Dec 28, 2023
…kernel/git/kvmarm/kvmarm into kvm-master KVM/arm64 fixes for 6.7, part #2 - Ensure a vCPU's redistributor is unregistered from the MMIO bus if vCPU creation fails - Fix building KVM selftests for arm64 from the top-level Makefile
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bumps pip from 23.2.1 to 23.3.
Changelog
Sourced from pip's changelog.
... (truncated)
Commits
e3dc91dBump for release3e85558Update AUTHORS.txt8d02787Reclassify news fragmentf6ecf40Merge pull request #12350 from sbidoul/readact-collecting-url3060865Merge pull request #12335 from edmorley/patch-18f0ed32Redact URLs in Collecting... logsd1659b8Correct issue number for NEWS entry added by #121972333ef3Upgrade urllib3 to 1.26.17 (#12343)496b268Update "Running Tests" documentation (#12334)d1f0981Merge pull request #12331 from sbidoul/update-egg-deprecation-messageDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting
@dependabot rebase.Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
@dependabot rebasewill rebase this PR@dependabot recreatewill recreate this PR, overwriting any edits that have been made to it@dependabot mergewill merge this PR after your CI passes on it@dependabot squash and mergewill squash and merge this PR after your CI passes on it@dependabot cancel mergewill cancel a previously requested merge and block automerging@dependabot reopenwill reopen this PR if it is closed@dependabot closewill close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually@dependabot show <dependency name> ignore conditionswill show all of the ignore conditions of the specified dependency@dependabot ignore this major versionwill close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this minor versionwill close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this dependencywill close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)You can disable automated security fix PRs for this repo from the Security Alerts page.