Update README.md by Endilll · Pull Request #1 · Endilll/llvm-project

Endilll · 2025-09-05T02:33:20Z

No description provided.

…lvm#178069) Kernel panic is a special case, and there is no signal or exception for that so we need to rely on special workaround called `dumptid`. FreeBSDKernel plugin is supposed to find this thread and set it manually through `SetStopInfo()` in `CalculateStopInfo()` like Mach core plugin does. Before (We had to find and select crashed thread list otherwise thread 1 was selected by default): ``` ➜ sudo lldb /boot/panic/kernel -c /var/crash/vmcore.last (lldb) target create "/boot/panic/kernel" --core "/var/crash/vmcore.last" Core file '/var/crash/vmcore.last' (x86_64) was loaded. (lldb) bt * thread #1, name = '(pid 12991) dtrace' * frame #0: 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff8015882f780, flags=259) at sched_ule.c:2448:26 frame #1: 0xffffffff80bd38d2 kernel`mi_switch(flags=259) at kern_synch.c:530:2 frame llvm#2: 0xffffffff80c29799 kernel`sleepq_switch(wchan=0xfffff8014edff300, pri=0) at subr_sleepqueue.c:608:2 frame llvm#3: 0xffffffff80c29b76 kernel`sleepq_catch_signals(wchan=0xfffff8014edff300, pri=0) at subr_sleepqueue.c:523:3 frame llvm#4: 0xffffffff80c29d32 kernel`sleepq_timedwait_sig(wchan=<unavailable>, pri=<unavailable>) at subr_sleepqueue.c:704:11 frame llvm#5: 0xffffffff80bd2e2d kernel`_sleep(ident=0xfffff8014edff300, lock=0xffffffff81df2880, priority=768, wmesg="uwait", sbt=2573804118162, pr=0, flags=512) at kern_synch.c:215:10 frame llvm#6: 0xffffffff80be8622 kernel`umtxq_sleep(uq=0xfffff8014edff300, wmesg="uwait", timo=0xfffffe0279cb3d20) at kern_umtx.c:843:11 frame llvm#7: 0xffffffff80bef87a kernel`do_wait(td=0xfffff8015882f780, addr=<unavailable>, id=0, timeout=0xfffffe0279cb3d90, compat32=1, is_private=1) at kern_umtx.c:1316:12 frame llvm#8: 0xffffffff80bed264 kernel`__umtx_op_wait_uint_private(td=0xfffff8015882f780, uap=0xfffffe0279cb3dd8, ops=<unavailable>) at kern_umtx.c:3990:10 frame llvm#9: 0xffffffff80beaabe kernel`sys__umtx_op [inlined] kern__umtx_op(td=<unavailable>, obj=<unavailable>, op=<unavailable>, val=<unavailable>, uaddr1=<unavailable>, uaddr2=<unavailable>, ops=<unavailable>) at kern_umtx.c:4999:10 frame llvm#10: 0xffffffff80beaa89 kernel`sys__umtx_op(td=<unavailable>, uap=<unavailable>) at kern_umtx.c:5024:10 frame llvm#11: 0xffffffff81122cd1 kernel`amd64_syscall [inlined] syscallenter(td=0xfffff8015882f780) at subr_syscall.c:165:11 frame llvm#12: 0xffffffff81122c19 kernel`amd64_syscall(td=0xfffff8015882f780, traced=0) at trap.c:1208:2 frame llvm#13: 0xffffffff810f1dbb kernel`fast_syscall_common at exception.S:570 ``` After: ``` ➜ sudo ./build/bin/lldb /boot/panic/kernel -c /var/crash/vmcore.last (lldb) target create "/boot/panic/kernel" --core "/var/crash/vmcore.last" Core file '/var/crash/vmcore.last' (x86_64) was loaded. (lldb) bt * thread llvm#18, name = '(pid 5409) powerd (crashed)', stop reason = kernel panic * frame #0: 0xffffffff80bc6c91 kernel`__curthread at pcpu_aux.h:57:2 [inlined] frame #1: 0xffffffff80bc6c91 kernel`doadump(textdump=0) at kern_shutdown.c:399:2 frame llvm#2: 0xffffffff804b3b7a kernel`db_dump(dummy=<unavailable>, dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at db_command.c:596:10 frame llvm#3: 0xffffffff804b396d kernel`db_command(last_cmdp=<unavailable>, cmd_table=<unavailable>, dopager=true) at db_command.c:508:3 frame llvm#4: 0xffffffff804b362d kernel`db_command_loop at db_command.c:555:3 frame llvm#5: 0xffffffff804b7026 kernel`db_trap(type=<unavailable>, code=<unavailable>) at db_main.c:267:3 frame llvm#6: 0xffffffff80c16aaf kernel`kdb_trap(type=3, code=0, tf=0xfffffe01b605b930) at subr_kdb.c:790:13 frame llvm#7: 0xffffffff8112154e kernel`trap(frame=<unavailable>) at trap.c:614:8 frame llvm#8: 0xffffffff810f14c8 kernel`calltrap at exception.S:285 frame llvm#9: 0xffffffff81da2290 kernel`cn_devtab + 64 frame llvm#10: 0xfffffe01b605b8b0 frame llvm#11: 0xffffffff84001c43 dtrace.ko`dtrace_panic(format=<unavailable>) at dtrace.c:652:2 frame llvm#12: 0xffffffff84005524 dtrace.ko`dtrace_action_panic(ecb=0xfffff80539cad580) at dtrace.c:7022:2 [inlined] frame llvm#13: 0xffffffff840054de dtrace.ko`dtrace_probe(id=88998, arg0=14343377283488, arg1=<unavailable>, arg2=<unavailable>, arg3=<unavailable>, arg4=<unavailable>) at dtrace.c:7665:6 frame llvm#14: 0xffffffff83e5213d systrace.ko`systrace_probe(sa=<unavailable>, type=<unavailable>, retval=<unavailable>) at systrace.c:226:2 frame llvm#15: 0xffffffff8112318d kernel`syscallenter(td=0xfffff801318d5780) at subr_syscall.c:160:4 [inlined] frame llvm#16: 0xffffffff81123112 kernel`amd64_syscall(td=0xfffff801318d5780, traced=0) at trap.c:1208:2 frame llvm#17: 0xffffffff810f1dbb kernel`fast_syscall_common at exception.S:570 ```

In this PR i move the insertion point in the `yieldReplacementForFusedProducer` because i ran into some issue where a `tensor.extract_slices` tried to use a result of `affine.apply` that was inserted at the end of the block instead of the start of it. This is the full error of the test i added before this change: ```mlir third-party/llvm-project/mlir/test/Interfaces/TilingInterface/tile-fuse-and-yield-using-scfforall.mlir:83:11: error: operand #1 does not dominate this use %pack = linalg.pack %gen#1 ^ third-party/llvm-project/mlir/test/Interfaces/TilingInterface/tile-fuse-and-yield-using-scfforall.mlir:83:11: note: see current operation: %24 = "tensor.extract_slice"(%23, %36, %8) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> third-party/llvm-project/mlir/test/Interfaces/TilingInterface/tile-fuse-and-yield-using-scfforall.mlir:71:12: note: operand defined here (op in the same block) %gen:2 = linalg.generic { ^ // -----// IR Dump After InterpreterPass Failed (transform-interpreter) //----- // #map = affine_map<(d0, d1) -> (d0, d1)> #map1 = affine_map<(d0) -> (d0 * 16)> #map2 = affine_map<(d0) -> (d0 * -16 + 32)> #map3 = affine_map<(d0) -> (16, d0 * -16 + 32)> #map4 = affine_map<(d0) -> (d0 - 1)> "builtin.module"() ({ "func.func"() <{function_type = (tensor<32x1024xf32>) -> (tensor<32x1024xf32>, tensor<2x512x16x2xi8>), sym_name = "fuse_pack_consumer_into_multi_output_generic"}> ({ ^bb0(%arg1: tensor<32x1024xf32>): %2 = "arith.constant"() <{value = 0 : i8}> : () -> i8 %3 = "tensor.empty"() : () -> tensor<32x1024xf32> %4 = "tensor.empty"() : () -> tensor<32x1024xi8> %5 = "tensor.empty"() : () -> tensor<2x512x16x2xi8> %6:2 = "linalg.generic"(%arg1, %3, %4) <{indexing_maps = [#map, #map, #map], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 1, 2>}> ({ ^bb0(%arg9: f32, %arg10: f32, %arg11: i8): %41 = "arith.fptoui"(%arg9) : (f32) -> i8 "linalg.yield"(%arg9, %41) : (f32, i8) -> () }) : (tensor<32x1024xf32>, tensor<32x1024xf32>, tensor<32x1024xi8>) -> (tensor<32x1024xf32>, tensor<32x1024xi8>) %7:3 = "scf.forall"(%5, %3, %4) <{operandSegmentSizes = array<i32: 0, 0, 0, 3>, staticLowerBound = array<i64: 0>, staticStep = array<i64: 1>, staticUpperBound = array<i64: 2>}> ({ ^bb0(%arg2: index, %arg3: tensor<2x512x16x2xi8>, %arg4: tensor<32x1024xf32>, %arg5: tensor<32x1024xi8>): %8 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %9 = "affine.apply"(%arg2) <{map = #map2}> : (index) -> index %10 = "affine.min"(%arg2) <{map = #map3}> : (index) -> index %11 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %12 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %13 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %14 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %15 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %16 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %17 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %18 = "tensor.extract_slice"(%arg1, %12, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %19 = "tensor.empty"() : () -> tensor<32x1024xf32> %20 = "tensor.extract_slice"(%19, %14, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %21 = "tensor.extract_slice"(%3, %14, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %22 = "tensor.empty"() : () -> tensor<32x1024xi8> %23 = "tensor.extract_slice"(%22, %16, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %24 = "tensor.extract_slice"(%4, %16, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %25 = "tensor.empty"() : () -> tensor<32x1024xf32> %26 = "tensor.extract_slice"(%25, %38, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %27 = "tensor.extract_slice"(%arg4, %38, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xf32>, index, index) -> tensor<?x1024xf32> %28 = "tensor.empty"() : () -> tensor<32x1024xi8> %29 = "tensor.extract_slice"(%28, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %30 = "tensor.extract_slice"(%arg5, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %31:2 = "linalg.generic"(%18, %27, %30) <{indexing_maps = [#map, #map, #map], iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 1, 2>}> ({ ^bb0(%arg6: f32, %arg7: f32, %arg8: i8): %40 = "arith.fptoui"(%arg6) : (f32) -> i8 "linalg.yield"(%arg6, %40) : (f32, i8) -> () }) : (tensor<?x1024xf32>, tensor<?x1024xf32>, tensor<?x1024xi8>) -> (tensor<?x1024xf32>, tensor<?x1024xi8>) %32 = "tensor.extract_slice"(%6#1, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<32x1024xi8>, index, index) -> tensor<?x1024xi8> %33 = "tensor.empty"() : () -> tensor<2x512x16x2xi8> %34 = "tensor.extract_slice"(%33, %arg2) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 1, 512, 16, 2>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<2x512x16x2xi8>, index) -> tensor<1x512x16x2xi8> %35 = "tensor.extract_slice"(%arg3, %arg2) <{operandSegmentSizes = array<i32: 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 1, 512, 16, 2>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<2x512x16x2xi8>, index) -> tensor<1x512x16x2xi8> %36 = "linalg.pack"(%31#1, %35, %2) <{inner_dims_pos = array<i64: 0, 1>, operandSegmentSizes = array<i32: 1, 1, 1, 0>, static_inner_tiles = array<i64: 16, 2>}> : (tensor<?x1024xi8>, tensor<1x512x16x2xi8>, i8) -> tensor<1x512x16x2xi8> %37 = "affine.apply"(%10) <{map = #map4}> : (index) -> index %38 = "affine.apply"(%arg2) <{map = #map1}> : (index) -> index %39 = "affine.apply"(%10) <{map = #map4}> : (index) -> index "scf.forall.in_parallel"() ({ "tensor.parallel_insert_slice"(%36, %arg3, %arg2) <{operandSegmentSizes = array<i32: 1, 1, 1, 0, 0>, static_offsets = array<i64: -9223372036854775808, 0, 0, 0>, static_sizes = array<i64: 1, 512, 16, 2>, static_strides = array<i64: 1, 1, 1, 1>}> : (tensor<1x512x16x2xi8>, tensor<2x512x16x2xi8>, index) -> () "tensor.parallel_insert_slice"(%31#0, %arg4, %38, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<?x1024xf32>, tensor<32x1024xf32>, index, index) -> () "tensor.parallel_insert_slice"(%31#1, %arg5, %8, %10) <{operandSegmentSizes = array<i32: 1, 1, 1, 1, 0>, static_offsets = array<i64: -9223372036854775808, 0>, static_sizes = array<i64: -9223372036854775808, 1024>, static_strides = array<i64: 1, 1>}> : (tensor<?x1024xi8>, tensor<32x1024xi8>, index, index) -> () }) : () -> () }) : (tensor<2x512x16x2xi8>, tensor<32x1024xf32>, tensor<32x1024xi8>) -> (tensor<2x512x16x2xi8>, tensor<32x1024xf32>, tensor<32x1024xi8>) "func.return"(%7#1, %7#0) : (tensor<32x1024xf32>, tensor<2x512x16x2xi8>) -> () }) : () -> () "builtin.module"() ({ "transform.named_sequence"() <{arg_attrs = [{transform.readonly}], function_type = (!transform.any_op) -> (), sym_name = "__transform_main"}> ({ ^bb0(%arg0: !transform.any_op): %0 = "transform.structured.match"(%arg0) <{ops = ["linalg.pack"]}> : (!transform.any_op) -> !transform.any_op %1:2 = "transform.test.fuse_and_yield"(%0) <{tile_interchange = [], tile_sizes = [1], use_forall = true}> : (!transform.any_op) -> (!transform.any_op, !transform.any_op) "transform.yield"() : () -> () }) : () -> () }) {transform.with_named_sequence} : () -> () }) : () -> () ``` I also noticed that Interface tests are missing from the bazel overlay so i also added this.

…m#167446) Add SVE optimization for AArch64 architectures. The idea is to use predicate registers to avoid branching. Microbench in repo shows considerable improvements on NV GB10 (locked on largest X925): ``` ====================================================================== BENCHMARK STATISTICS (time in nanoseconds) ====================================================================== memcpy_Google_A: Old - Mean: 3.1257 ns, Median: 3.1162 ns New - Mean: 2.8402 ns, Median: 2.8265 ns Improvement: +9.14% (mean), +9.30% (median) memcpy_Google_B: Old - Mean: 2.3171 ns, Median: 2.3159 ns New - Mean: 1.6589 ns, Median: 1.6593 ns Improvement: +28.40% (mean), +28.35% (median) memcpy_Google_D: Old - Mean: 8.7602 ns, Median: 8.7645 ns New - Mean: 8.4307 ns, Median: 8.4308 ns Improvement: +3.76% (mean), +3.81% (median) memcpy_Google_L: Old - Mean: 1.7137 ns, Median: 1.7091 ns New - Mean: 1.4530 ns, Median: 1.4553 ns Improvement: +15.22% (mean), +14.85% (median) memcpy_Google_M: Old - Mean: 1.9823 ns, Median: 1.9825 ns New - Mean: 1.4826 ns, Median: 1.4840 ns Improvement: +25.20% (mean), +25.15% (median) memcpy_Google_Q: Old - Mean: 1.6812 ns, Median: 1.6784 ns New - Mean: 1.1538 ns, Median: 1.1517 ns Improvement: +31.37% (mean), +31.38% (median) memcpy_Google_S: Old - Mean: 2.1816 ns, Median: 2.1786 ns New - Mean: 1.6297 ns, Median: 1.6287 ns Improvement: +25.29% (mean), +25.24% (median) memcpy_Google_U: Old - Mean: 2.2851 ns, Median: 2.2825 ns New - Mean: 1.7219 ns, Median: 1.7187 ns Improvement: +24.65% (mean), +24.70% (median) memcpy_Google_W: Old - Mean: 2.0408 ns, Median: 2.0361 ns New - Mean: 1.5260 ns, Median: 1.5252 ns Improvement: +25.23% (mean), +25.09% (median) uniform_384_to_4096: Old - Mean: 26.9067 ns, Median: 26.8845 ns New - Mean: 26.8083 ns, Median: 26.8149 ns Improvement: +0.37% (mean), +0.26% (median) ``` The beginning of the memcpy function looks like the following: ``` Dump of assembler code for function _ZN22__llvm_libc_22_0_0_git6memcpyEPvPKvm: 0x0000000000001340 <+0>: cbz x2, 0x143c <_ZN22__llvm_libc_22_0_0_git6memcpyEPvPKvm+252> 0x0000000000001344 <+4>: cbz x0, 0x1440 <_ZN22__llvm_libc_22_0_0_git6memcpyEPvPKvm+256> 0x0000000000001348 <+8>: cbz x1, 0x1444 <_ZN22__llvm_libc_22_0_0_git6memcpyEPvPKvm+260> 0x000000000000134c <+12>: subs x8, x2, #0x20 0x0000000000001350 <+16>: b.hi 0x1374 <_ZN22__llvm_libc_22_0_0_git6memcpyEPvPKvm+52> // b.pmore 0x0000000000001354 <+20>: rdvl x8, #1 0x0000000000001358 <+24>: whilelo p0.b, xzr, x2 0x000000000000135c <+28>: ld1b {z0.b}, p0/z, [x1] 0x0000000000001360 <+32>: whilelo p1.b, x8, x2 0x0000000000001364 <+36>: ld1b {z1.b}, p1/z, [x1, #1, mul vl] 0x0000000000001368 <+40>: st1b {z0.b}, p0, [x0] 0x000000000000136c <+44>: st1b {z1.b}, p1, [x0, #1, mul vl] 0x0000000000001370 <+48>: ret ``` --------- Co-authored-by: Guillaume Chatelet <chatelet.guillaume@gmail.com>

…8306) In FreeBSD, allproc is a prepend list and new processes are appended at head. This results in reverse pid order, so we first need to order pid incrementally then print threads according to the correct order. Before: ``` Process 0 stopped * thread #1: tid = 101866, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff8015882f780, flags=259) at sched_ule.c:2448:26, name = '(pid 12991) dtrace' thread llvm#2: tid = 101915, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff80158825780, flags=259) at sched_ule.c:2448:26, name = '(pid 11509) zsh' thread llvm#3: tid = 101942, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff80142599000, flags=259) at sched_ule.c:2448:26, name = '(pid 11504) ftcleanup' thread llvm#4: tid = 101545, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff80131898000, flags=259) at sched_ule.c:2448:26, name = '(pid 5599) zsh' thread llvm#5: tid = 100905, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff80131899000, flags=259) at sched_ule.c:2448:26, name = '(pid 5598) sshd-session' thread llvm#6: tid = 101693, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff8015886e780, flags=259) at sched_ule.c:2448:26, name = '(pid 5595) sshd-session' thread llvm#7: tid = 101626, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff801588be000, flags=259) at sched_ule.c:2448:26, name = '(pid 5592) sh' ... ``` After: ``` (lldb) thread list Process 0 stopped * thread #1: tid = 100000, 0xffffffff80bf9322 kernel`sched_switch(td=0xffffffff81abe840, flags=259) at sched_ule.c:2448:26, name = '(pid 0) kernel' thread llvm#2: tid = 100035, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff801052d9780, flags=259) at sched_ule.c:2448:26, name = '(pid 0) kernel/softirq_0' thread llvm#3: tid = 100036, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff801052d9000, flags=259) at sched_ule.c:2448:26, name = '(pid 0) kernel/softirq_1' thread llvm#4: tid = 100037, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff801052d8780, flags=259) at sched_ule.c:2448:26, name = '(pid 0) kernel/softirq_2' thread llvm#5: tid = 100038, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff801052d8000, flags=259) at sched_ule.c:2448:26, name = '(pid 0) kernel/softirq_3' thread llvm#6: tid = 100039, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff801052d7780, flags=259) at sched_ule.c:2448:26, name = '(pid 0) kernel/softirq_4' thread llvm#7: tid = 100040, 0xffffffff80bf9322 kernel`sched_switch(td=0xfffff801052d7000, flags=259) at sched_ule.c:2448:26, name = '(pid 0) kernel/softirq_5' ... ``` Signed-off-by: Minsoo Choo <minsoochoo0122@proton.me>

…er. (llvm#181941) The progress event reporter has a thread that reports events every 250 millisecond. and is destroyed in its destructor. When in event reporter desctructor, the event reporter may have pending event but the call mutex is destroyed leading to the crash. Relevant stack trace from CI. ``` [2026-02-13T17:46:13.577Z] libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument [2026-02-13T17:46:13.577Z] PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash report from ~/Library/Logs/DiagnosticReports/. [2026-02-13T17:46:13.577Z] #0 0x0000000102b6943c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/bin/lldb-dap+0x10008943c) [2026-02-13T17:46:13.577Z] #1 0x0000000102b67368 llvm::sys::RunSignalHandlers() (/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/bin/lldb-dap+0x100087368) [2026-02-13T17:46:13.577Z] llvm#2 0x0000000102b69f20 SignalHandler(int, __siginfo*, void*) (/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/bin/lldb-dap+0x100089f20) [2026-02-13T17:46:13.577Z] llvm#3 0x000000018bbdb744 (/usr/lib/system/libsystem_platform.dylib+0x1804e3744) [2026-02-13T17:46:13.577Z] llvm#4 0x000000018bbd1888 (/usr/lib/system/libsystem_pthread.dylib+0x1804d9888) [2026-02-13T17:46:13.577Z] llvm#5 0x000000018bad6850 (/usr/lib/system/libsystem_c.dylib+0x1803de850) [2026-02-13T17:46:13.577Z] llvm#6 0x000000018bb85858 (/usr/lib/libc++abi.dylib+0x18048d858) [2026-02-13T17:46:13.577Z] llvm#7 0x000000018bb744bc (/usr/lib/libc++abi.dylib+0x18047c4bc) [2026-02-13T17:46:13.577Z] llvm#8 0x000000018b7a0424 (/usr/lib/libobjc.A.dylib+0x1800a8424) [2026-02-13T17:46:13.577Z] llvm#9 0x000000018bb84c2c (/usr/lib/libc++abi.dylib+0x18048cc2c) [2026-02-13T17:46:13.577Z] llvm#10 0x000000018bb88394 (/usr/lib/libc++abi.dylib+0x180490394) [2026-02-13T17:46:13.577Z] llvm#11 0x000000018bb8833c (/usr/lib/libc++abi.dylib+0x18049033c) [2026-02-13T17:46:13.577Z] llvm#12 0x000000018bb01b90 (/usr/lib/libc++.1.dylib+0x180409b90) [2026-02-13T17:46:13.577Z] llvm#13 0x000000018bb01b34 (/usr/lib/libc++.1.dylib+0x180409b34) [2026-02-13T17:46:13.577Z] llvm#14 0x000000018bb038a0 (/usr/lib/libc++.1.dylib+0x18040b8a0) [2026-02-13T17:46:13.577Z] llvm#15 0x0000000102b6fbac lldb_dap::DAP::Send(std::__1::variant<lldb_dap::protocol::Request, lldb_dap::protocol::Response, lldb_dap::protocol::Event> const&) (/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/bin/lldb-dap+0x10008fbac) [2026-02-13T17:46:13.577Z] llvm#16 0x0000000102b6f890 lldb_dap::DAP::SendJSON(llvm::json::Value const&) (/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/bin/lldb-dap+0x10008f890) [2026-02-13T17:46:13.577Z] llvm#17 0x0000000102b78788 std::__1::__function::__func<lldb_dap::DAP::DAP(lldb_dap::Log&, lldb_dap::ReplMode, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>, bool, llvm::StringRef, lldb_private::transport::JSONTransport<lldb_dap::ProtocolDescriptor>&, lldb_private::MainLoopPosix&)::$_0, std::__1::allocator<lldb_dap::DAP::DAP(lldb_dap::Log&, lldb_dap::ReplMode, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>, bool, llvm::StringRef, lldb_private::transport::JSONTransport<lldb_dap::ProtocolDescriptor>&, lldb_private::MainLoopPosix&)::$_0>, void (lldb_dap::ProgressEvent&)>::operator()(lldb_dap::ProgressEvent&) (/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/bin/lldb-dap+0x100098788) [2026-02-13T17:46:13.577Z] llvm#18 0x0000000102b8939c lldb_dap::ProgressEventManager::ReportIfNeeded() (/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/bin/lldb-dap+0x1000a939c) [2026-02-13T17:46:13.577Z] llvm#19 0x0000000102b8982c lldb_dap::ProgressEventReporter::ReportStartEvents() (/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/bin/lldb-dap+0x1000a982c) [2026-02-13T17:46:13.577Z] llvm#20 0x0000000102b8a038 void* std::__1::__thread_proxy[abi:nn200100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, lldb_dap::ProgressEventReporter::ProgressEventReporter(std::__1::function<void (lldb_dap::ProgressEvent&)>)::$_0>>(void*) (/Users/ec2-user/jenkins/workspace/llvm.org/as-lldb-cmake-os-verficiation/lldb-build/bin/lldb-dap+0x1000aa038) [2026-02-13T17:46:13.577Z] llvm#21 0x000000018bbd1c08 (/usr/lib/system/libsystem_pthread.dylib+0x1804d9c08) [2026-02-13T17:46:13.577Z] llvm#22 0x000000018bbccba8 (/usr/lib/system/libsystem_pthread.dylib+0x1804d4ba8) ``` rdar://170331108

I created an issue about this in llvm#179976. Clang's Address Sanitizer installs its own SEH filter which handles some types of uncaught exceptions. Along with register values and some other information, it also generates a stack trace. However, current logic is incomplete. It relies on DbgHelp's SymFunctionTableAccess64 and SymGetModuleBase64 which won't work with machine code that has its RUNTIME_FUNCTION entry registered with Rtl* (e.g. RtlAddFunctionTable) system calls. Most likely, this is because DbgHelp either relies on information in PDB files or considers PDATA and XDATA only from loaded EXE and DLL modules. Either way, consider the following example: ``` #include <windows.h> #include <iostream> #include <vector> typedef union _UNWIND_CODE { struct { BYTE CodeOffset; BYTE UnwindOp : 4; BYTE OpInfo : 4; }; USHORT FrameOffset; } UNWIND_CODE, * PUNWIND_CODE; typedef struct _UNWIND_INFO { BYTE Version : 3; BYTE Flags : 5; BYTE SizeOfProlog; BYTE CountOfCodes; BYTE FrameRegister : 4; BYTE FrameOffset : 4; UNWIND_CODE UnwindCode[1]; // Variable size } UNWIND_INFO, * PUNWIND_INFO; #define UWOP_PUSH_NONVOL 0 #define UWOP_ALLOC_LARGE 1 #define UWOP_ALLOC_SMALL 2 #define UWOP_SET_FPREG 3 #define UWOP_SAVE_NONVOL 4 #define UWOP_SAVE_NONVOL_FAR 5 #define UWOP_SAVE_XMM128 8 #define UWOP_SAVE_XMM128_FAR 9 #define UWOP_PUSH_MACHFRAME 10 int main() { // PUSH RBX (0x53) - Save non-volatile register // SUB RSP, 0x20 (0x48 0x83 0xEC 0x20) - Allocate 32 bytes (shadow space) // XOR RAX, RAX (0x48 0x31 0xC0) - Zero out RAX // MOV RAX, [RAX] (0x48 0x8B 0x00) - Dereference NULL std::vector<unsigned char> code = { 0x53, 0x48, 0x83, 0xEC, 0x20, 0x48, 0x31, 0xC0, 0x48, 0x8B, 0x00 }; size_t codeSize = code.size(); size_t totalSize = 100; LPVOID pMemory = VirtualAlloc(NULL, totalSize, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE); BYTE* pCodeBase = (BYTE*)pMemory; PUNWIND_INFO pUnwindInfo = (PUNWIND_INFO)(pCodeBase + codeSize); size_t alignmentPadding = 0; if ((size_t)pUnwindInfo % 4 != 0) { alignmentPadding = 4 - ((size_t)pUnwindInfo % 4); pUnwindInfo = (PUNWIND_INFO)((BYTE*)pUnwindInfo + alignmentPadding); } memcpy(pCodeBase, code.data(), codeSize); pUnwindInfo->Version = 1; pUnwindInfo->Flags = UNW_FLAG_NHANDLER; pUnwindInfo->Flags = 0; pUnwindInfo->SizeOfProlog = 5; pUnwindInfo->CountOfCodes = 2; pUnwindInfo->FrameRegister = 0; pUnwindInfo->FrameOffset = 0; pUnwindInfo->UnwindCode[0].CodeOffset = 5; pUnwindInfo->UnwindCode[0].UnwindOp = UWOP_ALLOC_SMALL; pUnwindInfo->UnwindCode[0].OpInfo = 3; pUnwindInfo->UnwindCode[1].CodeOffset = 1; pUnwindInfo->UnwindCode[1].UnwindOp = UWOP_PUSH_NONVOL; pUnwindInfo->UnwindCode[1].OpInfo = 3; // RBX RUNTIME_FUNCTION tableEntry = {}; tableEntry.BeginAddress = 0; tableEntry.EndAddress = (DWORD)codeSize; tableEntry.UnwindData = (DWORD)((BYTE*)pUnwindInfo - (BYTE*)pMemory); DWORD64 baseAddress = (DWORD64)pMemory; RtlAddFunctionTable(&tableEntry, 1, baseAddress); typedef void(*FuncType)(); FuncType myFunc = (FuncType)pMemory; myFunc(); return 0; } ``` Windows' kernel can propagate hardware exception through that function, so clearly these entries are at least partially correct. Right now, ASan's stack walking produces this (compiled with latest release, clang++): ``` PS D:\Local Projects\cpp-playground> ./a.exe ================================================================= ==14216==ERROR: AddressSanitizer: access-violation on unknown address 0x000000000000 (pc 0x0199561c0008 bp 0x004cf0cffb30 sp 0x004cf0cff970 T0) ==14216==The signal is caused by a READ memory access. ==14216==Hint: address points to the zero page. #0 0x0199561c0007 (<unknown module>) #1 0x000000000000 (<unknown module>) llvm#2 0x000000000000 (<unknown module>) ==14216==Register values: rax = 0 rbx = 4cf0cffaa0 rcx = 7ffcb97b4e28 rdx = 19955dc0000 rdi = 11bf564a0040 rsi = 0 rbp = 4cf0cffb30 rsp = 4cf0cff970 r8 = 7ffffffffffffffc r9 = 1 r10 = 0 r11 = 246 r12 = 0 r13 = 0 r14 = 0 r15 = 0 AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: access-violation (<unknown module>) ==14216==ABORTING ``` Frames one and two is just some stack space allocated by that dynamic function. While patched version produces this: ``` PS D:\Local Projects\cpp-playground> ./a.exe ================================================================= ==13660==ERROR: AddressSanitizer: access-violation on unknown address 0x000000000000 (pc 0x01ed5ad70008 bp 0x00d76492f650 sp 0x00d76492f490 T0) ==13660==The signal is caused by a READ memory access. ==13660==Hint: address points to the zero page. #0 0x01ed5ad70007 (<unknown module>) #1 0x7ff732e518a1 in main (D:\Local Projects\cpp-playground\a.exe+0x1400018a1) llvm#2 0x7ff732e56a9b in invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78 llvm#3 0x7ff732e56a9b in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 llvm#4 0x7ffcb878e8d6 (C:\WINDOWS\System32\KERNEL32.DLL+0x18002e8d6) llvm#5 0x7ffcb966c53b (C:\WINDOWS\SYSTEM32\ntdll.dll+0x18008c53b) ==13660==Register values: rax = 0 rbx = d76492f5c0 rcx = 7ffcb97b4e28 rdx = 1ed5a870000 rdi = 12135afa0040 rsi = 0 rbp = d76492f650 rsp = d76492f490 r8 = 7ffffffffffffffc r9 = 1 r10 = 0 r11 = 246 r12 = 0 r13 = 0 r14 = 0 r15 = 0 AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: access-violation (<unknown module>) ==13660==ABORTING ``` Now we see that stack walking handled our dynamic function properly. Interestingly enough, it appears that other overloaded version of UnwindSlow procedure that works without CONTEXT structure already has some logic to handle this. Theoretically, symbolizer should also be able to provide some information about these functions, but I don't think that this is necessary. I added SANITIZER_WINDOWS64 check because I am pretty sure Microsoft only mentions these functions for 64 bit version of their OS. I also can't check how this works on ARM.

Using code/ideas from the x86 backend to optimize a select on a bitcast integer. The previous aarch64 approach was to individually extract the bits from the mask, which is kind of terrible. https://rust.godbolt.org/z/576sndT66 ```llvm define void @if_then_else8(ptr %out, i8 %mask, ptr %if_true, ptr %if_false) { start: %t = load <8 x i32>, ptr %if_true, align 4 %f = load <8 x i32>, ptr %if_false, align 4 %m = bitcast i8 %mask to <8 x i1> %s = select <8 x i1> %m, <8 x i32> %t, <8 x i32> %f store <8 x i32> %s, ptr %out, align 4 ret void } ``` turned into ```asm if_then_else8: // @if_then_else8 sub sp, sp, llvm#16 ubfx w8, w1, llvm#4, #1 and w11, w1, #0x1 ubfx w9, w1, llvm#5, #1 fmov s1, w11 ubfx w10, w1, #1, #1 fmov s0, w8 ubfx w8, w1, llvm#6, #1 ldp q5, q2, [x3] mov v1.h[1], w10 ldp q4, q3, [x2] mov v0.h[1], w9 ubfx w9, w1, llvm#2, #1 mov v1.h[2], w9 ubfx w9, w1, llvm#3, #1 mov v0.h[2], w8 ubfx w8, w1, llvm#7, #1 mov v1.h[3], w9 mov v0.h[3], w8 ushll v1.4s, v1.4h, #0 ushll v0.4s, v0.4h, #0 shl v1.4s, v1.4s, llvm#31 shl v0.4s, v0.4s, llvm#31 cmlt v1.4s, v1.4s, #0 cmlt v0.4s, v0.4s, #0 bsl v1.16b, v4.16b, v5.16b bsl v0.16b, v3.16b, v2.16b stp q1, q0, [x0] add sp, sp, llvm#16 ret ``` With this PR that instead emits ```asm if_then_else8: adrp x8, .LCPI0_1 dup v0.4s, w1 ldr q1, [x8, :lo12:.LCPI0_1] adrp x8, .LCPI0_0 ldr q2, [x8, :lo12:.LCPI0_0] ldp q4, q3, [x2] and v1.16b, v0.16b, v1.16b and v0.16b, v0.16b, v2.16b ldp q5, q2, [x3] cmeq v1.4s, v1.4s, #0 cmeq v0.4s, v0.4s, #0 bsl v1.16b, v2.16b, v3.16b bsl v0.16b, v5.16b, v4.16b stp q0, q1, [x0] ret ``` So substantially shorter. Instead of building the mask element-by-element, this approach (by virtue of not splitting) instead splats the mask value into all vector lanes, performs a bitwise and with powers of 2, and compares with zero to construct the mask vector. cc rust-lang/rust#122376 cc llvm#175769

llvm#184186) …83889)" This reverts commit 2342db0. Revert "[CMake] Propagate dependencies to OBJECT libraries in `add_llvm_library` (llvm#183541)" This reverts commit e3c0454.

Update README.md

d872e6f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update README.md#1

Update README.md#1
Endilll wants to merge 1 commit intomainfrom
ruleset-test

Endilll commented Sep 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Endilll commented Sep 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant