
dispatch! is expensive #194

@Shnatsel


In PhastFT, for smaller sizes, I'm calling dispatch! three times per FFT operation on 512 bytes of data (a 64-element batch of f64). This increases the runtime by 25% (a 20% drop in throughput), as measured at commit https://github.com/QuState/PhastFT/tree/e5fcd61f3d540fcef9f8d60173dbfbe777c02e40
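To make the shape of the problem concrete, here is a minimal Rust sketch contrasting two structures: matching on the level once per stage (three dispatches per FFT call, as described above) and matching once at the entry point while keeping the stages generic. `Level`, `Backend`, `ScalarBackend`, `WideBackend` and the stage arithmetic are hypothetical stand-ins, not fearless_simd's or PhastFT's actual types. The sketch only illustrates structure; it does not reproduce or measure the reported overhead.

```rust
// Hypothetical stand-in for a runtime-detected SIMD level.
// This is NOT fearless_simd's `Level` type.
#[derive(Clone, Copy)]
enum Level {
    Scalar,
    Wide,
}

// Each hypothetical backend provides one FFT stage (placeholder arithmetic).
trait Backend {
    fn stage(data: &mut [f64], twiddle: f64);
}

struct ScalarBackend;
struct WideBackend;

impl Backend for ScalarBackend {
    fn stage(data: &mut [f64], twiddle: f64) {
        for x in data.iter_mut() {
            *x = *x * twiddle + 1.0;
        }
    }
}

impl Backend for WideBackend {
    // Placeholder: walks the data 4 lanes at a time, same result as scalar.
    fn stage(data: &mut [f64], twiddle: f64) {
        for chunk in data.chunks_mut(4) {
            for x in chunk.iter_mut() {
                *x = *x * twiddle + 1.0;
            }
        }
    }
}

// Structure resembling the report: the level is matched once per stage,
// i.e. three dispatches for every FFT call.
fn fft_dispatch_per_stage(level: Level, data: &mut [f64]) {
    for twiddle in [0.5, 0.25, 0.125] {
        match level {
            Level::Scalar => ScalarBackend::stage(data, twiddle),
            Level::Wide => WideBackend::stage(data, twiddle),
        }
    }
}

// Alternative: all stages are generic over one backend...
fn fft_stages<B: Backend>(data: &mut [f64]) {
    for twiddle in [0.5, 0.25, 0.125] {
        B::stage(data, twiddle);
    }
}

// ...and the level is matched exactly once, at the entry point.
fn fft_dispatch_once(level: Level, data: &mut [f64]) {
    match level {
        Level::Scalar => fft_stages::<ScalarBackend>(data),
        Level::Wide => fft_stages::<WideBackend>(data),
    }
}
```

Hoisting the dispatch to the outermost call is one common mitigation, since the inner stages then compile as a single specialized function that can be inlined together; whether it recovers the lost throughput in PhastFT is untested here.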

Meanwhile RustFFT with its handwritten dispatch does not suffer any penalty at all, and in fact is slightly slower under -C target-cpu=native than it is under its regular dynamic dispatch.
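For comparison, a rough sketch of the choose-once style, loosely modeled on how RustFFT's planner picks an implementation when it is created and hands back a trait object. `Kernel`, `ScalarKernel`, `Avx2Kernel` and `plan` are hypothetical names, not RustFFT's API; only `is_x86_feature_detected!` and `#[target_feature]` are real std facilities. Feature detection runs once in `plan`, and every later call costs a single virtual call.

```rust
use std::sync::Arc;

// Hypothetical kernel trait (RustFFT's real trait is `Fft<T>`, not this).
trait Kernel: Send + Sync {
    fn name(&self) -> &'static str;
    fn process(&self, data: &mut [f64]);
}

struct ScalarKernel;

impl Kernel for ScalarKernel {
    fn name(&self) -> &'static str {
        "scalar"
    }
    fn process(&self, data: &mut [f64]) {
        for x in data.iter_mut() {
            *x *= 2.0; // placeholder arithmetic
        }
    }
}

#[cfg(target_arch = "x86_64")]
struct Avx2Kernel;

#[cfg(target_arch = "x86_64")]
impl Avx2Kernel {
    // Same placeholder loop, compiled with AVX2 enabled so LLVM may vectorize it.
    #[target_feature(enable = "avx2")]
    unsafe fn process_avx2(data: &mut [f64]) {
        for x in data.iter_mut() {
            *x *= 2.0;
        }
    }
}

#[cfg(target_arch = "x86_64")]
impl Kernel for Avx2Kernel {
    fn name(&self) -> &'static str {
        "avx2"
    }
    fn process(&self, data: &mut [f64]) {
        // SAFETY: in this sketch Avx2Kernel is only constructed by `plan`,
        // after AVX2 support was detected at runtime.
        unsafe { Self::process_avx2(data) }
    }
}

// Runtime feature detection happens here, once per plan, not once per call.
fn plan() -> Arc<dyn Kernel> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return Arc::new(Avx2Kernel);
        }
    }
    Arc::new(ScalarKernel)
}
```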

This overhead needs to be removed for code based on fearless_simd to be competitive with handwritten dynamic dispatch.

perf diff and profiling with samply both point to these dispatch! calls as a major source of slowdown: https://github.com/QuState/PhastFT/blob/c7ea3d7aef474e53233834354364fa50bbb0ba6e/src/algorithms/dit.rs#L259-L260
Profile with -C target-cpu=x86-64-v3: https://share.firefox.dev/3LMqjuI
Profile with dynamic dispatch: https://share.firefox.dev/3NTwJZw

I'm not sure what the cause is. I wouldn't expect a handful of perfectly predictable branches to tank performance. Perhaps dispatch! results in suboptimal codegen, or perhaps I'm just pushing the boundaries of dynamic dispatch and need a facility to obtain a function pointer and store it in a struct for reuse, instead of just reusing a cached Level.
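A minimal sketch of the facility suggested above, written in plain std Rust rather than any existing fearless_simd API: a struct that stores a function pointer selected once at construction, so each call is one indirect call with no per-call level check. `Butterflies`, `butterflies_scalar` and `butterflies_avx2` are hypothetical names, and the butterfly arithmetic is only a placeholder.

```rust
// Scalar fallback: in-place (a, b) -> (a + b, a - b) on adjacent pairs.
fn butterflies_scalar(data: &mut [f64]) {
    for pair in data.chunks_exact_mut(2) {
        let (a, b) = (pair[0], pair[1]);
        pair[0] = a + b;
        pair[1] = a - b;
    }
}

// Same code compiled with AVX2 enabled, so the compiler may vectorize it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn butterflies_avx2(data: &mut [f64]) {
    for pair in data.chunks_exact_mut(2) {
        let (a, b) = (pair[0], pair[1]);
        pair[0] = a + b;
        pair[1] = a - b;
    }
}

pub struct Butterflies {
    // Chosen once in `new` and never changed afterwards.
    kernel: unsafe fn(&mut [f64]),
}

impl Butterflies {
    pub fn new() -> Self {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                return Self { kernel: butterflies_avx2 };
            }
        }
        // A safe fn coerces to an `unsafe fn` pointer.
        Self { kernel: butterflies_scalar }
    }

    pub fn run(&self, data: &mut [f64]) {
        // SAFETY: `kernel` is either the scalar fallback or the AVX2 variant,
        // and the AVX2 variant is only stored after AVX2 was detected.
        unsafe { (self.kernel)(data) }
    }
}
```

The pointer is typed `unsafe fn` because a `#[target_feature]` function is only sound to call once the feature is known to be present; `run` upholds that by construction, so callers get a safe interface.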
