This repository implements single-precision adaptive FIR filters for real and complex analytic signals with SIMD acceleration (AVX2/AVX-512/NEON), optional CUDA GPU support, and OptMathKernels integration for Raspberry Pi 5.
- Algorithms: LMS, NLMS, Block LMS, Normalized Block LMS
- Signal Types: Real (`float`) and complex (`std::complex<float>`)
- FFT Acceleration: Automatic overlap-save convolution for filters > 32 taps via FFTW3
- SIMD Optimization: AVX2+FMA and AVX-512 (x86_64), ARM NEON (Raspberry Pi 4/5)
- OptMathKernels Integration: Enhanced NEON kernels via OptMathKernels
- GPU Acceleration: Optional NVIDIA CUDA support via cuFFT
- FFTW Wisdom Caching: Persistent FFT plan optimization in `~/.adapt_fftw_wisdom`
- Parameter Validation: Runtime checks on all configuration parameters
- Thread Safety: Mutex-protected FFTW plan creation for multi-threaded use
- GNU Radio Integration: Ready-to-use `sync_block` wrappers
- Comprehensive Test Suite: 22 tests covering all algorithms, types, edge cases, and numerical properties
| Platform | SIMD | GPU | OptMathKernels | Notes |
|---|---|---|---|---|
| x86_64 Linux | AVX2+FMA, AVX-512 | CUDA | - | Full optimization |
| Raspberry Pi 5 | NEON | - | Supported | Cortex-A76 tuned |
| Raspberry Pi 4 | NEON | - | Supported | Cortex-A72 tuned |
| Generic ARM64 | NEON | - | Supported | ARMv8-A baseline |
- LMS - Least Mean Squares (sample-wise weight update)
- NLMS - Normalized LMS (scale-invariant, regularized adaptation)
- Block LMS - Block-based update (weights updated every M samples)
- Normalized Block LMS - Block NLMS with per-block energy normalization
For block algorithms with filter length > 32, the implementation automatically uses an FFT overlap-save path backed by FFTW3 single precision (fftw3f). FFT sizes are chosen with small prime factors (2, 3, 5, 7) for optimal FFTW performance. For filter length <= 32, a time-domain block path is used.
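For illustration, here is a minimal sketch of that size-selection rule (the idea behind `smooth_fft.hpp`; the helper names and exact logic are assumptions, not the library's source):

```cpp
#include <cstddef>

// True if n has no prime factor larger than 7 (a "7-smooth" number).
inline bool is_7_smooth(std::size_t n) {
    if (n == 0) return false;
    for (std::size_t p : {2u, 3u, 5u, 7u})
        while (n % p == 0) n /= p;
    return n == 1;
}

// Smallest 7-smooth size >= target (hypothetical helper).
inline std::size_t next_smooth_size(std::size_t target) {
    std::size_t n = (target == 0) ? 1 : target;
    while (!is_7_smooth(n)) ++n;
    return n;
}
```

For example, `next_smooth_size(1000)` returns 1000 (2^3 * 5^3 is 7-smooth), whereas a prime like 1009 would be skipped over.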
Filtering uses the conjugate-weights convention, and the time-domain LMS/NLMS weight updates take the standard forms under that convention (spelled out below).
The FFT block update is implemented to be algebraically consistent with these conventions (including conjugations).
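In display form, with the a-priori error e[n] (the standard sample-wise definitions for this convention):

```math
\begin{aligned}
y[n] &= \sum_{k=0}^{M-1} \overline{w_k}\,x[n-k], \qquad e[n] = d[n] - y[n] \\
\text{LMS:}\quad w_k &\leftarrow w_k + \mu\,\overline{e[n]}\,x[n-k] \\
\text{NLMS:}\quad w_k &\leftarrow w_k + \frac{\mu}{\varepsilon + \sum_{j=0}^{M-1}\lvert x[n-j]\rvert^{2}}\,\overline{e[n]}\,x[n-k]
\end{aligned}
```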
Required:
- C++20 compiler (GCC 10+, Clang 12+)
- CMake 3.18+
- FFTW3 single-precision (`libfftw3-dev` or `fftw3f`)
Optional:
- FFTW3 threads (`libfftw3-dev` includes this on most systems)
- NVIDIA CUDA Toolkit 11.0+ (for GPU acceleration)
- OptMathKernels (for enhanced ARM NEON performance)
```bash
# Install dependencies
sudo apt update
sudo apt install -y build-essential cmake pkg-config libfftw3-dev

# Build with default options (AVX2 auto-detected)
mkdir -p build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

# Run tests
ctest --test-dir build
```

Raspberry Pi 5:

```bash
sudo apt update
sudo apt install -y build-essential cmake pkg-config libfftw3-dev
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DADAPT_TARGET_PI5=ON
cmake --build build -j4
ctest --test-dir build
```

Raspberry Pi 4:

```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DADAPT_TARGET_PI4=ON
cmake --build build -j4
```

With OptMathKernels:

```bash
# First build and install OptMathKernels
cd /path/to/OptimizedKernelsForRaspberryPi5_NvidiaCUDA
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DOPTMATH_USE_NEON=ON \
      -DCMAKE_INSTALL_PREFIX=$(pwd)/../install
make -j4 && make install

# Then build AdaptiveFiltering with OptMathKernels
cd /path/to/AdaptiveFiltering
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
      -DADAPT_TARGET_PI5=ON \
      -DADAPT_USE_OPTMATH=ON \
      -DADAPT_OPTMATH_PATH=/path/to/OptimizedKernelsForRaspberryPi5_NvidiaCUDA/install
cmake --build build -j4
```

With CUDA:

```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release \
      -DADAPT_USE_CUDA=ON \
      -DADAPT_CUDA_ARCH=86   # Adjust for your GPU (75=Turing, 86=Ampere, 89=Ada)
cmake --build build -j$(nproc)
```

With AVX-512:

```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DADAPT_USE_AVX512=ON
cmake --build build -j$(nproc)
```

| Option | Default | Description |
|---|---|---|
| `ADAPT_FILTERS_BUILD_TESTS` | `ON` | Build unit tests |
| `ADAPT_FILTERS_BUILD_EXAMPLES` | `ON` | Build example programs |
| `ADAPT_FILTERS_BUILD_BENCHMARKS` | `OFF` | Build benchmark suite |
| `ADAPT_USE_AVX2` | `ON` | Enable AVX2+FMA optimizations (x86_64) |
| `ADAPT_USE_AVX512` | `OFF` | Enable AVX-512 optimizations (x86_64) |
| `ADAPT_USE_NEON` | `ON` | Enable ARM NEON optimizations |
| `ADAPT_TARGET_PI5` | `OFF` | Optimize for Raspberry Pi 5 (Cortex-A76) |
| `ADAPT_TARGET_PI4` | `OFF` | Optimize for Raspberry Pi 4 (Cortex-A72) |
| `ADAPT_USE_CUDA` | `OFF` | Enable NVIDIA CUDA acceleration |
| `ADAPT_CUDA_ARCH` | `75` | CUDA compute capability |
| `ADAPT_USE_OPTMATH` | `OFF` | Enable OptMathKernels integration |
| `ADAPT_OPTMATH_PATH` | `""` | Path to OptMathKernels installation |
```cpp
#include <adapt/adaptive_fir.hpp>
#include <vector>

int main() {
    // Create a 64-tap NLMS filter
    adapt::Params params;
    params.mu  = 0.01f;  // Step size (must be >= 0)
    params.eps = 1e-6f;  // Regularization (must be > 0)
    adapt::AdaptiveFIR<float> filter(64, adapt::Algorithm::NLMS, params);

    // Process signals
    std::vector<float> input(1024);    // Input signal
    std::vector<float> desired(1024);  // Desired/reference signal
    std::vector<float> output(1024);   // Filter output
    // ... fill input and desired ...

    filter.process(
        adapt::Span<const float>(input.data(), input.size()),
        adapt::Span<const float>(desired.data(), desired.size()),
        adapt::Span<float>(output.data(), output.size())
    );

    // Get adapted weights
    const auto& weights = filter.weights();
    return 0;
}
```

Complex signals use the same API:

```cpp
#include <adapt/adaptive_fir.hpp>
#include <complex>
#include <vector>

using cf32 = std::complex<float>;

adapt::AdaptiveFIR<cf32> filter(128, adapt::Algorithm::BLOCK_NLMS);

std::vector<cf32> x(4096), d(4096), y(4096);
// ... fill x and d ...
filter.process(
    adapt::Span<const cf32>(x.data(), x.size()),
    adapt::Span<const cf32>(d.data(), d.size()),
    adapt::Span<cf32>(y.data(), y.size())
);
```

Runtime parameter control:

```cpp
// Parameters can be changed at runtime (validated on set)
filter.set_mu(0.005f); // Reduce step size (throws if < 0)
filter.set_eps(1e-8f); // Adjust regularization (throws if <= 0)
// Reset filter to initial state (clears weights, history, and FFT overlap)
filter.reset_state();
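// Setters validate their arguments: invalid values throw std::runtime_error.
// (Illustrative aside, not part of the original example: a negative step
// size is rejected by the setter's validation.)
try {
    filter.set_mu(-0.1f);
} catch (const std::runtime_error& e) {
    // rejected: mu must be >= 0
}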
// Set specific weights
std::vector<float> new_weights(64, 0.0f);
new_weights[0] = 1.0f;
filter.set_weights(new_weights);
```

GPU acceleration (CUDA builds only):

```cpp
#ifdef ADAPT_HAVE_CUDA
#include <adapt/cuda/adaptive_fir_cuda.hpp>
#include <iostream>

// GPU-accelerated filter (automatically falls back to CPU for small filters)
adapt::cuda::AdaptiveFIRCuda<float> gpu_filter(256, adapt::Algorithm::BLOCK_LMS);

// Check if GPU is being used
if (gpu_filter.is_using_gpu()) {
    std::cout << "Using GPU acceleration\n";
}

// Same API as CPU version
gpu_filter.process(x_span, d_span, y_span);
#endif
```

The main filter class:

```cpp
template <typename T>  // T = float or std::complex<float>
class AdaptiveFIR {
public:
    // Constructor - throws on invalid params (filter_len==0, mu<0, eps<=0, max_nfft==0)
    AdaptiveFIR(std::size_t filter_len, Algorithm alg, Params p = {});

    // Properties
    std::size_t filter_len() const;
    Algorithm algorithm() const;
    const std::vector<T>& weights() const;

    // Parameters (validated on set - throws std::runtime_error on invalid values)
    float mu() const;
    float eps() const;
    void set_mu(float mu);    // throws if mu < 0
    void set_eps(float eps);  // throws if eps <= 0

    // Weight management
    void set_weights(const std::vector<T>& w);  // throws if w.size() != filter_len
    void reset_state();  // resets weights to zero, clears history and FFT state

    // Main processing function
    // x: input signal, d: desired/reference signal, y_out: optional filter output
    // throws if x.size() != d.size() or y_out non-empty with wrong size
    void process(Span<const T> x, Span<const T> d, Span<T> y_out = {});
};
```

```cpp
enum class Algorithm {
    LMS,         // Sample-wise LMS
    NLMS,        // Sample-wise Normalized LMS
    BLOCK_LMS,   // Block LMS (time-domain for M<=32, FFT for M>32)
    BLOCK_NLMS   // Block Normalized LMS
};
```

```cpp
struct Params {
    float mu = 0.01f;              // Step size (must be >= 0)
    float eps = 1e-6f;             // Regularization (must be > 0)
    std::size_t max_nfft = 65536;  // Maximum FFT size cap (must be > 0)

    void validate() const;  // throws std::runtime_error on invalid values
};
```

The library includes hand-optimized SIMD kernels for critical inner-loop operations:
| Operation | x86_64 AVX2 | x86_64 AVX-512 | ARM NEON | Generic |
|---|---|---|---|---|
| Real dot product | 8 floats/iter | 16 floats/iter | 4 floats/iter | 1 float/iter |
| Complex multiply | 4 complex/iter | 8 complex/iter | 2 complex/iter | 1 complex/iter |
| Weight update (FMA) | 8 floats/iter | 16 floats/iter | 4 floats/iter | 1 float/iter |
| FFT freq-domain ops | Vectorized | Vectorized | Vectorized | Scalar |
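For example, the 8-floats-per-iteration AVX2 real dot product in the table could be realized roughly like this (an illustrative sketch, not the library's `kernel_avx2.hpp` source):

```cpp
#include <immintrin.h>
#include <cstddef>

// AVX2+FMA dot product: 8 floats per iteration, scalar tail for the rest.
float dot_f32_avx2(const float* a, const float* b, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_fmadd_ps(va, vb, acc);  // acc += va * vb (fused)
    }
    // Horizontal sum of the 8 accumulator lanes
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    float result = _mm_cvtss_f32(s);
    for (; i < n; ++i) result += a[i] * b[i];  // scalar tail
    return result;
}
```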
When OptMathKernels is enabled on ARM, the NEON kernels are replaced with further-optimized implementations from that library for fmac, scale, and complex multiply-accumulate operations.
The kernel dispatcher selects the best available backend at compile time, in priority order:

1. OptMathKernels (if `ADAPT_USE_OPTMATH=ON` and ARM NEON is available)
2. AVX-512 (if `ADAPT_USE_AVX512=ON` and the CPU supports it)
3. AVX2+FMA (if `ADAPT_USE_AVX2=ON` and the CPU supports it)
4. ARM NEON (always available on AArch64)
5. Generic (portable C++ fallback)
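A minimal sketch of that priority scheme (the macro and function names here are hypothetical, shown only to convey the shape of the dispatch):

```cpp
#include <cstddef>

// Hypothetical compile-time dispatch: each branch forwards to the matching
// backend kernel; only the generic fallback is spelled out here.
inline float dot_f32(const float* a, const float* b, std::size_t n) {
#if defined(ADAPT_HAVE_OPTMATH)        // hypothetical macro names
    return optmath_dot_f32(a, b, n);
#elif defined(ADAPT_HAVE_AVX512)
    return dot_f32_avx512(a, b, n);
#elif defined(ADAPT_HAVE_AVX2)
    return dot_f32_avx2(a, b, n);
#elif defined(ADAPT_HAVE_NEON)
    return dot_f32_neon(a, b, n);
#else
    float s = 0.0f;                    // generic portable fallback
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
#endif
}
```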
| Scenario | Recommended Algorithm |
|---|---|
| Real-time, sample-by-sample | LMS or NLMS |
| Unknown signal scaling | NLMS or BLOCK_NLMS |
| Large filter (>32 taps) | BLOCK_LMS or BLOCK_NLMS |
| High throughput, batched | BLOCK_LMS or BLOCK_NLMS |
| GPU available, very large filter | CUDA variant with BLOCK_* |
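For example, using the API shown in Quick Start (the filter names here are illustrative):

```cpp
#include <adapt/adaptive_fir.hpp>

// Large batched workload: Block NLMS, which takes the FFT path for M > 32
adapt::AdaptiveFIR<float> echo_canceller(256, adapt::Algorithm::BLOCK_NLMS);

// Low-latency, sample-by-sample adaptation with unknown input scaling: NLMS
adapt::AdaptiveFIR<float> tracker(16, adapt::Algorithm::NLMS);
```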
```
AdaptiveFiltering/
├── include/adapt/
│ ├── adaptive_fir.hpp # Main adaptive filter class
│ ├── span.hpp # Minimal span utility
│ ├── traits.hpp # Type traits (is_complex, scalar_type, conj_if_needed, abs2)
│ ├── smooth_fft.hpp # FFT size selection (2,3,5,7-smooth numbers)
│ ├── fftw_wrap.hpp # FFTW3 RAII wrapper with wisdom caching
│ ├── kernels/
│ │ ├── kernels.hpp # Unified kernel dispatcher
│ │ ├── kernel_config.hpp # CPU feature detection (CPUID / NEON)
│ │ ├── kernel_generic.hpp # Portable C++ fallback kernels
│ │ ├── kernel_avx2.hpp # AVX2+FMA optimized kernels
│ │ ├── kernel_avx512.hpp # AVX-512 optimized kernels
│ │ ├── kernel_neon.hpp # ARM NEON optimized kernels
│ │ └── kernel_optmath.hpp # OptMathKernels bridge (enhanced NEON)
│ └── cuda/
│ ├── cuda_fft.hpp # cuFFT wrapper
│ └── adaptive_fir_cuda.hpp # GPU-accelerated filter
├── src/
│ ├── adaptive_fir.cpp # Template instantiation for linkage
│ ├── fftw_wrap.cpp # FFTW implementation with mutex + wisdom
│ └── cuda/ # CUDA kernel implementations
├── tests/ # 22 unit tests (see Testing section)
├── examples/
│ ├── example_system_identification.cpp
│ └── example_noise_canceller.cpp
├── gnuradio_wrappers/ # GNU Radio sync_block wrappers
├── cmake/
│ └── adapt_config.hpp.in # Generated configuration header
└── CMakeLists.txt
```
GNU Radio-compatible `sync_block` wrappers are included:

- `adaptive_fir_ff` - Float input/output
- `adaptive_fir_cc` - Complex input/output

Located in `gnuradio_wrappers/`, these are designed to be dropped into a GNU Radio OOT module and linked against `adapt_filters`.
- Two input ports: signal (`x`) and reference (`d`)
- Selectable output: filtered signal (`y`) or error (`e`)
- Runtime parameter updates via setters
- Algorithm switching while preserving weights
- FFTW plan creation/destruction: Protected by mutex (thread-safe)
- FFT execution: Thread-safe (same plan can be used from multiple threads with different data)
- Filter instances: NOT thread-safe (use separate instances per thread)
- FFTW wisdom: Loaded once at startup, saved on plan creation
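A sketch of the per-thread-instance pattern this implies (illustrative):

```cpp
#include <adapt/adaptive_fir.hpp>
#include <thread>
#include <vector>

void process_channel(int channel) {
    // Filter instances are not thread-safe: give each thread its own.
    // Concurrent construction is fine, since FFTW plan creation is
    // mutex-protected inside the library.
    adapt::AdaptiveFIR<float> filter(64, adapt::Algorithm::NLMS);
    (void)channel;
    // ... feed this channel's samples through filter.process(...) ...
}

int main() {
    std::vector<std::thread> pool;
    for (int c = 0; c < 4; ++c) pool.emplace_back(process_channel, c);
    for (auto& t : pool) t.join();
}
```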
Two example programs are included:
- `example_system_identification` - Identifies an unknown 96-tap system using Block NLMS with FFT acceleration
- `example_noise_canceller` - Complex-valued adaptive noise cancellation with a 48-tap NLMS filter
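In condensed form, the system-identification flow looks like this (an illustrative sketch, not the shipped example's source):

```cpp
#include <adapt/adaptive_fir.hpp>
#include <cstddef>
#include <random>
#include <vector>

int main() {
    const std::size_t M = 96, N = 20000;
    std::mt19937 rng(42);
    std::normal_distribution<float> gauss(0.0f, 1.0f);

    // Unknown 96-tap system to identify
    std::vector<float> h(M);
    for (auto& v : h) v = gauss(rng);

    // Excite it with white noise: d = h * x
    std::vector<float> x(N), d(N, 0.0f);
    for (auto& v : x) v = gauss(rng);
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t k = 0; k < M && k <= n; ++k)
            d[n] += h[k] * x[n - k];

    // Block NLMS with M > 32 uses the FFT overlap-save path
    adapt::AdaptiveFIR<float> filter(M, adapt::Algorithm::BLOCK_NLMS);
    filter.process(adapt::Span<const float>(x.data(), x.size()),
                   adapt::Span<const float>(d.data(), d.size()));

    // filter.weights() now approximates h
}
```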
Build and run:
```bash
./build/example_system_identification
./build/example_noise_canceller
```

The test suite contains 22 tests organized into categories:
| Test | Description |
|---|---|
| `test_lms_system_id` | LMS convergence on system identification |
| `test_nlms_scale_invariance` | NLMS robustness to signal scaling |
| `test_block_fft_matches_direct` | FFT overlap-save matches time-domain |
| `test_block_nlms_complex_converges` | Complex Block NLMS convergence |
| `test_lms_complex` | Complex LMS: system ID, channel tracking, output accuracy |
| Test | Description |
|---|---|
| `test_all_algorithms_float` | All 4 algorithms x multiple filter lengths (float) with NMSE thresholds |
| `test_all_algorithms_complex` | All 4 algorithms x multiple filter lengths (complex) with NMSE thresholds |
| `test_large_filters` | M=256, M=512 filters (FFT path stress test) |
| Test | Description |
|---|---|
| `test_param_validation` | 12 checks: invalid mu, eps, sizes, empty input |
| `test_edge_cases` | M=1, M=2, single sample, M=32/33 threshold boundary |
| `test_numerical_stability` | Large/small amplitude, zero input, impulse, alternating signals |
| `test_convergence_rate` | mu comparison, NLMS vs LMS, monotonic improvement |
| Test | Description |
|---|---|
| `test_weight_management` | 10 checks: init, get/set, FFT sync, reset, frozen weights |
| `test_streaming_consistency` | Chunked vs monolithic processing equivalence |
| `test_fft_path_correctness` | Frozen convolution accuracy at M=64/100/128, partial blocks |
| `test_noise_cancellation` | Float ANC, complex ANC, echo cancellation with ERLE measurement |
| Test | Description |
|---|---|
| `test_simd_kernels` | All SIMD kernel ops (dot, fmac, mul, conj_mul) |
| `test_kernel_consistency` | Optimized kernels vs generic reference across 27 sizes |
| `test_smooth_fft` | FFT size selection: smoothness, minimality, constraints |
| `test_fftw_wrap` | FFTW roundtrip, Parseval's theorem, DFT accuracy, move semantics |
| `test_span` | Span utility: constructors, access, const, zero-size |
| `test_traits` | Type traits: is_complex, scalar_type, conj_if_needed, abs2 |
```bash
# Run all tests
ctest --test-dir build

# Run with verbose output
ctest --test-dir build -V

# Run specific test
./build/test_noise_cancellation
```

All 22 tests pass on Raspberry Pi 5 (Cortex-A76, ARM NEON, OptMathKernels enabled):
```
1/22 test_lms_system_id .................. Passed 0.00 sec
2/22 test_nlms_scale_invariance .......... Passed 0.00 sec
3/22 test_block_fft_matches_direct ....... Passed 0.00 sec
4/22 test_block_nlms_complex_converges ... Passed 0.02 sec
5/22 test_simd_kernels ................... Passed 0.00 sec
6/22 test_param_validation ............... Passed 0.00 sec
7/22 test_edge_cases ..................... Passed 0.01 sec
8/22 test_all_algorithms_float ........... Passed 0.06 sec
9/22 test_all_algorithms_complex ......... Passed 0.10 sec
10/22 test_large_filters .................. Passed 0.22 sec
11/22 test_weight_management .............. Passed 0.00 sec
12/22 test_streaming_consistency .......... Passed 0.00 sec
13/22 test_numerical_stability ............ Passed 0.01 sec
14/22 test_convergence_rate ............... Passed 0.01 sec
15/22 test_fft_path_correctness ........... Passed 0.01 sec
16/22 test_noise_cancellation ............. Passed 0.03 sec
17/22 test_smooth_fft ..................... Passed 0.00 sec
18/22 test_fftw_wrap ...................... Passed 0.01 sec
19/22 test_kernel_consistency ............. Passed 0.01 sec
20/22 test_lms_complex .................... Passed 0.01 sec
21/22 test_span ........................... Passed 0.00 sec
22/22 test_traits ......................... Passed 0.00 sec
100% tests passed, 0 tests failed out of 22
Total Test time (real) = 0.54 sec
```
<details>
<summary>Detailed test output</summary>

```
test_simd_kernels:
dot_product_f32: PASS
dot_product_cf32: PASS
sum_squares_f32: PASS
sum_norm_cf32: PASS
fmac_f32: PASS
mul_cf32: PASS
conj_mul_cf32: PASS
test_param_validation:
filter_len=0: PASS
negative mu: PASS
eps=0: PASS
negative eps: PASS
max_nfft=0: PASS
set_mu negative: PASS
set_eps zero: PASS
set_weights mismatch: PASS
x/d size mismatch: PASS
y_out size mismatch: PASS
mu=0 valid: PASS
empty input: PASS
test_edge_cases:
M=1 LMS: PASS
M=1 complex NLMS: PASS
M=2 LMS: PASS
single sample processing: PASS
M=33 Block NLMS FFT path: PASS
M=32 Block LMS time domain: PASS
test_all_algorithms_float:
LMS M=8: PASS (NMSE=0.0000 < 0.15)
LMS M=16: PASS (NMSE=0.0000 < 0.20)
NLMS M=8: PASS (NMSE=0.0000 < 0.10)
NLMS M=16: PASS (NMSE=0.0001 < 0.15)
NLMS M=32: PASS (NMSE=0.0001 < 0.20)
BLOCK_LMS M=16: PASS (NMSE=0.0001 < 0.25)
BLOCK_LMS M=32: PASS (NMSE=0.0001 < 0.30)
BLOCK_NLMS M=16: PASS (NMSE=0.0000 < 0.20)
BLOCK_NLMS M=32: PASS (NMSE=0.0006 < 0.25)
BLOCK_LMS M=64: PASS (NMSE=0.0000 < 0.35)
BLOCK_NLMS M=64: PASS (NMSE=0.0001 < 0.30)
BLOCK_NLMS M=128: PASS (NMSE=0.0006 < 0.35)
test_all_algorithms_complex:
LMS M=8: PASS (NMSE=0.0000 < 0.20)
LMS M=16: PASS (NMSE=0.0000 < 0.25)
NLMS M=8: PASS (NMSE=0.0002 < 0.15)
NLMS M=16: PASS (NMSE=0.0001 < 0.20)
NLMS M=32: PASS (NMSE=0.0001 < 0.25)
BLOCK_LMS M=16: PASS (NMSE=0.0001 < 0.30)
BLOCK_LMS M=64: PASS (NMSE=0.0000 < 0.35)
BLOCK_NLMS M=16: PASS (NMSE=0.0000 < 0.25)
BLOCK_NLMS M=64: PASS (NMSE=0.0002 < 0.30)
BLOCK_NLMS M=96: PASS (NMSE=0.0001 < 0.35)
test_large_filters:
M=256 float BLOCK_NLMS: PASS (NMSE=0.0006)
M=256 complex BLOCK_NLMS: PASS (NMSE=0.0001)
M=512 float BLOCK_NLMS: PASS (NMSE=0.0001)
test_weight_management:
initial weights zero: PASS
set/get weights roundtrip: PASS
set_weights FFT path: PASS
reset_state clears weights: PASS
frozen weights (mu=0): PASS
complex frozen weights: PASS
complex set_weights FFT roundtrip: PASS
filter_len accessor: PASS
algorithm accessor: PASS
mu/eps accessors: PASS
test_streaming_consistency:
LMS float chunk=1: PASS
LMS float chunk=7: PASS
LMS float chunk=50: PASS
NLMS float chunk=1: PASS
NLMS float chunk=13: PASS
LMS complex chunk=1: PASS
NLMS complex chunk=11: PASS
BLOCK_NLMS float chunk=100 vs 500: PASS
test_numerical_stability:
large amplitude NLMS: PASS
small amplitude NLMS: PASS
zero input signal: PASS
impulse response: PASS
complex large amplitude: PASS
alternating amplitude: PASS
block FFT large amplitude complex: PASS
test_convergence_rate:
NLMS higher mu converges faster: (mu=0.05: 0.000005, mu=0.5: 0.000129) PASS
NLMS converges faster than LMS: PASS (LMS: 0.0000, NLMS: 0.0001)
monotonic improvement: PASS
more data better convergence: PASS (N=200: 0.7409, N=10000: 0.0000)
test_fft_path_correctness:
float M=64 FFT frozen convolution: PASS (max_err=0.000000)
complex M=64 FFT frozen convolution: PASS (max_err=0.000000)
float M=128 FFT frozen convolution: PASS (max_err=0.000000)
float M=100 FFT frozen convolution: PASS (max_err=0.000000)
partial block (N < Nfft): PASS (max_err=0.000000)
multiple partial blocks: PASS (max_err=0.000000)
test_noise_cancellation:
float ANC (correlated noise removal): PASS (SNR: -7.9 -> 4.8)
complex ANC: PASS (steady-state SNR: 6.8 dB)
echo cancellation scenario: PASS (ERLE: 59.1 dB)
test_kernel_consistency:
dot_product_f32 vs generic: PASS
sum_squares_f32 vs generic: PASS
fmac_f32 vs generic: PASS
dot_product_cf32 vs generic: PASS
sum_norm_cf32 vs generic: PASS
mul_cf32 vs generic: PASS
conj_mul_cf32 vs generic: PASS
scale_inplace_cf32 vs generic: PASS
fmac_cf32 vs generic: PASS
template dispatchers: PASS
CPU feature detection: PASS
test_lms_complex:
complex LMS sysid M=16: PASS (NMSE=0.0000)
complex NLMS tracking: PASS (NMSE=0.0003 tracking h2)
complex LMS output accuracy: PASS
test_fftw_wrap:
forward-inverse roundtrip: PASS
Parseval's theorem: PASS
DC signal DFT: PASS
single frequency DFT: PASS
various FFT sizes: PASS
non-power-of-2 size: PASS
move semantics: PASS
test_smooth_fft:
basic selection: PASS
powers of 2: PASS
non-smooth roundup: PASS
all results smooth: PASS
result is minimal: PASS
max_n constraint: PASS
large numbers: PASS
target=0: PASS
test_span:
default constructor: PASS
pointer+size constructor: PASS
element access: PASS
const span: PASS
span from vector: PASS
zero-size span: PASS
test_traits:
is_complex: PASS
scalar_type: PASS
conj_if_needed: PASS
abs2: PASS
```

</details>
MIT License - see LICENSE for details.
Contributions are welcome! Please ensure:
- All tests pass (`ctest --test-dir build`)
- New features include appropriate tests
- Code follows existing style conventions
- FFTW3 for high-performance FFT
- OptMathKernels for optimized ARM NEON kernels
- NVIDIA for cuFFT and CUDA toolkit