NEXUS

Please check out our paper, "NEXUS: Efficient and Scalable Multi-Cell mmWave Baseband Processing with Heterogeneous Compute".

NEXUS is a real-time multi-cell baseband processing framework built on top of Savannah. The major differences are:

  • NEXUS integrates both software-based and hardware-accelerated baseband processing pipelines, and introduces a novel design for sharing Intel's ACC100 LDPC accelerator across multiple cores via virtual functions (VFs).
  • NEXUS formulates a power-aware resource allocation strategy for single-cell deployments, and then extends it to multi-cell scenarios by modeling latency contention under concurrent execution.
  • NEXUS features multi-cell baseband processing with heterogeneous cell configurations and heterogeneous resource allocation strategies.

Requirement

NEXUS requires the following hardware components and software setup.

  • Same operating system setup as Agora.
  • Intel ACC100 accelerator card (Unit test tutorial).
  • Configure ACC100 properly for multiple virtual functions (VFs tutorial); see the sketch after this list.
  • AVX-512 supported Intel Xeon CPUs for optimal performance.
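
As a rough sketch of the VF step (the PCIe address and VF count below are placeholders; follow the linked VFs tutorial for the authoritative ACC100-specific procedure), SR-IOV virtual functions are created through sysfs:

   $ echo 16 | sudo tee /sys/bus/pci/devices/0000:af:00.0/sriov_numvfs  # create 16 VFs
   $ cat /sys/bus/pci/devices/0000:af:00.0/sriov_numvfs                 # verify the VF count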

Configurations

Most of the architectural options of NEXUS are set with CMake at compile time, while the communication parameters can be adjusted in .json config files.

CMake Options

CMake options select the thread models, arithmetic libraries, LDPC decoder, etc. Below are the NEXUS-related variables and their available values.

CMake Variable    Available Options
TIME_EXCLUSIVE    True, False
LDPC_TYPE         FlexRAN, ACC100
GENDATA_ENCODE    FlexRAN, ACC100
MAT_OP_TYPE       ARMA_CUBE, ARMA_VEC, AVX512
In <nexus folder>/build, use cmake .. -D<VAR>=<OPTION> to configure; a concrete example follows the list below.

  • TIME_EXCLUSIVE should always be true to ensure the best performance by avoiding unnecessary recording.
  • LDPC_TYPE allows users to select the LDPC decoder: FlexRAN (software) vs. ACC100 (hardware).
  • GENDATA_ENCODE allows users to select how to generate LDPC encoded bits: FlexRAN (software) vs. ACC100 (hardware).
  • LDPC_ENQ_BULK determines whether OFDM symbols are buffered and pushed into the ACC100 accelerator in bulk at the last uplink symbol of a frame.
  • MAT_OP_TYPE selects the compute scheme for vectorized matrix operations. It is only effective when small_mimo_acc is enabled in the .json config. AVX512 is recommended for performance whenever supported. ARMA_VEC is the vectorized option wrapped by Armadillo and is recommended when AVX512 is unavailable. ARMA_CUBE is for research/experiment evaluation only and does not perform well.
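
For example, a fully hardware-accelerated configuration (a sketch assuming an ACC100 card and AVX-512 support are available) could be:

   $ cd <nexus folder>/build
   $ cmake .. -DTIME_EXCLUSIVE=True -DLDPC_TYPE=ACC100 -DGENDATA_ENCODE=ACC100 -DMAT_OP_TYPE=AVX512
   $ make -j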

JSON Options

We provide a set of JSON config files for the emulated RU in <nexus folder>/files/config/ci/.

In addition to Savannah's parameters, we provide several more, described in the following:

JSON Variable     Description / Options
solution          Allocation strategy: sc, s1, s2, s3, s4
bs_rru_port       Socket port attaching the gNodeB and UE (RRU side)
bs_server_port    Socket port attaching the gNodeB and UE (server side)
acc100_addr_1     ACC100 PCIe address
bbdev_id_1        Baseband device ID

Specifically, the following table summarizes the mapping between the solution option and the allocation strategies $\theta(C^{dsp}, C^{acc} \mid C, V)$ described in our NEXUS paper:

Solution Option   Allocation Strategy
sc                $\theta(1, 1 \mid 1, 1)$
s1                $\theta(c, 1 \mid c, 1)$
s2                $\theta(c-1, 1 \mid c, 1)$
s3                $\theta(c, c \mid c, c)$
s4                $\theta(c, 1 \mid c, v)$, where $v$ can range from 1 to 16

The remaining physical-layer parameters are set in the same JSON config: MIMO dimension (base_radio_num/ue_radio_num), FFT size (fft_size), number of data subcarriers (ofdm_data_num), modulation scheme (modulation), LDPC code rate (code_rate), and sampling rate (sample_rate).
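
As an illustrative fragment only (all values below are placeholders, not tested settings), a per-cell config might include:

   "solution": "s4",
   "bs_rru_port": 8000,
   "bs_server_port": 8001,
   "acc100_addr_1": "0000:af:00.0",
   "bbdev_id_1": 0,
   "base_radio_num": 2,
   "ue_radio_num": 2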

Run with Emulated-RU

Run the gNodeB in one terminal and the UE (emulated RRU sender) in another terminal. You can add more cells by appending the corresponding JSON config files.

# gNodeB (main program)
$ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH} ./build/agora <Cell-1> <Cell-2> <Cell-3> ... <Cell-N>
# Emulated RRU/UE sender
$ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH} ./build/sender --num_threads=<X> --core_offset=<X> --enable_slow_start=1 <Cell-1> <Cell-2> <Cell-3> ... <Cell-N>

Savannah

Savannah is a real-time baseband processing framework built on top of Agora. Here are the major differences:

  • Savannah targets small MIMO dimensions in 5G FR2, whereas Agora targets massive MIMO in 5G FR1.
  • Savannah introduces the ACC100 accelerator for LDPC decoding and single-core processing (Savannah-sc), while Agora uses a multi-thread model with the FlexRAN decoder.
  • Savannah adds vectorized matrix operations on top of Agora's multi-thread model (Savannah-mc).

Requirement

Savannah requires the following hardware components and software setup.

  • Same operating system setup as Agora.
  • Intel ACC100 accelerator card (Unit test tutorial).
  • AVX-512 supported Intel Xeon CPUs for optimal performance.

For over-the-air (OTA) evaluation, radio front-ends are required. We recommend the COSMOS Testbed for the IBM 28 GHz phased array antenna module (PAAM). For a local testbed, we recommend USRP X310 SDRs, running the FR2 parameter set over sub-7 GHz antennas.

Configurations

Most of the architectural options of Savannah are set with CMake for compilation, while the communication parameters can be adjusted in .json config files.

CMake Options

CMake options select the thread models, arithmetic libraries, LDPC decoder, etc. Below are Savannah-related variables and their available and desired values.

CMake Variable    Available Options              Savannah-sc   Savannah-mc
TIME_EXCLUSIVE    True, False                    True          True
LDPC_TYPE         FlexRAN, ACC100                ACC100        FlexRAN
LDPC_ENQ_BULK     True, False                    False         False
MAT_OP_TYPE       ARMA_CUBE, ARMA_VEC, AVX512    AVX512        AVX512
SINGLE_THREAD     True, False                    True          False

In <savannah folder>/build, use cmake .. -D<VAR>=<OPTION> to configure; concrete examples follow the list below.

  • TIME_EXCLUSIVE should always be true to ensure the best performance by avoiding unnecessary recording.
  • LDPC_TYPE allows users to select the LDPC decoder: FlexRAN (software) vs. ACC100 (hardware).
  • LDPC_ENQ_BULK determines whether OFDM symbols are buffered and pushed into the ACC100 accelerator in bulk at the last uplink symbol of a frame.
  • MAT_OP_TYPE selects the compute scheme for vectorized matrix operations. It is only effective when small_mimo_acc is enabled in the .json config. AVX512 is recommended for performance whenever supported. ARMA_VEC is the vectorized option wrapped by Armadillo and is recommended when AVX512 is unavailable. ARMA_CUBE is for research/experiment evaluation only and does not perform well.
  • SINGLE_THREAD defines whether Savannah merges the single worker thread with the main scheduling thread. When True, the worker thread count is limited to 1 and that thread is merged with the main thread. When False, Savannah behaves as a multi-thread model like Agora, with a total thread count of worker threads + 1 (the main scheduling thread).
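
Following the table above, the two reference builds could be configured as (a sketch; adjust to your setup):

   $ cd <savannah folder>/build
   # Savannah-sc: single core, ACC100 LDPC decoding
   $ cmake .. -DTIME_EXCLUSIVE=True -DLDPC_TYPE=ACC100 -DLDPC_ENQ_BULK=False -DMAT_OP_TYPE=AVX512 -DSINGLE_THREAD=True
   # Savannah-mc: multi-thread, FlexRAN LDPC decoding
   $ cmake .. -DTIME_EXCLUSIVE=True -DLDPC_TYPE=FlexRAN -DLDPC_ENQ_BULK=False -DMAT_OP_TYPE=AVX512 -DSINGLE_THREAD=False
   $ make -j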

JSON Options

The default JSON script for emulated RU is located in <savannah folder>/files/config/ci/tddconfig-sim-ul-fr2.json, and for real RU is located in <savannah folder>/files/config/examples/ul-usrp.json.

To enable vectorized matrix operations, set small_mimo_acc to true. Note that when "small_mimo_acc": true, the beam_block_size field is ignored. Also note that worker_thread_num must be 1 when SINGLE_THREAD=True in CMake (Savannah-sc); otherwise, it can be any positive integer.

The remaining physical-layer parameters are set in the same JSON config: MIMO dimension (base_radio_num/ue_radio_num), FFT size (fft_size), number of data subcarriers (ofdm_data_num), modulation scheme (modulation), LDPC code rate (code_rate), and sampling rate (sample_rate).
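
For instance, a Savannah-sc-style fragment (illustrative values only) might set:

   "small_mimo_acc": true,
   "worker_thread_num": 1,
   "base_radio_num": 2,
   "ue_radio_num": 2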

Run Script

We provide savannah.sh as a unified control script to compile, generate data, and run Savannah, in both emulated-RU mode (simulation) and real-RU mode (RRU).

Here we list the commands to run Savannah-sc and Savannah-mc in emulated-RU mode. Run savannah.sh -h for more details.

Run with Emulated-RU

Run the base station (BS, main program) in one terminal and the UE in the other terminal.

# BS: -x: execution, -s: simulation mode, -r: root privilege, required for ACC100
$ savannah.sh -x -s -r
# UE: -u: UE, -s: simulation mode
$ savannah.sh -u -s

Run with Real-RU

Run the base station (BS, main program) in one terminal and the UE in the other terminal.

# BS: -x: execution, -r: rru mode, -r: root privilege, required for ACC100
$ savannah.sh -x -r -r
# UE: -u: UE, -r: rru mode
$ savannah.sh -u -r

Output

savannah.sh automatically records stdout to log/ and marks the filename with a timestamp. Create the <savannah folder>/log/ directory beforehand.
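
For example:

   $ mkdir -p <savannah folder>/log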


Agora

Agora is a complete software realization of real-time massive MIMO baseband processing.

Some highlights:

  • Agora currently supports 64x16 MU-MIMO (64 RRU antennas and 16 UEs) with 20 MHz bandwidth and 64QAM modulation, on a 36-core server with AVX512 support.
  • Agora is configurable in terms of the number of RRU antennas and UEs, bandwidth, modulation orders, and LDPC code rates.
  • Agora supports an emulated RRU and UEs with a high-performance packet generator.
  • Agora has been tested with real RRUs with up to 64 antennas and up to 8 UEs. The RRU and UE devices are available from Skylark Wireless.

Building Agora

Agora currently only builds and runs on Linux, and has been tested on Ubuntu 16.04, 18.04, 20.04 (Recommended), and 22.04 LTS. Agora requires CMake 2.8+ and works with both GNU and Intel compilers with C++17 support.

Setting up the build environment

  • Setup CI: run

    $ ./config_ci.sh
    
    • Note for developers: You must run this command before checking out your new feature branch. Do not use _ in your branch name. Use - instead.
  • See scripts/ubuntu.sh for required packages, including Linux packages, gtest, Armadillo, and SoapySDR, and the corresponding versions. Run ./scripts/ubuntu.sh to install these packages.

  • Download and install Intel libraries:

    • Install Intel compiler and MKL, refer to INTELLIB_README.md.

    • Set required environment variables by sourcing setvars.sh. If oneAPI is installed in /opt, run source /opt/intel/oneapi/setvars.sh.

    • Install Intel FlexRAN's FEC SDK for LDPC encoding and decoding:

      • NOTE: Compiling FlexRAN requires Intel compiler with version <= 19.0.4. Newer versions of Intel compiler can also work, but require a patch for resolving conflicts with FlexRAN.
        Please contact the current Agora developers to get the patch.
      • Download Intel FlexRAN's FEC SDK to /opt.
      • Compile FlexRAN as follows:
       $ sudo chmod -R a+rwX FlexRAN-FEC-SDK-19-04/ # Allow all users read-write access 
       $ cd /opt/FlexRAN-FEC-SDK-19-04/sdk/ 
       $ sed -i '/add_compile_options("-Wall")/a \ \ add_compile_options("-ffreestanding")' cmake/intel-compile-options.cmake 
       $ ./create-makefiles-linux.sh 
       $ cd build-avx512-icc # or build-avx2-icc 
       $ make -j
       
    • Optional: DPDK

      • Refer to DPDK_README.md for configuration and installation instructions.

Building and running with emulated RRU

We provide a high performance packet generator to emulate the RRU. This generator allows Agora to run and be tested without actual RRU hardware.
The following are steps to set up both Agora and the packet generator:

  • Build Agora. This step also builds the emulated RRU, a data generator that generates random input data files, an end-to-end test that checks correctness of end results for both uplink and downlink,
    and several unit tests for testing either performance or correctness of individual functions.

     $ cd Agora
     $ mkdir build
     $ cd build
     $ cmake ..
     $ make -j
     
  • Run end-to-end test to check correctness (uplink, downlink and combined tests should all pass if everything is set up correctly).

     $ ./test/test_agora/test_agora.sh 10 out # Runs test for 10 iterations
     

Run Agora with emulated RRU traffic

  • NOTE: We recommend running Agora and the emulated RRU on two different machines.
    If you run them on the same machine, make sure Agora and the emulated RRU use different sets of cores; otherwise performance will degrade.

When running Agora and the emulated RRU on two different machines, the following steps use Linux networking stack for packet I/O.
Agora also supports using DPDK to bypass the kernel for packet I/O. See DPDK_README.md for instructions of running emulated RRU and Agora with DPDK.

  • First, return to the base directory (cd ..), then run
   $ ./build/data_generator --conf_file files/config/ci/tddconfig-sim-ul.json
   
 to generate data files.
  • In one terminal, run
   $ ./build/agora --conf_file files/config/ci/tddconfig-sim-ul.json
   
to start Agora with uplink configuration.
  • In another terminal, run
   $ ./build/sender --num_threads=2 --core_offset=1 --frame_duration=5000 --enable_slow_start=1 --conf_file=files/config/ci/tddconfig-sim-ul.json
   

to start the emulated RRU with uplink configuration.

Run Agora with channel simulator and clients

  • First, return to the base directory (cd ..), then run
   $ ./build/data_generator --conf_file files/config/ci/chsim.json
   
to generate data files.
  • In one terminal, run
   $ ./build/user --conf_file files/config/ci/chsim.json
   
 to start clients with combined uplink & downlink configuration.
  • In another terminal, run
   $ ./build/chsim --bs_threads 1 --ue_threads 1 --worker_threads 2 --core_offset 24 --conf_file files/config/ci/chsim.json
   
  • In another terminal, run
   $ ./build/agora --conf_file files/config/ci/chsim.json
   

to start Agora with the combined configuration.

  • Note: make sure Agora and sender use different sets of cores; otherwise performance will degrade.

Run Agora with channel simulator, clients, and mac enabled

  • Compile the code with
   $ cmake .. -DENABLE_MAC=true
   
  • Uplink Testing (--conf_file mac-ul-sim.json)
  • Downlink Testing (--conf_file mac-dl-sim.json)
  • Combined Testing (--conf_file mac-sim.json)
    • Terminal 1:
      $ ./build/data_generator --conf_file files/config/examples/mac-sim.json
    
    to generate data files.
      $ ./build/user --conf_file files/config/examples/mac-sim.json
    
    to start users.
    • Terminal 2:
    $ ./build/chsim --bs_threads 1 --ue_threads 1 --worker_threads 2 --core_offset 28 --conf_file files/config/examples/mac-sim.json
    
    to run the channel simulator
    • Terminal 3:
      $ ./build/macuser --enable_slow_start 1 --conf_file files/config/examples/mac-sim.json
    
    to run the user MAC app. Specify --data_file "" to generate patterned data, and set --conf_file options as necessary.
    • Terminal 4:
    $ ./build/agora --conf_file files/config/examples/mac-sim.json
    
    Run agora before macbs; launch macuser -> agora -> macbs in quick succession.
    • Terminal 5:
    $ ./build/macbs --enable_slow_start 1 --conf_file files/config/examples/mac-sim.json
    
    to run the base station MAC app. Specify --data_file "" to generate patterned data, and set --conf_file options as necessary.
  • Note: make sure agora / user / chsim / macuser / macbs use different sets of cores; otherwise performance will degrade.

Building and running with real RRU

Agora supports a 64-antenna Faros base station as RRU and Iris UE devices. Both are commercially available from Skylark Wireless and are used in the POWDER-RENEW PAWR testbed.
Both Faros and Iris have their roots in the Argos massive MIMO base station, especially ArgosV3. Agora also supports USRP-based RRU and UEs.

We recommend using one server for controlling the RRU and running Agora, and another server for controlling the UEs and running the UE code.

Agora supports both uplink and downlink with real RRU and UEs. For downlink, a reference node outside the array (and synchronized) is required for reciprocity calibration.
Note: Faros RRU and Iris UEs can be discovered using the pyfaros tool. You can use this tool to find the topology of the hardware connected to the server.

We describe how to get the uplink and downlink demos working. Below, XX can be replaced with either ul or dl.

  • Rebuild the code on both servers, for the RRU side and the UE side.
    • For Faros RRU and Iris UEs, pass -DRADIO_TYPE=SOAPY_IRIS to cmake
    • For USRP-based RRU and UEs, pass -DRADIO_TYPE=SOAPY_UHD to cmake
    • Run make -j to recompile the code.
  • Run the UE code on the server connected to the Iris UEs
    • For Iris UEs, run the pyfaros tool in the files/topology directory as follows:
      $ python3 -m pyfaros.discover --json-out
      
      This will output a file named topology.json with all the discoverable serial IDs included.
    • Modify files/topology/topology.json by adding/removing serials of the client Irises you'd like to include in your setup.
    • For USRP-based RRU and UEs, modify the existing files/topology/topology.json and enter the appropriate IDs.
    • Run ./build/data_generator --conf_file files/config/XX-hw.json to generate required data files.
    • Run ./build/user --conf_file files/config/XX-hw.json.
  • Run Agora on the server connected to the Faros RRU
    • scp over the generated file files/experiment/LDPC_orig_XX_data_512_ant2.bin from the client machine to the server's files/experiment directory.
    • Rebuild the code
      • Run make -j to compile the code.
    • For Faros RRU, use the pyfaros tool, as with the UEs, to generate a new files/topology/topology.json
    • Modify files/topology/topology.json by adding/removing serials of your RRU Irises, and the hub.
    • Run ./build/agora --conf_file files/config/XX-hw.json.

Running performance test

To test the real-time performance of Agora for processing 64x16 MU-MIMO with 20 MHz bandwidth and 64QAM modulation, we recommend using two servers (one for Agora and another for the emulated RRU) and DPDK
for networking. In our experiments, we use 2 servers each with 4 Intel Xeon Gold 6130 CPUs. The servers are connected by 40 GbE Intel XL710 dual-port NICs.

  • NOTE: We recommend using at least a 10 GbE NIC and a server with more than 10 cores for testing real-time performance of 8x8 MU-MIMO. For 8x8 MU-MIMO, our test on a 2.3 GHz machine with AVX-512 support
    shows that at least 7 worker cores are required to achieve real-time performance. Additionally, Agora requires one core for the manager thread and at least one core for network threads.

We change "worker_thread_num" and "socket_thread_num" to change the number cores assigned to of worker threads and network threads in the json files, e.g., files/config/ci/tddconfig-sim-ul.json.
If you do not have a powerful server or high throughput NICs, we recommend increasing the value of --frame_duration when you run ./build/sender, which will increase frame duration and reduce throughput.
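
For example, following the 8x8 guideline above (illustrative values; tune for your machine):

   "worker_thread_num": 7,
   "socket_thread_num": 2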

To process 64x16 MU-MIMO in real time, we use both ports of a 40 GbE Intel XL710 NIC with DPDK (see DPDK_README.md) to get enough throughput for the traffic of 64 antennas.
(NOTE: For a 100 GbE NIC, one port provides enough throughput.)

To reduce performance variation, we applied the following configuration to the server that runs Agora:

  • NOTE: These steps are not strictly required if you just want to try out Agora and do not care about performance variations.
  • Disable Turbo Boost to reduce performance variation by running
    $ echo "0" | sudo tee /sys/devices/system/cpu/cpufreq/boost
    
  • Set CPU scaling to performance by running
    $ sudo cpupower frequency-set -g performance
    
    where cpupower can be installed through
    $ sudo apt-get install -y linux-tools-$(uname -r)
    
  • Turn off hyper-threading. We provide an example bash script (scripts/tune_hyperthread.sh), where the core indices are machine dependent; see the sketch after this list.
  • Set IRQ affinity to direct OS interrupts away from Agora's cores. We direct all the interrupts to core 0 in our experiments.
    We provide an example bash script (scripts/set_smp_affinity.sh), where the IRQ indices are machine dependent.
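
As a minimal sketch of the hyper-threading step (the core indices below are machine dependent; scripts/tune_hyperthread.sh is the provided version), the hyper-thread sibling of a physical core can be taken offline via sysfs:

   $ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list  # e.g. prints "0,36"
   $ echo 0 | sudo tee /sys/devices/system/cpu/cpu36/online          # offline the sibling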

The steps to collect and analyze timestamp traces are as follows:

  • Enable DPDK in Agora. Make sure it is compiled and configured to support your specific NIC hardware (see DPDK_README.md).

  • We use files/config/ci/tddconfig-sim-ul.json for uplink experiments and files/config/ci/tddconfig-sim-dl.json for downlink experiments.
    In our paper, we change "antenna_num", "ue_num" and "symbol_num_perframe" to different values to collect different data points in the figures.

  • Generate source data files by running

    $ ./build/data_generator --conf_file files/config/ci/tddconfig-sim-ul.json
    
  • Run Agora as a real-time process (to prevent OS from doing context switches) using

    $ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH} chrt -rr 99 ./build/agora --conf_file files/config/ci/tddconfig-sim-ul.json
    

    (NOTE: Using a process priority 99 is dangerous. Before running it, make sure you have directed OS interrupts away from cores used by Agora. If you have not done so, run

    $ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH} ./build/agora --conf_file files/config/ci/tddconfig-sim-ul.json
    

    instead to run Agora as a normal process.)

  • Run the emulated RRU using

    $ sudo LD_LIBRARY_PATH=${LD_LIBRARY_PATH} ./build/sender --num_threads=2 --core_offset=0 \
      --conf_file=files/config/ci/tddconfig-sim-ul.json --frame_duration=5000 --enable_slow_start=1
    

    For DPDK, add --server_mac_addr= and set it to the MAC address of the NIC used by Agora.

  • The timestamps will be saved in files/experiment/timeresult.txt after Agora finishes processing. We can then use a MATLAB script to process the timestamp trace.

  • We also provide MATLAB scripts for uplink and downlink that are able to process multiple timestamp files and generate figures reported in our paper.

Log and plot PHY stats:

  • Compile the code with
    $ cmake .. -DENABLE_CSV_LOG=True
    
  • Run a test with the desired config; log files will be created in a timestamped directory under the files/log/ folder
  • Run plot_csv.py with csv file input
    $ python3 tools/python/plot_csv.py [max_frames] [X_label] [Y_label] [legend_name] < path/to/log/log-xyz.csv
    With optional parameters, e.g.,
    $ python3 tools/python/plot_csv.py 1000 Frame EVM UE < files/log/2022-10-25-15-46-55/log-evm-BS.csv
    or set max_frames to 0 to plot all frames, e.g.,
    $ python3 tools/python/plot_csv.py 0 Frame EVM UE < files/log/2022-10-25-15-46-55/log-evm-BS.csv
    or without any parameter to plot with the defaults, e.g.,
    $ python3 tools/python/plot_csv.py < files/log/2022-10-25-15-46-55/log-evm-BS.csv
    Note the < operator is required.
    
  • (Optional) Run plot_csv.py with UDP input. Set the log listener IP address and port in the config file, e.g.,
    "log_listener_addr": "127.0.0.1",
    "log_listener_port": 33300
    
    Before starting, run the following command on the listener machine (which has the specified IP address):
    $ nc -u -l [port_number] | python3 tools/python/plot_csv.py [max_frames] [X_label] [Y_label] [legend_name]
    port_number (required) is log_listener_port + log_id (defined in csv_logger.h);
    max_frames (required) is a positive integer no greater than the maximum number of transferred frames.
    For example,
    $ nc -u -l 33303 | python3 tools/python/plot_csv.py 1000 Frame EVM UE
    Repeat with multiple ports for more logs if desired.
    
    Run test; plots will be shown when max_frames is reached.

Contributing to Agora

Agora is open-source and open to your contributions. Before contributing, please read this.

Acknowledgment

Agora was funded in part by NSF Grant #1518916 and by the NSF PAWR project.

Documentation

Check out Agora Wiki for Agora's design overview and flow diagram that maps massive MIMO baseband processing to the actual code structure. Technical details and performance results can be found in

  • Jian Ding, Rahman Doost-Mohammady, Anuj Kalia, and Lin Zhong, "Agora: Real-time massive MIMO baseband processing in software," in Proc. of ACM CoNEXT, December 2020 (PDF, video).

Doxygen documentation for Agora can be generated by running doxygen Agora_doxygen.conf from the repository root directory. The latest hosted output is located at Agora Doxygen.

Other community resources can be found at the RENEW Wireless Wiki.

Contact

Jian Ding (jian.ding@yale.edu)
