ComprEx is a GaspiCxx-based application library that exchanges large vectors using lossy compression and local error accumulation.
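To make "lossy compression with local error accumulation" concrete, here is a minimal pure-Python sketch of the general idea. It is not ComprEx's API: the function names `compress_topk` and `decompress` are hypothetical, and top-k selection is only one possible lossy compressor, chosen here for illustration. Each round sends only the largest entries and keeps whatever was not sent in a local residual, which is added back before the next round.

```python
def compress_topk(values, residual, k):
    """Lossy step (illustrative, not the ComprEx API): send k entries, keep the rest."""
    # Add the error left over from earlier rounds before choosing what to send.
    corrected = [v + r for v, r in zip(values, residual)]
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]), reverse=True)
    keep = sorted(order[:k])
    sparse = [(i, corrected[i]) for i in keep]
    # Everything that was not sent stays in the local residual.
    new_residual = list(corrected)
    for i in keep:
        new_residual[i] = 0.0
    return sparse, new_residual


def decompress(sparse, length):
    """Rebuild a dense vector from (index, value) pairs; unsent entries are zero."""
    dense = [0.0] * length
    for i, v in sparse:
        dense[i] = v
    return dense


if __name__ == "__main__":
    vector = [0.5, -2.0, 0.1, 1.5]
    residual = [0.0] * len(vector)
    for step in range(2):
        sparse, residual = compress_topk(vector, residual, k=2)
        print(step, decompress(sparse, len(vector)), residual)
```

Because the residual is carried over, small entries that are dropped in one round are not lost: they accumulate locally until they are large enough to be selected.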
- cmake version > 3.6
- c++ 14 (presently with gcc-5.2.0)
- GPI (a prerequisite for GaspiCxx)
- GaspiCxx
- (optional) Tensorflow version 2.3; it is strongly recommended to use a conda environment
- Install GPI before attempting to install GaspiCxx.
- Make sure GaspiCxx is built with the BUILD_SHARED_LIBS option set to ON. Please modify the entry in the main CMakeLists.txt of GaspiCxx! Comprex will search for the GaspiCxx library in the build/src/ directory of GaspiCxx.
- If Tensorflow is to be used, it is recommended to use a conda environment.
- Clone the git repository.

      git clone https://github.com/epeec/Comprex.git

- Update the main CMakeLists.txt with the installation path of GaspiCxx.

      set (GASPI_CXX_ROOT "<GaspiCxx directory>")

  GPI will be detected automatically. If the Tensorflow operations should be installed as well, set the option BUILD_TFOPS to ON. If Infiniband is used, switch LINK_IB to ON.

- Build and install the Comprex library plus examples. If the Tensorflow operations should be installed, make sure the correct conda environment is activated. In the main directory of Comprex, execute

      mkdir build
      cd build
      cmake ..
      make install

  If successful,
  - the library libPyGPI.so is installed in lib/
  - optionally, the library libtfGPIOps.so is installed in lib/, if BUILD_TFOPS was ON.

  Additionally, a dummy nodefile is created in the build directory for the examples and tests. A series of shell files for the various examples and tests is also created in build. The include files can be found in the include/ folder of the Comprex main directory.
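The expected results of the build step can be checked with a few lines of Python. This is only a sketch: the helper name `missing_artifacts` is hypothetical, and it assumes that lib/ and build/ are located directly under the Comprex main directory, which the text above suggests but does not state explicitly.

```python
from pathlib import Path


def missing_artifacts(comprex_root, with_tfops=False):
    """Return the expected build outputs that are not present under comprex_root."""
    root = Path(comprex_root)
    # Assumed layout: lib/ and build/ sit directly in the Comprex main directory.
    expected = [root / "lib" / "libPyGPI.so", root / "build" / "nodefile"]
    if with_tfops:
        # Only expected when BUILD_TFOPS was ON.
        expected.append(root / "lib" / "libtfGPIOps.so")
    return [str(p.relative_to(root)) for p in expected if not p.is_file()]


if __name__ == "__main__":
    # Run from the Comprex main directory; an empty list means nothing is missing.
    print(missing_artifacts("."))
```

Pass `with_tfops=True` if the Tensorflow operations were built, so that libtfGPIOps.so is included in the check.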
The examples are set up to run without further configuration after the installation. You can run the following commands from the build directory to try out the examples:
make run_example_gaspiEx_c
make run_example_gaspiEx_py
Should finish with a +++ PASSED +++ message.
make run_example_comprEx_c
make run_example_comprEx_py
Should finish with a +++ PASSED +++ message.
make run_example_allreduce_speed_c
Should perform a few timing measurements and print the results on screen.
make run_example_comprex_speed_c
Should perform a few timing measurements and print the results on screen.
In the LeNet directory, there is a script prepared to run the training of the LeNet model on the MNIST dataset. The dataset will be acquired automatically. The nodelist is configured to run two ranks on the local machine.
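For reference, a nodelist for two local ranks could be generated as sketched below. The exact format of the nodelist shipped in the LeNet directory is not shown here, so this assumes the usual GPI machine file layout: one hostname per line, with the hostname repeated once per rank on that machine. The helper name `write_nodefile` and the output file name are hypothetical.

```python
import socket
from pathlib import Path


def write_nodefile(path, ranks, host=None):
    """Write one hostname line per rank (assumed GPI machine file layout)."""
    # Default to the local machine's hostname when no host is given.
    host = host or socket.gethostname()
    Path(path).write_text("".join(f"{host}\n" for _ in range(ranks)))
    return host


if __name__ == "__main__":
    # Two ranks on the local machine, matching the setup described above.
    write_nodefile("nodelist.example", ranks=2)
```

To run on several machines, such a file would list each machine's hostname once per rank to be started on it, again under the assumption stated above.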
Make sure the correct conda environment is activated and run
source run.sh
LeNet will be trained with several setups. The results will be written to log/.
The results can be inspected by going into the log/ directory and starting Tensorboard with
tensorboard --logdir=.
In the Chrome web browser, visit http://localhost:6006/ to see the training results.