This repository contains the code for the paper Denise: Deep Robust Principal Component Analysis for Positive Semidefinite Matrices.
To install the necessary Python dependencies:
# cd into the same directory as this README file.
# Install virtualenv; use the equivalent command on non-Debian-based systems.
sudo apt-get install python3-venv
python3 -m venv py3
# or virtualenv --python=/usr/bin/python3.6 py3
source py3/bin/activate
pip install --upgrade pip
pip install tensorflow
pip install pandas
pip install -e .

The pretrained weights for the experiments shown in the paper are available for download via the following command:
wget https://polybox.ethz.ch/index.php/s/ESx1cuJEboxikRB/download -O data.zip
unzip data.zip
rm data.zip

This creates a directory data/ with the trained weights in data/weights/ and
the retrained weights (for the real-world dataset) in data/weights_retrain/.
To use these weights for evaluation, set weights_dir_path in the evaluation section
to data/weights/ or data/weights_retrain/, respectively.
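Before pointing weights_dir_path at the downloaded weights, it can be worth confirming that the archive unpacked as described. The following is a minimal sketch and not part of the repository: the helper name `missing_weight_dirs` is hypothetical, and it assumes only the two directory names given above.

```python
from pathlib import Path

# Directory names taken from the description above; the helper itself is
# illustrative and not part of the Denise code base.
EXPECTED_WEIGHT_DIRS = ("weights", "weights_retrain")


def missing_weight_dirs(data_dir="data"):
    """Return the expected weight directories that are absent or empty."""
    root = Path(data_dir)
    missing = []
    for name in EXPECTED_WEIGHT_DIRS:
        path = root / name
        # A directory counts as usable only if it exists and contains something.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    problems = missing_weight_dirs()
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Both weight directories are present.")
```

An empty list means both directories exist and contain at least one entry; it says nothing about whether the weight files themselves are complete.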
There are also (old) pretrained weights available, which were trained only via the unsupervised loss function. Download them via the following command:
wget https://deeprpca.s3.eu-central-1.amazonaws.com/data.zip
unzip data.zip

NOTE: The following is not needed if Denise is trained or evaluated first, since datasets that do not exist are generated automatically.
The following command creates the Market dataset and prepares it for training:
python3 denise/script_prepare_datasets.py --market=true

If you intend to run the training, you will also need the synthetic dataset, which is about 31GB for matrices of size 20x20.
python3 denise/script_prepare_datasets.py --synthetic=true

To only generate the synthetic evaluation data:
python3 denise/script_prepare_datasets.py --market=false --synthetic_validation=true

Training is optional; you can skip it and directly use the downloaded weights.
The training saves the trained weights in weights_dir_path, which is
data/weights2/ by default. To use these weights for evaluation, set
weights_dir_path in the evaluation section to data/weights2/.
Training locally:
python denise/script_train_eval.py train \
--batch_size=512 \
--N=20 \
--K=3 \
--nb_epochs=90 \
--forced_rank=3 \
--sparsity=95 \
--ldistribution=normal \
--model=topo0 \
--master=local \
--weights_dir_path=data/weights2 \
--loss={L1, L1_S_diff} \
--shrink=False \
--nb_CPU_inter=0 --nb_CPU_intra=0

The following command retrains Denise on Market data using the weights from
trained_weights_dir_path as the starting point and stores the retrained weights in
weights_dir_path.
```sh
python denise/retrain_market.py --weights_dir_path=data/weights2_retrain --trained_weights_dir_path=data/weights2 --N=20 --K=3 --forced_rank=3 --sparsity=95 --model=topo0 --shrink=False --epochs=100 --batch_size=58
```

Create your own TPU cloud instance by following the official documentation.
For quick reference, at the time of writing, run the following command in Cloud Shell:
ctpu up --name host-v2-8 --zone europe-west4-a --tpu-size v2-8
SSH into the VM instance created for your TPU in the step above.
Install the dependencies as described earlier, using Google Cloud Storage to store the prepared data.
Run the training as described in the previous step, using the --master flag to specify the TPU to be used.
On synthetic dataset:
RESULTS_DIR="$(pwd)/results"
mkdir -p ${RESULTS_DIR}
python denise/script_train_eval.py eval \
--model=topo0 \
--weights_dir_path=data/weights/ \
--N=20 \
--K=3 \
--sparsity={60,70,80,90,95} \
--results_dir=${RESULTS_DIR} \
--forced_rank=3 \
--shrink=False \
--ldistribution={normal, uniform, normal0, normal1, normal2, normal3, normal4, uniform1, uniform2, student1, student2}

Or run all the evaluations on the different synthetic datasets (i.e. for all combinations of sparsity and ldistribution), including the baselines:
./run_evals.sh

On real dataset:
python denise/script_train_eval.py eval \
--eval_market \
--model=topo0 \
--weights_dir_path=data/weights/ \
--N=20 \
--K=3 \
--sparsity=95 \
--results_dir=${RESULTS_DIR} \
--forced_rank=3 \
--shrink=False
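The synthetic evaluation command above lists several values in braces for --sparsity and --ldistribution. Assuming the braces denote a choice of one value per run (an interpretation, not something the scripts confirm here), the individual invocations can be enumerated as in the sketch below. The helper name `eval_commands` is hypothetical, all flags are copied from the command above, and whether run_evals.sh iterates in the same way is not verified.

```python
from itertools import product

# Values copied from the synthetic evaluation command above.
SPARSITIES = (60, 70, 80, 90, 95)
LDISTRIBUTIONS = (
    "normal", "uniform", "normal0", "normal1", "normal2", "normal3",
    "normal4", "uniform1", "uniform2", "student1", "student2",
)


def eval_commands(results_dir, weights_dir="data/weights/"):
    """Build one argument list per (sparsity, ldistribution) combination."""
    commands = []
    for sparsity, ldistribution in product(SPARSITIES, LDISTRIBUTIONS):
        commands.append([
            "python", "denise/script_train_eval.py", "eval",
            "--model=topo0",
            f"--weights_dir_path={weights_dir}",
            "--N=20",
            "--K=3",
            f"--sparsity={sparsity}",
            f"--results_dir={results_dir}",
            "--forced_rank=3",
            "--shrink=False",
            f"--ldistribution={ldistribution}",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected
    # first; each list can be passed to subprocess.run(..., check=True).
    for command in eval_commands("results"):
        print(" ".join(command))
```

Building argument lists rather than shell strings avoids quoting problems if the results directory contains spaces.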
You'll need Matlab and its Python API (the MATLAB Engine API for Python). See https://ch.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html for instructions. We used Matlab version R2018b, which is compatible with Python 3.6.
Install the needed Matlab library: https://github.com/andrewssobral/lrslibrary. On OSX, some files of the library do not work and have to be changed.
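Since the baselines depend on both the Matlab engine and the lrslibrary checkout, a quick preflight check can save a failed run. The sketch below is an assumption-laden illustration, not part of the repository: the function name `baseline_preflight` is hypothetical, it assumes the MATLAB Engine API is importable as the top-level package `matlab`, and it checks the LIB_MATLAB_LRS_PATH variable that this README exports further down.

```python
import importlib.util
import os
from pathlib import Path


def baseline_preflight(environ=None):
    """Return a list of human-readable problems; empty means none were found."""
    environ = os.environ if environ is None else environ
    problems = []

    # The MATLAB Engine API for Python installs a top-level `matlab` package.
    if importlib.util.find_spec("matlab") is None:
        problems.append("Python package 'matlab' (MATLAB Engine API) not found")

    # LIB_MATLAB_LRS_PATH is the variable exported in this README.
    lrs_path = environ.get("LIB_MATLAB_LRS_PATH")
    if not lrs_path:
        problems.append("LIB_MATLAB_LRS_PATH is not set")
    elif not Path(lrs_path).is_dir():
        problems.append(f"LIB_MATLAB_LRS_PATH is not a directory: {lrs_path}")

    return problems


if __name__ == "__main__":
    for problem in baseline_preflight():
        print("Problem:", problem)
```

The check only confirms that the pieces are discoverable; it does not start Matlab or validate the contents of the library directory.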
As described above, all baselines can be evaluated (together with Denise) by running:
./run_evals.sh

Otherwise, first set LIB_MATLAB_LRS_PATH to the correct path of the installed lrs Matlab library, then run the individual commands below:
export LIB_MATLAB_LRS_PATH="/Users/Flo/Code/Matlab/lrslibrary"
# export LIB_MATLAB_LRS_PATH="/userdata/fkrach/Projects/matlab/lrslibrary"

On synthetic dataset:
python3 denise/script_eval_baselines.py \
--results_dir=${RESULTS_DIR} \
--forced_rank=3 \
--K=3 \
--N=20 \
--sparsity=95 \
--ldistribution=normal \
--shrink=False

Using the synthetic eval data generated with other distributions:
python3 denise/script_eval_baselines.py \
--results_dir=${RESULTS_DIR} \
--forced_rank=3 \
--K=3 \
--N=20 \
--sparsity={60,70,80,90,95} \
--shrink=False \
--ldistribution={normal, uniform, normal1, normal2, normal3, normal4, uniform1, uniform2, student1, student2}

On real world (stock market) dataset:
python3 denise/script_eval_baselines.py \
--eval_market \
--results_dir=${RESULTS_DIR} \
--forced_rank=3 \
--shrink=False \
--N=20

COMPARISONS_DIR="$(pwd)/comparisons"
mkdir -p ${COMPARISONS_DIR}

On synthetic dataset results:
python3 denise/script_draw_comparison_images.py \
--N=20 \
--K=3 \
--forced_rank=3 \
--sparsity={60,70,80,90,95} \
--results_dir=${RESULTS_DIR} \
--comparisons_dir=${COMPARISONS_DIR} \
--ldistribution={normal, uniform, normal1, normal2, normal3, normal4, uniform1, uniform2, student1, student2}

On market dataset results:
python3 denise/script_draw_comparison_images.py \
--eval_market \
--N=20 \
--forced_rank=3 \
--results_dir=${RESULTS_DIR} \
--shrink=False \
--comparisons_dir=${COMPARISONS_DIR}

cd into code_src (and adjust the following two files as needed), then:
python denise/get_csv.py; python denise/create_tex_table.py

cd into code_src:
python denise/run_realestate.py --model=topo0 --weights_dir_path=data/weights/ --shrink=False

The list of algorithms is read from script_draw_comparison_images.py.
The evaluation must have been run beforehand, since the matrices are read
from the results folder.
python3 denise/script_m_minus_l_minus_s.py --results_dir=$RESULTS_DIR

This code can be used in accordance with the LICENSE.
If you find this code useful or include parts of it in your own work, please cite our paper:
Denise: Deep Robust Principal Component Analysis for Positive Semidefinite Matrices
@article{
herrera2023denise,
title={Denise: Deep Robust Principal Component Analysis for Positive Semidefinite Matrices},
author={Calypso Herrera and Florian Krach and Anastasis Kratsios and Pierre Ruyssen and Josef Teichmann},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=D45gGvUZp2},
note={}
}