playing_with_vae: Comparing FC VAE / FCN VAE / PCA / UMAP on MNIST

This is a test task I did for some reason. It contains a comparison of FC VAE / FCN VAE / PCA / UMAP on MNIST / FMNIST.

To build the Docker image from the Dockerfile located in the dockerfile folder, run:

cd dockerfile
docker build -t vae_docker .

(you can replace the public SSH key in the Dockerfile with your own, of course)

Also, please make sure that nvidia-docker2 and the proper NVIDIA drivers are installed.
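
If nvidia-docker2 is not installed yet, on Ubuntu 16.04 the installation boils down to something like the following (a minimal sketch assuming the nvidia-docker apt repository has already been added; see the official nvidia-docker documentation for the repository setup):

sudo apt-get update
sudo apt-get install -y nvidia-docker2
# pick up the new docker runtime configuration
sudo pkill -SIGHUP dockerd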

To test the installation, run:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Then launch the container as follows:

docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 -it -v /your/folder/:/home/keras/notebook/your_folder -p 8888:8888 -p 6006:6006 --name vae --shm-size 16G vae_docker

Please note that without --shm-size 16G the PyTorch DataLoader classes will not work (the worker processes communicate via shared memory). The above command starts a container with a Jupyter notebook server available on port 8888. Port 6006 is exposed for TensorBoard, if needed.

Then you can exec into the container. All the scripts were run as root, but they should also work under the keras user:

docker exec -it --user root REPLACE_WITH_CONTAINER_ID /bin/bash

or

docker exec -it --user keras REPLACE_WITH_CONTAINER_ID /bin/bash

To find out the container ID, run:
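
docker ps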

Most important dependencies (if you do not want to use Docker)

These are the most important dependencies (the others you can just install along the way):

Ubuntu 16.04
cuda 9.0
cudnn 7
python 3.6
pip
PIL
tensorflow-gpu (for tensorboard)
pandas
numpy
matplotlib
seaborn
tqdm
scikit-learn
pytorch 0.4.0 (cuda90)
torchvision 0.2.0
datashader
umap
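
For reference, a rough sketch of installing the Python side of this stack by hand (the exact pinned versions live in the Dockerfile; note that the PyPI package names for PIL and umap are Pillow and umap-learn respectively):

# PyTorch 0.4.0 built against CUDA 9.0, from the official pytorch conda channel
conda install pytorch=0.4.0 cuda90 -c pytorch
pip install torchvision

# the rest of the stack
pip install tensorflow-gpu pandas numpy matplotlib seaborn tqdm scikit-learn
pip install Pillow datashader umap-learn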

If you have trouble with any of these, look up how I installed them in the Dockerfile / Jupyter notebook.

The best model can be trained as follows:

python3 train.py \
	--epochs 30 --batch-size 512 --seed 42 \
	--model_type fc_conv --dataset_type fmnist --latent_space_size 10 \
	--do_augs False \
	--lr 1e-3 --m1 40 --m2 50 \
	--optimizer adam \
	--do_running_mean False --img_loss_weight 1.0 --kl_loss_weight 1.0 \
	--image_loss_type bce --ssim_window_size 5 \
	--print-freq 10 \
	--lognumber fmnist_fc_conv_l10_rebalance_no_norm \
	--tensorboard True --tensorboard_images True

If you launch this code, a copy of the FMNIST dataset will be downloaded automatically.
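
The download presumably goes through torchvision's built-in dataset class; a minimal standalone equivalent (the ./data root directory here is an illustrative choice, not necessarily the path train.py uses):

from torchvision import datasets

# fetches FMNIST into ./data on first use (train and test splits)
train_set = datasets.FashionMNIST('./data', train=True, download=True)
test_set = datasets.FashionMNIST('./data', train=False, download=True)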

Feel free to play with alternative values for the above flags (e.g. model_type, dataset_type, latent_space_size, image_loss_type).

The --tensorboard True --tensorboard_images True flags are optional; in order to use them, you have to have TensorBoard installed (tensorflow-gpu is listed above for exactly this reason) and a TensorBoard server running against the training log directory.
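
The server is launched in the usual way (the runs log directory is my assumption; point --logdir at whatever directory train.py writes to, and note that 6006 is the port exposed by the container above):

tensorboard --logdir=runs --port=6006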

You can also resume from the best checkpoint using these flags:

python3 train.py \
	--resume weights/fmnist_fc_conv_l10_rebalance_no_norm_best.pth.tar \
	--epochs 60 --batch-size 512 --seed 42 \
	--model_type fc_conv --dataset_type fmnist --latent_space_size 10 \
	--do_augs False \
	--lr 1e-3 --m1 50 --m2 100 \
	--optimizer adam \
	--do_running_mean False --img_loss_weight 1.0 --kl_loss_weight 1.0 \
	--image_loss_type bce --ssim_window_size 5 \
	--print-freq 10 \
	--lognumber fmnist_resume \
	--tensorboard True --tensorboard_images True
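
Under the hood, a --resume flag like this typically restores the model and optimizer state from the checkpoint before training continues; a minimal sketch of that pattern, assuming the common PyTorch checkpoint layout (epoch / state_dict / optimizer keys) rather than quoting this repo's train.py:

import torch

def load_checkpoint(model, optimizer, path):
    # restore weights and optimizer state (e.g. Adam moments), return the resume epoch
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    return checkpoint['epoch']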

The best reconstructions are supposed to look like this (top row - original images, bottom row - reconstructions):

Brief ablation analysis of the results

✓ What worked

  1. Using BCE loss + KLD loss (a sketch of this objective follows after the list)
  2. Converting a plain FC model into a conv model in the most straightforward fashion possible, i.e. replacing this
        # plain FC VAE encoder: fc21 / fc22 are the mu and logvar heads
        self.fc1 = nn.Linear(784, 400)
        self.fc21 = nn.Linear(400, latent_space_size)
        self.fc22 = nn.Linear(400, latent_space_size)
        # decoder
        self.fc3 = nn.Linear(latent_space_size, 400)
        self.fc4 = nn.Linear(400, 784)

with this

        # a 28x28 kernel over the full image acts exactly like an FC layer
        self.fc1 = nn.Conv2d(1, 32, kernel_size=(28,28), stride=1, padding=0)
        # 1x1 convs play the role of the FC heads for mu and logvar
        self.fc21 = nn.Conv2d(32, latent_space_size, kernel_size=(1,1), stride=1, padding=0)
        self.fc22 = nn.Conv2d(32, latent_space_size, kernel_size=(1,1), stride=1, padding=0)

        # the decoder mirrors the encoder with transposed convolutions
        self.fc3 = nn.ConvTranspose2d(latent_space_size, 118, kernel_size=(1,1), stride=1, padding=0)
        self.fc4 = nn.ConvTranspose2d(118, 1, kernel_size=(28,28), stride=1, padding=0)
  3. Using SSIM as a visualization metric. It correlates remarkably well with the perceived visual similarity of an image and its reconstruction (see the snippet below)
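
For reference, the BCE + KLD objective from point 1 is the standard VAE loss, and the SSIM from point 3 can be computed with scikit-image; below is a minimal sketch of both under those assumptions (the repo itself apparently carries its own windowed SSIM, given the --ssim_window_size flag):

import numpy as np
import torch
import torch.nn.functional as F
from skimage.metrics import structural_similarity

def vae_loss(recon_x, x, mu, logvar, img_w=1.0, kl_w=1.0):
    # pixel-wise binary cross-entropy between reconstruction and input,
    # plus the KL divergence between q(z|x) = N(mu, sigma^2) and the unit Gaussian prior
    bce = F.binary_cross_entropy(recon_x.view(-1, 784), x.view(-1, 784), reduction='sum')
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return img_w * bce + kl_w * kld

# SSIM between a 28x28 original and its reconstruction, both scaled to [0, 1]
original = np.random.rand(28, 28)
reconstruction = np.clip(original + 0.05 * np.random.randn(28, 28), 0, 1)
score = structural_similarity(original, reconstruction, data_range=1.0)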

✗ What did not work

  1. Normalizing images by their mean and std - removing this step boosted SSIM on FMNIST 4-5x
  2. Doing any simple augmentations (unsurprisingly - augmentation adds a level of complexity to an otherwise simple task)
  3. Any architectures beyond the most obvious ones
  4. MSE loss performed poorly, and SSIM loss did not work at all
  5. LR decay, as well as any LR besides 1e-3 (with Adam), did not really help
  6. Increasing the latent space size to 20 or 100 did not change much

**¯\_(ツ)_/¯ What I did not try**

  1. Ensembling or building meta-architectures
  2. Conditional VAEs
  3. Increasing network capacity

Please refer to section 5 of the main.ipynb for details.

The Jupyter notebook (the .ipynb file) is best viewed with these Jupyter notebook extensions (installed with the commands below, then enabled in the Jupyter GUI):

pip install git+https://github.com/ipython-contrib/jupyter_contrib_nbextensions
# conda install html5lib==0.9999999
jupyter contrib nbextension install --system

Sometimes there is an html5lib conflict; the extensions are excluded from the Dockerfile because of it (the conflict occurs only intermittently).

