This is the official PyTorch/PyTorch Lightning implementation of the paper:
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
We propose partial convolution (PConv), a simple yet fast and effective operator, as well as a latency-efficient family of architectures called FasterNet.
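The paper describes PConv as applying a regular convolution to only c_p of the c input channels while passing the remaining channels through untouched. With the paper's typical ratio c_p/c = 1/4, its FLOPs are 1/16 of a regular convolution. The arithmetic sketch below checks that ratio. The function names and the 56x56x64 feature-map shape are illustrative assumptions, not code from this repository.

```python
def conv_flops(h: int, w: int, k: int, c: int) -> int:
    """FLOPs of a regular k x k conv with c input and c output channels (h*w*k^2*c^2)."""
    return h * w * k * k * c * c


def pconv_flops(h: int, w: int, k: int, c: int, ratio: float = 0.25) -> int:
    """FLOPs of a partial conv that convolves only c_p = ratio * c channels.

    The other c - c_p channels are passed through unchanged and cost no FLOPs.
    """
    c_p = int(c * ratio)
    return h * w * k * k * c_p * c_p


# Hypothetical stage shape: 56x56 feature map, 3x3 kernel, 64 channels.
full = conv_flops(56, 56, 3, 64)
partial = pconv_flops(56, 56, 3, 64)
print(full, partial, full / partial)  # 115605504 7225344 16.0
```

Because FLOPs grow with c_p squared, a 1/4 channel ratio gives a 1/16 FLOPs ratio regardless of h, w, and k.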
Create a new conda virtual environment:
conda create -n fasternet python=3.9.12 -y
conda activate fasternet
Clone this repo and install required packages:
git clone https://github.com/JierunChen/FasterNet
cd FasterNet/
pip install -r requirements.txt
Download the ImageNet-1K classification dataset and structure the data as follows:
/path/to/imagenet-1k/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
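Before a long run, it can help to confirm that the dataset matches the layout above. The sketch below is a hypothetical helper (`check_imagenet_layout` and the accepted image suffixes are assumptions, not part of this repo); it only inspects folders and file names, not image contents.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}  # assumption: extend if your copy uses other formats


def check_imagenet_layout(root: str) -> dict:
    """Summarize root/train and root/val, assuming one sub-folder per class.

    Raises FileNotFoundError if a split is missing and ValueError if the
    class folders in train/ and val/ differ.
    """
    summary = {}
    class_names = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        class_dirs = sorted(d for d in split_dir.iterdir() if d.is_dir())
        class_names[split] = [d.name for d in class_dirs]
        # Count image files directly inside each class folder.
        n_images = sum(
            1
            for d in class_dirs
            for f in d.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        summary[split] = {"classes": len(class_dirs), "images": n_images}
    if class_names["train"] != class_names["val"]:
        raise ValueError("train/ and val/ contain different class folders")
    return summary
```

For example, `check_imagenet_layout("/path/to/imagenet-1k")` should report 1000 classes for both splits on a complete ImageNet-1K copy.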
name          resolution  top-1 acc (%)  #params  FLOPs   checkpoint
FasterNet-T0  224x224     71.9           3.9M     0.34G   model
FasterNet-T1  224x224     76.2           7.6M     0.85G   model
FasterNet-T2  224x224     78.9           15.0M    1.90G   model
FasterNet-S   224x224     81.3           31.1M    4.55G   model
FasterNet-M   224x224     83.0           53.5M    8.72G   model
FasterNet-L   224x224     83.5           93.4M    15.49G  model
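To pick a variant for a given compute budget, the table can be queried programmatically. The sketch below copies the table's numbers into a list and returns the most accurate variant within a FLOPs budget; `FASTERNET_VARIANTS` and `best_under_budget` are hypothetical names. FLOPs are only a proxy here: the paper's point is that latency also depends on the FLOPS actually achieved, so confirm the choice with --measure_latency on the target hardware.

```python
# (name, top-1 acc %, params in M, FLOPs in G), copied from the table above.
FASTERNET_VARIANTS = [
    ("FasterNet-T0", 71.9, 3.9, 0.34),
    ("FasterNet-T1", 76.2, 7.6, 0.85),
    ("FasterNet-T2", 78.9, 15.0, 1.90),
    ("FasterNet-S", 81.3, 31.1, 4.55),
    ("FasterNet-M", 83.0, 53.5, 8.72),
    ("FasterNet-L", 83.5, 93.4, 15.49),
]


def best_under_budget(max_gflops: float):
    """Return the most accurate variant with FLOPs <= max_gflops, or None if none fits."""
    candidates = [v for v in FASTERNET_VARIANTS if v[3] <= max_gflops]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v[1])


print(best_under_budget(2.0))  # ('FasterNet-T2', 78.9, 15.0, 1.9)
print(best_under_budget(0.1))  # None
```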
We give an example evaluation command for an ImageNet-1K pre-trained FasterNet-T0 on a single GPU:
python train_test.py -c cfg/fasternet_t0.yaml \
--checkpoint_path model_ckpt/fasternet_t0-epoch=281-val_acc1=71.9180.pth \
--data_dir ../../data/imagenet --test_phase -g 1 -e 125
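The checkpoint file in this command appears to encode the model name, the epoch, and the top-1 validation accuracy. The sketch below parses that naming scheme and picks the best checkpoint from a list; the regular expression is an assumption inferred from this single example name, and `parse_checkpoint_name` and `best_checkpoint` are hypothetical helpers, not part of train_test.py.

```python
import re
from pathlib import Path

# Assumed naming scheme, inferred from "fasternet_t0-epoch=281-val_acc1=71.9180.pth".
CKPT_NAME = re.compile(
    r"^(?P<model>.+?)-epoch=(?P<epoch>\d+)-val_acc1=(?P<acc1>\d+(?:\.\d+)?)\.pth$"
)


def parse_checkpoint_name(path: str) -> dict:
    """Extract model name, epoch, and top-1 validation accuracy from a checkpoint path."""
    match = CKPT_NAME.match(Path(path).name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    return {
        "model": match["model"],
        "epoch": int(match["epoch"]),
        "val_acc1": float(match["acc1"]),
    }


def best_checkpoint(paths):
    """Return the path with the highest encoded val_acc1, skipping non-matching names."""
    scored = []
    for p in paths:
        try:
            scored.append((parse_checkpoint_name(p)["val_acc1"], p))
        except ValueError:
            continue
    return max(scored, key=lambda item: item[0])[1] if scored else None


print(parse_checkpoint_name("model_ckpt/fasternet_t0-epoch=281-val_acc1=71.9180.pth"))
# {'model': 'fasternet_t0', 'epoch': 281, 'val_acc1': 71.918}
```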
To evaluate other model variants, change -c and --checkpoint_path accordingly. You can get the pre-trained models from the table above. To evaluate on multiple GPUs, change -g to a larger number or a list, e.g., 8 or 0,1,2,3,4,5,6,7. Note that the evaluation batch size should be changed accordingly, e.g., change -e from 125 to 1000.

To measure the latency on CPU/ARM and the throughput on GPU (if any), run:
python train_test.py -c cfg/fasternet_t0.yaml \
--checkpoint_path model_ckpt/fasternet_t0-epoch=281-val_acc1=71.9180.pth \
--data_dir ../../data/imagenet --test_phase -g 1 -e 32 --measure_latency --fuse_conv_bn
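The --fuse_conv_bn flag presumably folds each BatchNorm layer into the convolution before it, a standard inference-time optimization; we have not checked the repository's own implementation. The pure-Python sketch below shows the arithmetic: with s = gamma / sqrt(var + eps), the fused weights are w * s and the fused bias is (b - mean) * s + beta. All function names and the toy numbers are illustrative assumptions.

```python
import math


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode BatchNorm into the convolution that precedes it.

    weight: one list of flattened kernel values per output channel
    bias:   conv bias per output channel (zeros if the conv has no bias)
    gamma, beta, mean, var: BN affine parameters and running statistics
    Returns (fused_weight, fused_bias) with conv_fused(x) == BN(conv(x)).
    """
    fused_weight, fused_bias = [], []
    for o, w_o in enumerate(weight):
        scale = gamma[o] / math.sqrt(var[o] + eps)
        fused_weight.append([w * scale for w in w_o])
        fused_bias.append((bias[o] - mean[o]) * scale + beta[o])
    return fused_weight, fused_bias


def conv_at(weight, bias, patch):
    """Conv output at one spatial position: a dot product per output channel."""
    return [sum(w * x for w, x in zip(w_o, patch)) + b_o for w_o, b_o in zip(weight, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm applied channel-wise to one output vector."""
    return [g * (v - m) / math.sqrt(s2 + eps) + b for v, g, b, m, s2 in zip(y, gamma, beta, mean, var)]


# Toy layer: 2 output channels, one flattened 3-value input patch.
weight = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 2.0]
patch = [1.0, 2.0, 3.0]

unfused = batch_norm(conv_at(weight, bias, patch), gamma, beta, mean, var)
fused = conv_at(*fuse_conv_bn(weight, bias, gamma, beta, mean, var), patch)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(unfused, fused)))  # True
```

Fusion removes one layer per conv-BN pair at inference time without changing the outputs, which is why it matters when timing latency.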
-e controls the input batch size on GPU, while on CPU/ARM the input batch size is fixed internally to 1.

FasterNet-T0 training on ImageNet-1K with an 8-GPU node:
python train_test.py -g 0,1,2,3,4,5,6,7 --num_nodes 1 -n 4 -b 4096 -e 2000 \
--data_dir ../../data/imagenet --pin_memory --wandb_project_name fasternet \
--model_ckpt_dir ./model_ckpt/$(date +'%Y%m%d_%H%M%S') --cfg cfg/fasternet_t0.yaml
To train other FasterNet variants, change --cfg accordingly. You may also want to change the training batch size -b.
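The evaluation example (-e 125 on one GPU, 1000 on eight) suggests that -e is a total batch size split across GPUs; whether -b follows the same convention is an assumption. Under that assumption, the sketch below counts GPUs from a -g value in either form the README mentions ("8" or "0,1,2,3,4,5,6,7") and scales or splits a batch size. `num_gpus`, `scale_batch_size`, and `per_gpu_batch_size` are hypothetical helpers; the script's real argument parsing may differ.

```python
def num_gpus(gpu_spec: str) -> int:
    """Count GPUs in a -g value: "8" is read as a count, "0,1,2,3" as device ids."""
    gpu_spec = gpu_spec.strip()
    if "," in gpu_spec:
        return len([d for d in gpu_spec.split(",") if d.strip()])
    return int(gpu_spec)


def scale_batch_size(single_gpu_batch: int, gpu_spec: str) -> int:
    """Total batch size that keeps the per-GPU batch equal to single_gpu_batch."""
    return single_gpu_batch * num_gpus(gpu_spec)


def per_gpu_batch_size(total_batch: int, gpu_spec: str) -> int:
    """Per-GPU share of a total batch size, assuming an even split."""
    n = num_gpus(gpu_spec)
    if n <= 0 or total_batch % n != 0:
        raise ValueError(f"cannot split batch {total_batch} evenly over {n} GPU(s)")
    return total_batch // n


print(scale_batch_size(125, "8"))                   # 1000
print(scale_batch_size(125, "0,1,2,3,4,5,6,7"))     # 1000
print(per_gpu_batch_size(4096, "0,1,2,3,4,5,6,7"))  # 512
```

Keeping the per-GPU batch fixed keeps per-GPU memory roughly constant when the GPU count changes.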
This repository is built using the timm, poolformer, ConvNeXt, and mmdetection repositories.
If you find this repository helpful, please consider citing:
@article{chen2023run,
title={Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks},
author={Chen, Jierun and Kao, Shiu-hong and He, Hao and Zhuo, Weipeng and Wen, Song and Lee, Chul-Ho and Chan, S-H Gary},
journal={arXiv preprint arXiv:2303.03667},
year={2023}
}