ZhengPeng7/BiRefNet: [CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation

Bilateral Reference for High-Resolution Dichotomous Image Segmentation

1 Nankai University  2 Northwestern Polytechnical University  3 National University of Defense Technology 
4 Aalto University  5 Shanghai AI Laboratory  6 University of Trento 

This repo is the official implementation of "Bilateral Reference for High-Resolution Dichotomous Image Segmentation" (CAAI AIR 2024).

Note

We need more GPU resources (2024-04-08) to push forward the performance of BiRefNet, especially on pushing BiRefNet to general use and higher-resolution images. If you are happy to cooperate, please contact me at zhengpeng0108@gmail.com.

🚀 Load BiRefNet in ONE LINE with Hugging Face (see the HF model page for more):
from transformers import AutoModelForImageSegmentation
birefnet = AutoModelForImageSegmentation.from_pretrained('zhengpeng7/BiRefNet', trust_remote_code=True)
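
Once loaded, the model can be run on a single image. The following is a minimal sketch, not the repository's own inference script. It assumes torch, torchvision, and Pillow are installed, a 1024x1024 input, ImageNet normalisation statistics, and that the model returns a sequence of logit maps whose last entry is the final prediction. The helper names `load_birefnet` and `predict_mask` are hypothetical.

```python
# Assumption: standard ImageNet normalisation statistics.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def load_birefnet(device="cpu"):
    """Load the pretrained model with the same call as the one-liner above."""
    # Imported lazily so the module can be imported without transformers.
    from transformers import AutoModelForImageSegmentation

    model = AutoModelForImageSegmentation.from_pretrained(
        "zhengpeng7/BiRefNet", trust_remote_code=True
    )
    return model.to(device).eval()


def predict_mask(model, image, size=(1024, 1024), device="cpu"):
    """Return a grayscale PIL mask with the same size as the PIL `image`."""
    import torch
    from torchvision import transforms

    preprocess = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])
    batch = preprocess(image.convert("RGB")).unsqueeze(0).to(device)

    with torch.no_grad():
        # Assumption: the last element of the output is the final logit map.
        logits = model(batch)[-1]

    # (1, 1, H, W) logits -> (H, W) probabilities in [0, 1].
    prob = logits.sigmoid().cpu()[0].squeeze(0)
    mask = transforms.ToPILImage()(prob)
    # Resize the mask back to the original image resolution.
    return mask.resize(image.size)
```

A call such as `predict_mask(load_birefnet("cuda"), Image.open("photo.jpg"), device="cuda").save("mask.png")` would write the predicted mask to disk; `photo.jpg` is a placeholder path.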

You can access the inference API service of BiRefNet on FAL or click the Deploy button on our HF model page to set up your own deployment.

Our BiRefNet has achieved SOTA on many similar HR tasks:

DIS:

Figure of Comparison on DIS Papers with Codes (by the time of this work):

COD:

Figure of Comparison on COD Papers with Codes (by the time of this work):

HRSOD:

Figure of Comparison on HRSOD Papers with Codes (by the time of this work):
Try our online demos for inference:

To make BiRefNet more broadly useful, I extended the original academic model into more general ones that are better suited to real-life applications.

Datasets and backbone weights are best downloaded from their official pages, but you can also download the packaged ones: DIS, HRSOD, COD, Backbones.

Find performances (almost all metrics) of all models in the exp-TASK_SETTINGS folders in [stuff].

Models in the original paper, for comparison on benchmarks:

Models trained with custom data (general, matting), for general use in practical application:

| Task | Training Sets | Backbone | Test Set | Metric (S, wF[, HCE]) | Download |
| --- | --- | --- | --- | --- | --- |
| general use (2048x2048) | AIM-500, DIS-TR, DIS-TEs, HIM2K, PPM-100, TE-HRS10K, TE-Human-2k, TE-P3M-500-P, TR-AM-2k, TR-HRSOD, TR-UHRSD, Distinctions-646_BG-20k, Human-2k_BG-20k, TE-AM-2k, TE-HRSOD, TE-P3M-500-NP, TE-UHRSD, TR-HRS10K, TR-P3M-10k, TR-humans | swin_v1_large | DIS-VD | 0.927, 0.894, 881 | google-drive |
| general use | DIS5K-TR, DIS-TEs, DUTS-TR_TE, HRSOD-TR_TE, UHRSD-TR_TE, HRS10K-TR_TE, TR-P3M-10k, TE-P3M-500-NP, TE-P3M-500-P, TR-humans | swin_v1_large | DIS-VD | 0.911, 0.875, 1069 | google-drive |
| general use | DIS5K-TR, DIS-TEs, DUTS-TR_TE, HRSOD-TR_TE, UHRSD-TR_TE, HRS10K-TR_TE, TR-P3M-10k, TE-P3M-500-NP, TE-P3M-500-P, TR-humans | swin_v1_tiny | DIS-VD | 0.882, 0.830, 1175 | google-drive |
| general use | DIS5K-TR, DIS-TEs | swin_v1_large | DIS-VD | 0.907, 0.865, 1059 | google-drive |
| general matting | P3M-10k (except TE-P3M-500-NP), TR-humans, AM-2k, AIM-500, Human-2k (synthesized with BG-20k), Distinctions-646 (synthesized with BG-20k), HIM2K, PPM-100 | swin_v1_large | TE-P3M-500-NP | 0.979, 0.988 | google-drive |
| portrait matting | P3M-10k, humans | swin_v1_large | P3M-500-P | 0.983, 0.989 | google-drive |

Segmentation with box guidance:

Model efficiency:

Screenshot from the original paper. All tests here are conducted on a single A100 GPU.

The devices used in the table below differ from those in the original paper (the standard), so the numbers are for reference only.

| Runtime | FP32 | FP16 |
| --- | --- | --- |
| A100 | 86.8ms | 69.4ms |
| 4090 | 95.8ms | 57.7ms |
| V100 | 384ms | 152ms |

| GPU Memory | FP32 | FP16 |
| --- | --- | --- |
| Inference | 4.76GB | 3.45GB |
| Training (#GPU=1, batch_size=2, compile=False+PyTorch=2.5.1) | 36.3GB | 30.4GB |
| Training (#GPU=1, batch_size=2, compile=True+PyTorch=2.5.1) | 35.9GB | 22.5GB (4090), 23.5GB (A100) |

ONNX conversion:

We converted the .pth weights files to .onnx files.
We drew heavily on Kazuhito00/BiRefNet-ONNX-Sample. Many thanks to @Kazuhito00.

There are now several third-party applications built on BiRefNet. Many thanks for their contribution to the community!
Choose one you like and try it with clicks instead of code:

  1. Applications:

| Methods | PyTorch | ONNX | TensorRT |
| --- | --- | --- | --- |
| First Inference Time | 0.71s | 5.32s | 0.17s |
| Avg Inf Time (excluding 1st) | 0.15s | 4.43s | 0.11s |
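
The timings above separate the first inference from the average of the later ones, since the first call typically includes one-off warm-up cost. A minimal sketch of how such a split could be measured for any backend is given below. The function name `time_calls` and its parameters are hypothetical, and this is not the benchmark code used to produce the numbers above.

```python
import time
from statistics import mean


def time_calls(fn, n_runs=10, sync=None):
    """Call `fn` n_runs times; return (first_call_s, mean_of_remaining_s).

    `sync` is an optional zero-argument callable run after each call, e.g.
    torch.cuda.synchronize, so that queued GPU work is included in the timing.
    """
    if n_runs < 2:
        raise ValueError("n_runs must be at least 2")
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        durations.append(time.perf_counter() - start)
    # The first call is reported separately from the steady-state average.
    return durations[0], mean(durations[1:])
```

For a PyTorch model on GPU, `fn` could be a closure that runs one forward pass on a fixed input, with `sync=torch.cuda.synchronize`.
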
  2. More Visual Comparisons

    video-from_twitter_toyxyz3_2.mp4 video-from_twitter_toyxyz3_1.mp4
# PyTorch==2.5.1+CUDA12.4 (or 2.0.1+CUDA11.8) is used for faster training (~40%) with compilation.
conda create -n birefnet python=3.10 -y && conda activate birefnet
pip install -r requirements.txt

Download the combined training / test sets I have organized from DIS--COD--HRSOD, the individual official ones in the single_ones folder, or their official pages. The same files are also available on my BaiduDisk: DIS--COD--HRSOD.

Download backbone weights from my google-drive folder or their official pages.

# Train & Test & Evaluation
./train_test.sh RUN_NAME GPU_NUMBERS_FOR_TRAINING GPU_NUMBERS_FOR_TEST
# Example: ./train_test.sh tmp-proj 0,1,2,3,4,5,6,7 0

# See train.sh / test.sh for only training / test-evaluation.
# After the evaluation, run `gen_best_ep.py` to select the best ckpt from a specific metric (you choose it from Sm, wFm, HCE (DIS only)).
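
Selecting a best checkpoint by one metric comes down to picking the maximum for Sm and wFm and the minimum for HCE, where lower is better. The sketch below illustrates that rule only and is not the repository's `gen_best_ep.py`. The names `select_best` and `RECORDS` are hypothetical, and the sample records reuse the DIS-VD numbers from the model table purely as input data; in practice each record would be one epoch's evaluation result.

```python
# Direction of each metric: S-measure and weighted F-measure are
# higher-is-better, HCE (Human Correction Efforts) is lower-is-better.
HIGHER_IS_BETTER = {"Sm": True, "wFm": True, "HCE": False}

# Sample input: DIS-VD results of the general-use models listed in this README.
RECORDS = {
    "general-2048/swin_v1_large": {"Sm": 0.927, "wFm": 0.894, "HCE": 881},
    "general/swin_v1_large": {"Sm": 0.911, "wFm": 0.875, "HCE": 1069},
    "general/swin_v1_tiny": {"Sm": 0.882, "wFm": 0.830, "HCE": 1175},
    "general-DIS-only/swin_v1_large": {"Sm": 0.907, "wFm": 0.865, "HCE": 1059},
}


def select_best(records, metric):
    """Return the key of the record that is best under `metric`."""
    if metric not in HIGHER_IS_BETTER:
        raise KeyError(f"unknown metric: {metric}")
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(records, key=lambda name: records[name][metric])


best_by_hce = select_best(RECORDS, "HCE")
```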
🖊️ Fine-tuning on Custom Data

A video of the tutorial on BiRefNet fine-tuning has been released on my YouTube channel ⬇️

If you have custom data, fine-tuning on it tends to improve results.

  1. Pre-requisites: you have put your datasets in the path ${data_root_dir}/TASK_NAME/DATASET_NAME. For example, ${data_root_dir}/DIS5K/DIS-TR and ${data_root_dir}/General/TR-HRSOD, where im and gt are both in each dataset folder.
  2. Change an existing task to your custom one: replace all 'General' (with single quotes) in the whole project with your custom task name as the screenshot of vscode given below shows:
  3. Adapt settings:
  4. Use existing weights: if you want to fine-tune from existing weights, refer to the resume argument in train.py. Note: training continues from the epoch indicated in the weights file name (e.g., 244 in BiRefNet-general-epoch_244.pth), not from 1. So, to fine-tune for 50 more epochs, set the number of epochs to 294. #Epochs, #last epochs for validation, and the validation step are set in train.sh.
  5. Good luck with your training :) If you still have questions, feel free to open an issue (recommended) or contact me.
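
Two of the steps above are easy to get wrong: the dataset folder layout and the epoch count when resuming. The sketch below checks both before a run. It is a hypothetical pre-flight helper, not part of the repository. It assumes that an image in im and its mask in gt share the same file stem, and that the weights file name contains `epoch_<N>`, as in the example above.

```python
import re
from pathlib import Path


def check_dataset(data_root_dir, task_name, dataset_name):
    """Return stems of images in im/ that have no same-stem file in gt/."""
    dataset_dir = Path(data_root_dir) / task_name / dataset_name
    im_dir, gt_dir = dataset_dir / "im", dataset_dir / "gt"
    for folder in (im_dir, gt_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")
    gt_stems = {p.stem for p in gt_dir.iterdir() if p.is_file()}
    return sorted(
        p.stem for p in im_dir.iterdir()
        if p.is_file() and p.stem not in gt_stems
    )


def epochs_for_finetuning(weights_file, extra_epochs):
    """Total epoch count to request when resuming from `weights_file`."""
    match = re.search(r"epoch_(\d+)", Path(weights_file).name)
    if match is None:
        raise ValueError(f"no epoch number in file name: {weights_file}")
    # Training resumes at the epoch in the file name, so add the extra epochs.
    return int(match.group(1)) + extra_epochs
```

With the file name from step 4, `epochs_for_finetuning("BiRefNet-general-epoch_244.pth", 50)` gives 294, matching the value stated there.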

Download BiRefNet-{TASK}-{EPOCH}.pth from [stuff] or the release page of this repo. Information on the corresponding weights (predicted_maps/performance/training_log) can also be found in folders like exp-BiRefNet-{TASK_SETTINGS} in the same directory.

You can also download the weights from the release of this repo.

The results may differ slightly from those in the original paper. You can find them in the eval_results-BiRefNet-{TASK_SETTINGS} folder in each exp-xx, and we will update them in the following days. The original training setup (A100-80G x 8) is very costly and unaffordable for many people (including myself....), so I re-trained BiRefNet on a single A100-40G and reached the same level of performance (even better). This means you can train the model directly on a single GPU with 36.5G+ memory. BTW, 5.5G of GPU memory is needed for inference at 1024x1024. (I personally paid a lot for renting an A100-40G to re-train BiRefNet on the three tasks... T_T. Hope it can help you.)

If you have more GPUs, or more powerful ones, you can set the GPU IDs and increase the batch size in config.py to accelerate training. The scripts adapt automatically, so you can switch seamlessly between single-card and multi-card training. Enjoy it :)

This project was originally built for DIS only. Over successive updates it has grown to include many more functions. You can now use it for any binary image segmentation task, such as DIS/COD/SOD, medical image segmentation, anomaly segmentation, etc. You can easily enable or disable the things below (usually in config.py):

Many thanks to the companies / institutes below.

@article{zheng2024birefnet,
  title={Bilateral Reference for High-Resolution Dichotomous Image Segmentation},
  author={Zheng, Peng and Gao, Dehong and Fan, Deng-Ping and Liu, Li and Laaksonen, Jorma and Ouyang, Wanli and Sebe, Nicu},
  journal={CAAI Artificial Intelligence Research},
  volume = {3},
  pages = {9150038},
  year={2024}
}

For any questions, discussions, or even complaints, feel free to open an issue here (recommended), send me an e-mail (zhengpeng0108@gmail.com), or book a meeting with me: calendly.com/zhengpeng0108/30min. You can also join the Discord Group (https://discord.gg/d9NN5sgFrq) if you want to discuss things publicly.

