AIDC-AI/Ovis-U1-3B
Building on the Ovis series, Ovis-U1 is a 3-billion-parameter unified model that integrates multimodal understanding, text-to-image generation, and image editing in a single framework.
Figure: The overall architecture of Ovis-U1 (cf. Fig. 2 in our report).
Ovis-U1 has been tested with Python 3.10, Torch 2.4.0, Transformers 4.51.3, and DeepSpeed 0.15.4. For a comprehensive list of package dependencies, please consult the requirements.txt file.
git clone git@github.com:AIDC-AI/Ovis-U1.git
conda create -n ovis-u1 python=3.10 -y
conda activate ovis-u1
cd Ovis-U1
pip install -r requirements.txt
pip install -e .
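To confirm that an environment matches the tested versions listed above, you can compare installed package metadata against them. The sketch below is a minimal check that assumes the PyPI distribution names `torch`, `transformers`, and `deepspeed`. `TESTED`, `installed_versions`, and `find_mismatches` are hypothetical helper names, not part of the Ovis-U1 repository. A mismatch only means "untested", not "broken".

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions the README lists as tested; other versions may also work.
TESTED = {
    "torch": "2.4.0",
    "transformers": "4.51.3",
    "deepspeed": "0.15.4",
}


def installed_versions(packages):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(installed, tested=TESTED):
    """List (package, installed, tested) tuples whose versions differ.

    Local build suffixes such as '+cu121' are ignored, so '2.4.0+cu121'
    counts as matching '2.4.0'.
    """
    mismatches = []
    for name, want in tested.items():
        have = installed.get(name)
        base = have.split("+", 1)[0] if have else None
        if base != want:
            mismatches.append((name, have, want))
    return mismatches


if __name__ == "__main__":
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    if py != "3.10":
        print(f"Python {py} (tested: 3.10)")
    for name, have, want in find_mismatches(installed_versions(TESTED)):
        print(f"{name}: installed {have}, tested {want}")
```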
Inference
For single image understanding, please run
python test_img_to_txt.py
For multi-image understanding, please run
python test_multi_img_to_txt.py
For text-to-image, please run
python test_txt_to_img.py \
--height 1024 \
--width 1024 \
--steps 50 \
--seed 42 \
--txt_cfg 5
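`--txt_cfg` sets the text guidance scale for generation. The README does not give the formula. Assuming it follows standard classifier-free guidance, the model's text-conditioned and unconditional predictions are combined element-wise at each denoising step as shown below. `apply_text_cfg` is a hypothetical helper that operates on plain Python lists, not a function from the repository.

```python
def apply_text_cfg(uncond, cond, txt_cfg):
    """Standard classifier-free guidance, element-wise:

        guided = uncond + txt_cfg * (cond - uncond)

    A scale of 1.0 returns the conditional prediction unchanged; a scale of
    0.0 returns the unconditional one; larger values push the output further
    toward the text condition.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + txt_cfg * (c - u) for u, c in zip(uncond, cond)]


# Toy predictions; real ones would be model outputs at each of the --steps steps.
print(apply_text_cfg([0.0, 1.0], [0.5, 0.5], txt_cfg=5))  # [2.5, -1.5]
```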
For image editing, please run
python test_img_edit.py \
--steps 50 \
--img_cfg 1.5 \
--txt_cfg 6
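Image editing takes two guidance scales, `--img_cfg` and `--txt_cfg`. One common way to combine them comes from InstructPix2Pix. It uses three predictions: unconditional, conditioned on the input image only, and conditioned on both the image and the edit instruction. Whether Ovis-U1 applies exactly this combination is an assumption. `apply_dual_cfg` is a hypothetical helper, not code from the repository.

```python
def apply_dual_cfg(uncond, img_only, img_txt, img_cfg, txt_cfg):
    """InstructPix2Pix-style dual classifier-free guidance, element-wise:

        guided = uncond
                 + img_cfg * (img_only - uncond)   # pull toward the input image
                 + txt_cfg * (img_txt - img_only)  # pull toward the instruction
    """
    if not (len(uncond) == len(img_only) == len(img_txt)):
        raise ValueError("predictions must have the same length")
    return [
        u + img_cfg * (i - u) + txt_cfg * (t - i)
        for u, i, t in zip(uncond, img_only, img_txt)
    ]


# Toy predictions with the README's default scales.
print(apply_dual_cfg([0.0], [1.0], [1.5], img_cfg=1.5, txt_cfg=6))  # [4.5]
```

With both scales set to 1.0, the terms telescope to the image-plus-text prediction. Under this formulation, the README's defaults (1.5 and 6) weight the edit instruction much more heavily than fidelity to the source image.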
Performance

OpenCompass Multi-modal Academic Benchmarks

| Model | Avg | MMB | MMS | MMMU | MathVista | Hallusion | AI2D | OCRBench | MMVet |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o | 75.4 | 86 | 70.2 | 72.9 | 71.6 | 57 | 86.3 | 82.2 | 76.9 |
| InternVL2.5-2B | 59.9 | 70.9 | 54.3 | 43.2 | 51.1 | 42.3 | 74.9 | 80.2 | 62.6 |
| SAIL-VL-2B | 61 | 73.7 | 56.5 | 44.1 | 62.8 | 45.9 | 77.4 | 83.1 | 44.2 |
| InternVL3-2B | 61.1 | 78 | 61.1 | 48.7 | 57.6 | 41.9 | 78.6 | 83.1 | 67 |
| Qwen2.5-VL-3B | 64.5 | 76.8 | 56.3 | 51.2 | 61.2 | 46.6 | 81.4 | 82.8 | 60 |
| Ovis2-2B | 65.2 | 76.9 | 56.7 | 45.6 | 64.1 | 50.2 | 82.7 | 87.3 | 58.3 |
| SAIL-VL-1.5-2B | 67 | 78.5 | 62.6 | 46.4 | 67 | 50 | 83.7 | 89.1 | 58.8 |
| Ristretto-3B | 67.7 | 80.2 | 62.8 | 51.3 | 67.6 | 50.2 | 84.2 | 84.7 | 60.7 |
| Ovis-U1 | 69.6 | 77.8 | 61.3 | 51.1 | 69.4 | 56.3 | 85.6 | 88.3 | 66.7 |

GenEval

| Model | Overall | Single object | Two object | Counting | Colors | Position | Attribute binding |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o | 0.84 | 0.99 | 0.92 | 0.85 | 0.92 | 0.75 | 0.61 |
| BAGEL | 0.82 | 0.99 | 0.94 | 0.81 | 0.88 | 0.64 | 0.63 |
| BAGEL † | 0.88 | 0.98 | 0.95 | 0.84 | 0.95 | 0.78 | 0.77 |
| UniWorld-V1 | 0.80 | 0.99 | 0.93 | 0.79 | 0.89 | 0.49 | 0.70 |
| UniWorld-V1 † | 0.84 | 0.98 | 0.93 | 0.81 | 0.89 | 0.74 | 0.71 |
| OmniGen | 0.68 | 0.98 | 0.84 | 0.66 | 0.74 | 0.40 | 0.43 |
| OmniGen2 | 0.80 | 1 | 0.95 | 0.64 | 0.88 | 0.55 | 0.76 |
| OmniGen2 † | 0.86 | 0.99 | 0.96 | 0.74 | 0.98 | 0.71 | 0.75 |
| Ovis-U1 | 0.89 | 0.98 | 0.98 | 0.90 | 0.92 | 0.79 | 0.75 |

† denotes using the rewritten prompts.
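The OpenCompass "Avg" and GenEval "Overall" columns appear to be unweighted means of the per-benchmark scores. Recomputing them for the Ovis-U1 row reproduces the reported values. This is a consistency check under that assumption, not the benchmarks' official scoring code.

```python
from statistics import mean

# Ovis-U1 per-benchmark scores, copied from the tables above.
opencompass = {
    "MMB": 77.8, "MMS": 61.3, "MMMU": 51.1, "MathVista": 69.4,
    "Hallusion": 56.3, "AI2D": 85.6, "OCRBench": 88.3, "MMVet": 66.7,
}
geneval = {
    "Single object": 0.98, "Two object": 0.98, "Counting": 0.90,
    "Colors": 0.92, "Position": 0.79, "Attribute binding": 0.75,
}

print(round(mean(opencompass.values()), 1))  # 69.6 (reported Avg: 69.6)
print(round(mean(geneval.values()), 2))      # 0.89 (reported Overall: 0.89)
```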
DPG-Bench

| Model | Overall | Global | Entity | Attribute | Relation | Other |
| --- | --- | --- | --- | --- | --- | --- |
| BAGEL | 85.07 | 88.94 | 90.37 | 91.29 | 90.82 | 88.67 |
| UniWorld-V1 | 81.38 | 83.64 | 88.39 | 88.44 | 89.27 | 87.22 |
| OmniGen | 81.16 | 87.90 | 88.97 | 88.47 | 87.95 | 83.56 |
| OmniGen2 | 83.57 | 88.81 | 88.83 | 90.18 | 89.37 | 90.27 |
| Ovis-U1 | 83.72 | 82.37 | 90.08 | 88.68 | 93.35 | 85.20 |

ImgEdit-Bench

| Model | Overall | Add | Adjust | Extract | Replace | Remove | Background | Style | Hybrid | Action |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o | 4.2 | 4.61 | 4.33 | 2.9 | 4.35 | 3.66 | 4.57 | 4.93 | 3.96 | 4.89 |
| MagicBrush | 1.90 | 2.84 | 1.58 | 1.51 | 1.97 | 1.58 | 1.75 | 2.38 | 1.62 | 1.22 |
| Instruct-P2P | 1.88 | 2.45 | 1.83 | 1.44 | 2.01 | 1.50 | 1.44 | 3.55 | 1.2 | 1.46 |
| AnyEdit | 2.45 | 3.18 | 2.95 | 1.88 | 2.47 | 2.23 | 2.24 | 2.85 | 1.56 | 2.65 |
| UltraEdit | 2.7 | 3.44 | 2.81 | 2.13 | 2.96 | 1.45 | 2.83 | 3.76 | 1.91 | 2.98 |
| OmniGen | 2.96 | 3.47 | 3.04 | 1.71 | 2.94 | 2.43 | 3.21 | 4.19 | 2.24 | 3.38 |
| Step1X-Edit | 3.06 | 3.88 | 3.14 | 1.76 | 3.40 | 2.41 | 3.16 | 4.63 | 2.64 | 2.52 |
| ICEdit | 3.05 | 3.58 | 3.39 | 1.73 | 3.15 | 2.93 | 3.08 | 3.84 | 2.04 | 3.68 |
| BAGEL | 3.2 | 3.56 | 3.31 | 1.7 | 3.3 | 2.62 | 3.24 | 4.49 | 2.38 | 4.17 |
| UniWorld-V1 | 3.26 | 3.82 | 3.64 | 2.27 | 3.47 | 3.24 | 2.99 | 4.21 | 2.96 | 2.74 |
| OmniGen2 | 3.44 | 3.57 | 3.06 | 1.77 | 3.74 | 3.2 | 3.57 | 4.81 | 2.52 | 4.68 |
| Ovis-U1 | 4.00 | 4.13 | 3.62 | 2.98 | 4.45 | 4.06 | 4.22 | 4.69 | 3.45 | 4.61 |

GEdit-Bench-EN

| Model | Avg | Background Change | Color Alteration | Material Modification | Motion Change | Portrait Beautification | Style Transfer | Subject Addition | Subject Removal | Subject Replacement | Text Modification | Tone Transformation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o | 7.534 | 7.205 | 6.491 | 6.607 | 8.096 | 7.768 | 6.961 | 7.622 | 8.331 | 8.067 | 7.427 | 8.301 |
| AnyEdit | 3.212 | 4.663 | 4.260 | 2.537 | 2.024 | 3.479 | 2.032 | 3.995 | 3.089 | 3.180 | 0.922 | 5.151 |
| Instruct-Pix2Pix | 3.684 | 3.825 | 5.182 | 3.688 | 3.509 | 4.339 | 4.560 | 3.461 | 2.031 | 4.237 | 0.955 | 4.733 |
| MagicBrush | 4.518 | 5.637 | 5.136 | 5.078 | 4.513 | 4.487 | 4.439 | 5.252 | 3.704 | 4.941 | 1.384 | 5.130 |
| OmniGen | 5.062 | 5.281 | 6.003 | 5.308 | 2.916 | 3.087 | 4.903 | 6.628 | 6.352 | 5.616 | 4.519 | 5.064 |
| Gemini | 6.315 | 6.781 | 6.369 | 6.040 | 6.938 | 5.591 | 4.676 | 7.501 | 6.447 | 7.003 | 5.765 | 6.350 |
| Step1X-Edit | 6.701 | 6.547 | 6.545 | 6.204 | 6.483 | 6.787 | 7.221 | 6.975 | 6.512 | 7.068 | 6.921 | 6.448 |
| Doubao | 6.754 | 7.430 | 7.095 | 6.339 | 6.973 | 6.972 | 6.767 | 7.674 | 6.748 | 7.447 | 3.471 | 7.383 |
| BAGEL | 6.519 | 7.324 | 6.909 | 6.381 | 4.753 | 4.573 | 6.150 | 7.896 | 7.164 | 7.021 | 7.320 | 6.218 |
| Ovis-U1 | 6.420 | 7.486 | 6.879 | 6.208 | 4.790 | 5.981 | 6.463 | 7.491 | 7.254 | 7.266 | 4.482 | 6.314 |

Citation
If you find Ovis-U1 useful, please cite our paper:
@article{wang2025ovisu1,
title={Ovis-U1 Technical Report},
author={Wang, Guo-Hua and Zhao, Shanshan and Zhang, Xinjie and Cao, Liangfu and Zhan, Pengxin and Duan, Lunhao and Lu, Shiyin and Fu, Minghao and Zhao, Jianshan and Li, Yang and Chen, Qing-Guo},
journal={arXiv preprint arXiv:2506.23044},
year={2025}
}
Acknowledgments
The code is built upon Ovis and FLUX.
License
The project is released under the Apache License 2.0 (http://www.apache.org/licenses/LICENSE-2.0, SPDX-License-Identifier: Apache-2.0).
Disclaimer
We used compliance-checking algorithms during training to ensure, to the best of our ability, that the trained model is compliant. Because of the complexity of the data and the diversity of language-model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe any content infringes your rights, or if the model generates improper content, please contact us and we will address the matter promptly.