All benchmarks are wrong, some will cost you less than others.
Optimum-Benchmark is a unified multi-backend & multi-device utility for benchmarking Transformers, Diffusers, PEFT, TIMM and Optimum libraries, along with all their supported optimizations & quantization schemes, for inference & training, in distributed & non-distributed settings, in the most correct, efficient and scalable way possible.
News 📰

- New backend: llama-cpp-python bindings with all of its supported devices
- optimum-benchmark is now available on PyPI: `pip install optimum-benchmark` 🎉 check it out!
- Minimal docker images (cpu, cuda, rocm) are now published as packages for testing, benchmarking and reproducibility 🐳

Motivations 🎯
Note
Optimum-Benchmark is a work in progress and is not yet ready for production use, but we're working hard to make it so. Please keep an eye on the project and help us improve it and make it more useful for the community. We're looking forward to your feedback and contributions.
Optimum-Benchmark is continuously and intensively tested on a variety of devices, backends, scenarios and launchers to ensure its stability with over 300 tests running on every PR (you can request more tests if you want to).
You can install the latest released version of `optimum-benchmark` from PyPI:

```bash
pip install optimum-benchmark
```
or you can install the latest version from the main branch on GitHub:
```bash
pip install git+https://github.com/huggingface/optimum-benchmark.git
```
or if you want to tinker with the code, you can clone the repository and install it in editable mode:
```bash
git clone https://github.com/huggingface/optimum-benchmark.git
cd optimum-benchmark
pip install -e .
```

Advanced install options
Depending on the backends you want to use, you can install `optimum-benchmark` with the following extras:
- PyTorch (default): `pip install optimum-benchmark`
- OpenVINO: `pip install optimum-benchmark[openvino]`
- ONNX Runtime: `pip install optimum-benchmark[onnxruntime]`
- TensorRT-LLM: `pip install optimum-benchmark[tensorrt-llm]`
- ONNX Runtime GPU: `pip install optimum-benchmark[onnxruntime-gpu]`
- Py-TXI: `pip install optimum-benchmark[py-txi]`
- vLLM: `pip install optimum-benchmark[vllm]`
- IPEX: `pip install optimum-benchmark[ipex]`
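If you plan to benchmark more than one backend from the same environment, pip's standard comma syntax lets you install several of the extras listed above in a single command, for example:

```bash
# install support for both ONNX Runtime on GPU and OpenVINO in one go
pip install optimum-benchmark[onnxruntime-gpu,openvino]
```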
On top of these backend extras, we also support a number of additional extra dependencies.
Running benchmarks from Python API 🧪

You can run benchmarks from the Python API, using the `Benchmark` class and its `launch` method. It takes a `BenchmarkConfig` object as input, runs the benchmark in an isolated process and returns a `BenchmarkReport` object containing the benchmark results.

Here's an example of how to run an isolated benchmark using the `pytorch` backend, `torchrun` launcher and `inference` scenario with latency and memory tracking enabled:
```python
from optimum_benchmark import Benchmark, BenchmarkConfig, TorchrunConfig, InferenceConfig, PyTorchConfig
from optimum_benchmark.logging_utils import setup_logging

setup_logging(level="INFO", handlers=["console"])

if __name__ == "__main__":
    launcher_config = TorchrunConfig(nproc_per_node=2)
    scenario_config = InferenceConfig(latency=True, memory=True)
    backend_config = PyTorchConfig(model="gpt2", device="cuda", device_ids="0,1", no_weights=True)
    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        scenario=scenario_config,
        launcher=launcher_config,
        backend=backend_config,
    )
    benchmark_report = Benchmark.launch(benchmark_config)

    # convert artifacts to a dictionary or dataframe
    benchmark_config.to_dict()  # or benchmark_config.to_dataframe()

    # save artifacts to disk as json or csv files
    benchmark_report.save_csv("benchmark_report.csv")  # or benchmark_report.save_json("benchmark_report.json")

    # push artifacts to the hub
    benchmark_config.push_to_hub("IlyasMoutawwakil/pytorch_gpt2")  # or benchmark_report.push_to_hub("IlyasMoutawwakil/pytorch_gpt2")

    # or merge them into a single artifact
    benchmark = Benchmark(config=benchmark_config, report=benchmark_report)
    benchmark.save_json("benchmark.json")  # or benchmark.save_csv("benchmark.csv")
    benchmark.push_to_hub("IlyasMoutawwakil/pytorch_gpt2")

    # load artifacts from the hub
    benchmark = Benchmark.from_hub("IlyasMoutawwakil/pytorch_gpt2")

    # or load them from disk
    benchmark = Benchmark.load_json("benchmark.json")  # or Benchmark.load_csv("benchmark_report.csv")
```
If you're on VSCode, you can hover over the configuration classes to see the available parameters and their descriptions. You can also see the available parameters in the Features section below.
Running benchmarks using the Hydra CLI 🧪

You can also run a benchmark using the command line by specifying the configuration directory and the configuration name. Both arguments are mandatory for `hydra`. `--config-dir` is the directory where the configuration files are stored and `--config-name` is the name of the configuration file without its `.yaml` extension.
```bash
optimum-benchmark --config-dir examples/ --config-name cuda_pytorch_bert
```
This will run the benchmark using the configuration in `examples/cuda_pytorch_bert.yaml` and store the results in `runs/cuda_pytorch_bert`.
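Such a configuration file wires together the same launcher, scenario and backend options that the Python API exposes. The sketch below is illustrative rather than a copy of the shipped example: the exact defaults list and field names depend on how the project registers its Hydra config groups, so treat the files under `examples/` in the repository as the authoritative reference.

```yaml
# hypothetical sketch of a config in the spirit of examples/cuda_pytorch_bert.yaml
defaults:
  - launcher: process      # config group names match the CLI overrides (launcher=..., scenario=..., backend=...)
  - scenario: inference
  - backend: pytorch
  - _self_

name: cuda_pytorch_bert

launcher:
  device_isolation: true   # documented launcher feature (see Features below)

backend:
  device: cuda
  device_ids: 0
  model: bert-base-uncased # any Hugging Face model id or local model path

scenario:
  latency: true
  memory: true
```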
The resulting files are:

- `benchmark_config.json`, which contains the configuration used for the benchmark, including the backend, launcher, scenario and the environment in which the benchmark was run.
- `benchmark_report.json`, which contains a full report of the benchmark's results, like latency measurements, memory usage, energy consumption, etc.
- `benchmark_report.txt`, which contains a detailed report of the benchmark's results, in the same format they were logged.
- `benchmark_report.md`, which contains a detailed report of the benchmark's results, in markdown format.
- `benchmark.json`, which contains both the report and the configuration in a single file.
- `benchmark.log`, which contains the logs of the benchmark run.

It's easy to override the default behavior of an existing configuration file from the command line. For example, to run the same benchmark on a different model and device, you can use the following command:
```bash
optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.model=gpt2 backend.device=cuda
```
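Since the artifacts are plain JSON, you can also inspect a finished run without importing optimum-benchmark. A minimal sketch, assuming only the `runs/cuda_pytorch_bert` output directory described above:

```python
import json
from pathlib import Path

run_dir = Path("runs/cuda_pytorch_bert")

# benchmark.json bundles the configuration and the report in a single file
with open(run_dir / "benchmark.json") as f:
    artifact = json.load(f)

# list the top-level sections without assuming the exact schema
for key, value in artifact.items():
    print(key, type(value).__name__)
```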
You can easily run configuration sweeps using the `--multirun` option. By default, configurations will be executed serially, but other kinds of executions are supported with hydra's launcher plugins (e.g. `hydra/launcher=joblib`).
```bash
optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda
```

Configurations structure 📁
You can create custom and more complex configuration files following these examples. They are heavily commented to help you understand the structure of the configuration files.
Features 🎨

`optimum-benchmark` allows you to run benchmarks with minimal configuration. A benchmark is defined by three main components:

- the launcher to use (e.g. `process`)
- the scenario to follow (e.g. `training`)
- the backend to run on (e.g. `onnxruntime`)
Launcher features 🧰

- Process launcher (`launcher=process`); launches the benchmark in an isolated process.
- Torchrun launcher (`launcher=torchrun`); launches the benchmark in multiple processes using `torch.distributed`.
- Inline launcher (`launcher=inline`), not recommended for benchmarking, only for debugging purposes.
- Device isolation (`launcher.device_isolation=true`). This feature makes sure no other processes are running on the targeted GPU devices other than the benchmark. Especially useful when running benchmarks on shared resources.
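From the Python API, the same launcher options are plain fields on the launcher config classes. A short sketch; it assumes `ProcessConfig` is exported from the package top level the same way `TorchrunConfig` is in the example above:

```python
from optimum_benchmark import ProcessConfig  # assumed top-level export, like TorchrunConfig above

# isolated single-process launch with GPU device isolation enabled
launcher_config = ProcessConfig(device_isolation=True)
```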
Scenarios

- Training scenario (`scenario=training`), which benchmarks the model using the trainer class with a randomly generated dataset.
- Inference scenario (`scenario=inference`), which benchmarks the model's inference method (forward/call/generate) with randomly generated inputs.
Inference scenario features 🧰

- Memory tracking (`scenario.memory=true`)
- Energy tracking (`scenario.energy=true`)
- Latency tracking (`scenario.latency=true`)
- Warm-up runs before inference (`scenario.warmup_runs=20`)
- Input shapes control (e.g. `scenario.input_shapes.sequence_length=128`)
- Forward, Call and Generate kwargs (e.g. for an LLM `scenario.generate_kwargs.max_new_tokens=100`, for a diffusion model `scenario.call_kwargs.num_images_per_prompt=4`)

See InferenceConfig for more information.
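As a rough illustration of how these overrides map to the Python API, here is a hedged sketch using the `InferenceConfig` class from the example above. The keyword names mirror the Hydra overrides listed here; passing nested options such as `input_shapes` and `generate_kwargs` as dictionaries is an assumption about the config class rather than something stated in this README.

```python
from optimum_benchmark import InferenceConfig

# inference scenario with all trackers enabled and custom input/generation settings
scenario_config = InferenceConfig(
    memory=True,                              # scenario.memory=true
    energy=True,                              # scenario.energy=true
    latency=True,                             # scenario.latency=true
    warmup_runs=20,                           # scenario.warmup_runs=20
    input_shapes={"sequence_length": 128},    # scenario.input_shapes.sequence_length=128
    generate_kwargs={"max_new_tokens": 100},  # scenario.generate_kwargs.max_new_tokens=100
)
```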
Training scenario features 🧰

- Memory tracking (`scenario.memory=true`)
- Energy tracking (`scenario.energy=true`)
- Latency tracking (`scenario.latency=true`)
- Warm-up steps before training (`scenario.warmup_steps=20`)
- Dataset shapes control (e.g. `scenario.dataset_shapes.sequence_length=128`)
- Training arguments control (e.g. `scenario.training_args.per_device_train_batch_size=4`)

See TrainingConfig for more information.
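A matching sketch for the training scenario, under the same assumptions (a top-level `TrainingConfig` export and dictionaries for the nested options):

```python
from optimum_benchmark import TrainingConfig  # assumed to be exported like InferenceConfig

# training scenario benchmarking the trainer loop on a randomly generated dataset
scenario_config = TrainingConfig(
    memory=True,                                        # scenario.memory=true
    latency=True,                                       # scenario.latency=true
    warmup_steps=20,                                    # scenario.warmup_steps=20
    dataset_shapes={"sequence_length": 128},            # scenario.dataset_shapes.sequence_length=128
    training_args={"per_device_train_batch_size": 4},   # scenario.training_args.per_device_train_batch_size=4
)
```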
Backends and devices

- PyTorch backend for CPU (`backend=pytorch`, `backend.device=cpu`)
- PyTorch backend for CUDA (`backend=pytorch`, `backend.device=cuda`, `backend.device_ids=0,1`)
- PyTorch backend for HPU / Habana Gaudi (`backend=pytorch`, `backend.device=hpu`, `backend.device_ids=0,1`)
- ONNX Runtime backend for CPUExecutionProvider (`backend=onnxruntime`, `backend.device=cpu`)
- ONNX Runtime backend for CUDAExecutionProvider (`backend=onnxruntime`, `backend.device=cuda`)
- ONNX Runtime backend for ROCMExecutionProvider (`backend=onnxruntime`, `backend.device=cuda`, `backend.provider=ROCMExecutionProvider`)
- ONNX Runtime backend for TensorrtExecutionProvider (`backend=onnxruntime`, `backend.device=cuda`, `backend.provider=TensorrtExecutionProvider`)
- Py-TXI backend for CPU and GPU (`backend=py-txi`, `backend.device=cpu` or `backend.device=cuda`)
- Neural Compressor backend for CPU (`backend=neural-compressor`, `backend.device=cpu`)
- TensorRT-LLM backend for CUDA (`backend=tensorrt-llm`, `backend.device=cuda`)
- OpenVINO backend for CPU (`backend=openvino`, `backend.device=cpu`)
- OpenVINO backend for GPU (`backend=openvino`, `backend.device=gpu`)
- vLLM backend for CUDA (`backend=vllm`, `backend.device=cuda`)
- vLLM backend for ROCm (`backend=vllm`, `backend.device=rocm`)
- vLLM backend for CPU (`backend=vllm`, `backend.device=cpu`)
- IPEX backend for CPU (`backend=ipex`, `backend.device=cpu`)
- IPEX backend for XPU (`backend=ipex`, `backend.device=xpu`)
General backend features 🧰

- Device selection (`backend.device=cuda`), can be `cpu`, `cuda`, `mps`, etc.
- Device ids selection (`backend.device_ids=0,1`), can be a list of device ids to run the benchmark on multiple devices.
- Model selection (`backend.model=gpt2`), can be a model id from the HuggingFace model hub or an absolute path to a model folder.
- "No weights" mode, to benchmark a model without downloading its weights (`backend.no_weights=true`)

For more information on the features of each backend, you can check their respective configuration files.
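Putting a few of these overrides together, a command like the following would switch the earlier example to the ONNX Runtime backend with the TensorRT execution provider. This is a sketch that assumes the example config declares its backend as a Hydra config group, so it can be swapped from the command line:

```bash
optimum-benchmark --config-dir examples/ --config-name cuda_pytorch_bert \
  backend=onnxruntime backend.device=cuda backend.provider=TensorrtExecutionProvider \
  backend.no_weights=true
```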
Contributions are welcome, and we're happy to help you get started! Feel free to open an issue or a pull request. To get started, you can check the CONTRIBUTING.md file.