A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/microsoft/Olive/ below:

microsoft/Olive: Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.

Given a model and targeted hardware, Olive (abbreviation of Onnx LIVE) composes the best suitable optimization techniques to output the most efficient ONNX model(s) for inferencing on the cloud or edge, while taking a set of constraints such as accuracy and latency into consideration.

✅ Benefits of using Olive

Here are some recent videos, blog articles and labs that highlight Olive:

For a full list of news and blogs, read the news archive.

The following notebooks are available that demonstrate key optimization workflows with Olive and include the application code to inference the optimized models on the ONNX Runtime.

Title Task Description Time Required Notebook Links Quickstart Text Generation Learn how to quantize & optimize an SLM for the ONNX Runtime using a single Olive command. 5mins Download / Open in Colab Optimizing popular SLMs Text Generation Choose from a curated list of over 20 popular SLMs to quantize & optimize for the ONNX runtime. 5mins Download / Open in Colab How to finetune models for on-device inference Text Generation Learn how to Quantize (using AWQ method), fine-tune, and optimize an SLM for on-device inference. 15mins Download / Open in Colab Finetune and Optimize DeepSeek R1 with Olive Text Generation Learn how to Finetune and Optimize DeepSeek-R1-Distill-Qwen-1.5B for on-device inference. 15mins Download / Open in Colab

If you prefer using the command line directly instead of Jupyter notebooks, we've outlined the quickstart commands here.

We recommend installing Olive in a virtual environment or a conda environment.

pip install olive-ai[auto-opt]
pip install transformers onnxruntime-genai

Note

Olive has optional dependencies that can be installed to enable additional features. Please refer to Olive package config for the list of extras and their dependencies.

In this quickstart you'll be optimizing Qwen/Qwen2.5-0.5B-Instruct, which has many model files in the Hugging Face repo for different precisions that are not required by Olive.

Run the automatic optimization:

olive optimize \
    --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --precision int4 \
    --output_path models/qwen

Tip

PowerShell Users Line continuation between Bash and PowerShell are not interchangable. If you are using PowerShell, then you can copy-and-paste the following command that uses compatible line continuation.
olive optimize `
   --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct `
   --output_path models/qwen `
   --precision int4

The automatic optimizer will:

  1. Acquire the model from the the Hugging Face model repo.
  2. Quantize the model to int4 using GPTQ.
  3. Capture the ONNX Graph and store the weights in an ONNX data file.
  4. Optimize the ONNX Graph.

Olive can automatically optimize popular model architectures like Llama, Phi, Qwen, Gemma, etc out-of-the-box - see detailed list here. Also, you can optimize other model architectures by providing details on the input/outputs of the model (io_config).

3. Inference on the ONNX Runtime

The ONNX Runtime (ORT) is a fast and light-weight cross-platform inference engine with bindings for popular programming language such as Python, C/C++, C#, Java, JavaScript, etc. ORT enables you to infuse AI models into your applications so that inference is handled on-device.

The sample chat app to run is found as model-chat.py in the onnxruntime-genai Github repository.

🤝 Contributions and Feedback

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4