NVIDIA NIM for large language models (LLMs) supports serving models in an air gap system (also known as air wall, air-gapping, or disconnected network). In an air gap system, you can run a NIM with no internet connection, and with no connection to the NGC registry or HuggingFace Hub.
Before you use this documentation, review all prerequisites and instructions in Get Started with NIM and see Serving models from local assets.
Refer to the appropriate section based on your NIM:
- Air Gap Deployment for Multi-LLM NIMs
- Air Gap Deployment for LLM-specific NIMs
Air Gap Deployment for Multi-LLM NIMs#

Local Model Directory Option#

Use this option to serve a model from a local model directory. Run the create-model-store command within the NIM container to create a repository for a single model, as shown in the following example. A HuggingFace access token (HF_TOKEN) is required to run the tool that creates the model store.

# Choose a container name for bookkeeping
export CONTAINER_NAME=llm-nim

# Choose the multi-LLM NIM image from NGC
export IMG_NAME="nvcr.io/nim/nvidia/llm-nim:1.13.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

export MODEL_REPO=/path/to/model-repository

# Provide write permissions to model-repo for the user
chown -R $(id -u) $MODEL_REPO

export NIM_SERVED_MODEL_NAME=my-model

# HuggingFace model repository
export NIM_MODEL_NAME=hf://nvidia/Llama-3.1-Nemotron-Nano-8B-v1

Create the Model Store#
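This step requires the HuggingFace access token mentioned above. The -e HF_TOKEN flag in the command below only forwards a token that is already exported in your shell; a minimal sketch of setting it first (the value shown is a placeholder):

# Export your HuggingFace access token so `-e HF_TOKEN` can pass it into the container
# (placeholder value; substitute your own token)
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx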
The following command creates the model store in the location specified by MODEL_REPO:
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_SERVED_MODEL_NAME \
  -v $MODEL_REPO:/model-repo \
  -e HF_TOKEN \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME create-model-store --model-repo $NIM_MODEL_NAME --model-store /model-repo
Now run the following docker command in the air-gapped environment. Do not set HF_TOKEN, as shown in the following example:
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_NAME=/model-repo \
  -e NIM_SERVED_MODEL_NAME \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
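Once the container is up, the NIM exposes an OpenAI-compatible API on port 8000. A minimal smoke test from the air-gapped host, assuming the defaults above (port 8000 and NIM_SERVED_MODEL_NAME=my-model):

# Confirm the server is ready and the model is listed
curl -s http://localhost:8000/v1/health/ready
curl -s http://localhost:8000/v1/models

# Send a minimal chat completion request
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'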
Air Gap Deployment for LLM-specific NIMs#

Offline Cache Option#

If NIM detects a previously loaded profile in the cache, it serves that profile from the cache. After downloading the profiles to the cache by using download-to-cache, you can transfer the cache to an air-gapped system to run a NIM without any internet connection and with no connection to the NGC registry.
download-to-cache -p 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b
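The download-to-cache utility runs inside the NIM container on an internet-connected machine. A minimal sketch of wrapping it in docker run, assuming the image and cache-path variables defined in the following example and that NGC_API_KEY is exported for this online step:

# Sketch only: populate the local cache with the chosen profile on an internet-connected host.
# IMG_NAME and LOCAL_NIM_CACHE are set as in the example below; NGC_API_KEY is required here.
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  $IMG_NAME \
  download-to-cache -p 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b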
After running download-to-cache, do NOT provide the NGC_API_KEY when you start the container, as shown in the following example.
# Create an example air-gapped directory where the downloaded NIM will be deployed
export AIR_GAP_NIM_CACHE=~/.cache/air-gap-nim-cache
mkdir -p "$AIR_GAP_NIM_CACHE"

# Transport the downloaded NIM to the air-gapped directory
cp -r "$LOCAL_NIM_CACHE"/* "$AIR_GAP_NIM_CACHE"

# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct

# The repository name from the previous `ngc registry image list` command
Repository=nim/meta/llama-3.1-8b-instruct

# Choose an LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:1.13.0"

# Assuming the command run prior was `download-to-cache`, downloading the optimal profile
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME

# Assuming the command run prior was `download-to-cache --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b`
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_PROFILE=09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b \
  -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
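The cp command above assumes the populated cache and the air-gapped directory are reachable from the same host. When the target system is physically isolated, a common approach is to archive the cache and carry it over on removable media; a minimal sketch (paths are illustrative):

# On the internet-connected host: archive the populated cache
tar -czf nim-cache.tar.gz -C "$LOCAL_NIM_CACHE" .

# On the air-gapped host, after copying the archive over:
mkdir -p "$AIR_GAP_NIM_CACHE"
tar -xzf nim-cache.tar.gz -C "$AIR_GAP_NIM_CACHE"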
Local Model Directory Option#

Use this option to serve a model from a local model directory. Run the create-model-store command within the NIM container to create a repository for a single model, as shown in the following example.
# Provide write permissions to model-repo for the user
chown -R $(id -u) $MODEL_REPO

# Create the model store (the chown above ensures it can be written from inside the container)
create-model-store --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b --model-store /model-repo
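As with the multi-LLM example earlier on this page, create-model-store runs inside the NIM container. A minimal sketch of invoking it on an internet-connected host, assuming the image and model-repository variables defined in the next example and that NGC_API_KEY is exported so the profile can be downloaded:

# Sketch only: run create-model-store inside the LLM-specific NIM container.
# IMG_NAME and MODEL_REPO are set as in the example below; NGC_API_KEY is
# required for this online step (but not for serving afterwards).
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  -e NGC_API_KEY \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  $IMG_NAME \
  create-model-store --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b --model-store /model-repo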
After running create-model-store, do NOT provide the NGC_API_KEY when you start the container, as shown in the following example.
# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct

# The repository name from the previous `ngc registry image list` command
Repository=nim/meta/llama-3.1-8b-instruct

# Choose an LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:1.13.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

export MODEL_REPO=/path/to/model-repository
export NIM_SERVED_MODEL_NAME=my-model
Note

When using create-model-store with vLLM profiles, set the NIM_MODEL_PROFILE environment variable to vllm. For SGLang profiles, set it to sglang. For TRT-LLM-buildable profiles, set it to tensorrt_llm. NIM picks tensorrt_llm profiles automatically for TRT-LLM pre-built engine profiles.
Configure the other required environment variables as shown in the following example:
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_NAME=/model-repo \
  -e NIM_SERVED_MODEL_NAME \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
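If the model store was created from a vLLM, SGLang, or TRT-LLM-buildable profile, also set NIM_MODEL_PROFILE as described in the note above. A sketch of the same command with that variable added, using vllm as an example:

docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_NAME=/model-repo \
  -e NIM_SERVED_MODEL_NAME \
  -e NIM_MODEL_PROFILE=vllm \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME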