NVIDIA NIM for large language models (LLMs) supports serving models in an air gap system (also known as air wall, air-gapping, or disconnected network). In an air gap system, you can run a NIM with no internet connection, and with no connection to the NGC registry or HuggingFace Hub.
Before you use this documentation, review all prerequisites and instructions in Get Started with NIM and see Serving models from local assets.
Refer to the appropriate section based on your NIM:
- Air Gap Deployment for Multi-LLM NIMs
- Air Gap Deployment for LLM-specific NIMs
Air Gap Deployment for Multi-LLM NIMs#

Local Model Directory Option#

Use this option to serve a model from a local model directory. Run the create-model-store command within the NIM container to create a repository for a single model, as shown in the following example. A HuggingFace access token (HF_TOKEN) is required to run the tool that creates the model store.

# Choose a container name for bookkeeping
export CONTAINER_NAME=llm-nim

# Choose the multi-LLM NIM image from NGC
export IMG_NAME="nvcr.io/nim/nvidia/llm-nim:1.13.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

export MODEL_REPO=/path/to/model-repository

# Provide write permissions to model-repo for the user
chown -R $(id -u) $MODEL_REPO

export NIM_SERVED_MODEL_NAME=my-model

# HuggingFace model repository
export NIM_MODEL_NAME=hf://nvidia/Llama-3.1-Nemotron-Nano-8B-v1

Create the Model Store#
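This step requires the HuggingFace access token mentioned above. The -e HF_TOKEN flag in the command below only forwards a token that is already exported in your shell; a minimal sketch of setting it first (the value shown is a placeholder):

# Export your HuggingFace access token so `-e HF_TOKEN` can pass it into the container
# (placeholder value; substitute your own token)
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx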
The following command creates the model store in the location specified by MODEL_REPO:
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_SERVED_MODEL_NAME \
  -v $MODEL_REPO:/model-repo \
  -e HF_TOKEN \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME create-model-store --model-repo $NIM_MODEL_NAME --model-store /model-repo
Now run the following docker command in the air-gapped environment. Do not set HF_TOKEN, as shown in the following example:
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_NAME=/model-repo \
  -e NIM_SERVED_MODEL_NAME \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
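Once the container is up, the NIM exposes an OpenAI-compatible API on port 8000. A minimal smoke test from the air-gapped host, assuming the defaults above (port 8000 and NIM_SERVED_MODEL_NAME=my-model):

# Confirm the server is ready and the model is listed
curl -s http://localhost:8000/v1/health/ready
curl -s http://localhost:8000/v1/models

# Send a minimal chat completion request
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'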
Air Gap Deployment for LLM-specific NIMs#

Offline Cache Option#

If NIM detects a previously loaded profile in the cache, it serves that profile from the cache. After downloading the profiles to the cache by using download-to-cache, you can transfer the cache to an air-gapped system to run a NIM without any internet connection and with no connection to the NGC registry.
download-to-cache -p 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b
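The download-to-cache utility runs inside the NIM container on an internet-connected machine. A minimal sketch of wrapping it in docker run, assuming the image and cache-path variables defined in the following example and that NGC_API_KEY is exported for this online step:

# Sketch only: populate the local cache with the chosen profile on an internet-connected host.
# IMG_NAME and LOCAL_NIM_CACHE are set as in the example below; NGC_API_KEY is required here.
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  $IMG_NAME \
  download-to-cache -p 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b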
After running download-to-cache, do NOT provide the NGC_API_KEY when you start the container, as shown in the following example.
# Create an example air-gapped directory where the downloaded NIM will be deployed
export AIR_GAP_NIM_CACHE=~/.cache/air-gap-nim-cache
mkdir -p "$AIR_GAP_NIM_CACHE"

# Transport the downloaded NIM to the air-gapped directory
cp -r "$LOCAL_NIM_CACHE"/* "$AIR_GAP_NIM_CACHE"

# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct

# The repository name from the previous `ngc registry image list` command
Repository=nim/meta/llama-3.1-8b-instruct

# Choose an LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:1.13.0"

# Assuming the command run prior was `download-to-cache`, downloading the optimal profile
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME

# Assuming the command run prior was `download-to-cache --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b`
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_PROFILE=09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b \
  -v "$AIR_GAP_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
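The cp command above assumes the populated cache and the air-gapped directory are reachable from the same host. When the target system is physically isolated, a common approach is to archive the cache and carry it over on removable media; a minimal sketch (paths are illustrative):

# On the internet-connected host: archive the populated cache
tar -czf nim-cache.tar.gz -C "$LOCAL_NIM_CACHE" .

# On the air-gapped host, after copying the archive over:
mkdir -p "$AIR_GAP_NIM_CACHE"
tar -xzf nim-cache.tar.gz -C "$AIR_GAP_NIM_CACHE"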
Local Model Directory Option#

Use this option to serve a model from a local model directory. Run the create-model-store command within the NIM container to create a repository for a single model, as shown in the following example.
# Provide write permissions to model-repo for the user
chown -R $(id -u) $MODEL_REPO

# Create the model store (the chown above ensures it can be written from inside the container)
create-model-store --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b --model-store /model-repo
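As with the multi-LLM example earlier on this page, create-model-store runs inside the NIM container. A minimal sketch of invoking it on an internet-connected host, assuming the image and model-repository variables defined in the next example and that NGC_API_KEY is exported so the profile can be downloaded:

# Sketch only: run create-model-store inside the LLM-specific NIM container.
# IMG_NAME and MODEL_REPO are set as in the example below; NGC_API_KEY is
# required for this online step (but not for serving afterwards).
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  -e NGC_API_KEY \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  $IMG_NAME \
  create-model-store --profile 09e2f8e68f78ce94bf79d15b40a21333cea5d09dbe01ede63f6c957f4fcfab7b --model-store /model-repo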
After running create-model-store, do NOT provide the NGC_API_KEY when you start the container, as shown in the following example.
# Choose a container name for bookkeeping
export CONTAINER_NAME=Llama-3.1-8B-instruct

# The repository name from the previous `ngc registry image list` command
Repository=nim/meta/llama-3.1-8b-instruct

# Choose an LLM NIM Image from NGC
export IMG_NAME="nvcr.io/${Repository}:1.13.0"

# Choose a path on your system to cache the downloaded models
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

export MODEL_REPO=/path/to/model-repository
export NIM_SERVED_MODEL_NAME=my-model
Note

When using create-model-store with vLLM profiles, set the NIM_MODEL_PROFILE environment variable to vllm. For SGLang profiles, set it to sglang. For TRT-LLM-buildable profiles, set it to tensorrt_llm. NIM picks tensorrt_llm profiles automatically for TRT-LLM pre-built engine profiles.
Configure the other required environment variables as shown in the following example:
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_NAME=/model-repo \
  -e NIM_SERVED_MODEL_NAME \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
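If the model store was created from a vLLM, SGLang, or TRT-LLM-buildable profile, also set NIM_MODEL_PROFILE as described in the note above. A sketch of the same command with that variable added, using vllm as an example:

docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NIM_MODEL_NAME=/model-repo \
  -e NIM_SERVED_MODEL_NAME \
  -e NIM_MODEL_PROFILE=vllm \
  -v $MODEL_REPO:/model-repo \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME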