
Build AI-powered applications using OpenLLM and Vultr Cloud GPU


OpenLLM is an open-source platform that enables you to build AI-powered production applications such as personalized chatbots, recommendation systems, and more. You generate responses by sending a prompt to its API, along with parameters that shape the output. OpenLLM's library includes all the major models, such as Mistral, Falcon, and Llama.

In this article, we'll demonstrate the steps to deploy the Falcon 7B model with OpenLLM on a Vultr GPU server to generate API responses, which can be used for creating AI-powered applications. You'll learn how to install the required dependencies and how to create an OpenLLM service with persistence. Additionally, we'll cover setting up Nginx as a reverse proxy for efficient load balancing and securing your application with a Secure Sockets Layer (SSL) certificate to enable HTTPS.

Deploying a server on Vultr

Cloud GPUs are one of the most efficient ways to deploy Artificial Intelligence (AI) and Machine Learning (ML) powered applications. They provide access to the latest technology, enabling you to continuously build, deploy, and serve large-scale applications globally.

  1. Sign up and log in to the Vultr Customer Portal.

  2. Navigate to the Products page.

  3. From the side menu, select Compute.

  4. Click the Deploy Server button in the center.

  5. Select Cloud GPU as the server type.

  6. Select A40 as the GPU type.

  7. In the "Server Location" section, select the region of your choice.

  8. In the "Operating System" section, select Vultr GPU Stack as the operating system.

    Vultr GPU Stack is designed to streamline the process of building Artificial Intelligence (AI) and Machine Learning (ML) projects by providing a comprehensive suite of pre-installed software, including NVIDIA CUDA Toolkit, NVIDIA cuDNN, Tensorflow, PyTorch, and so on.

  9. In the "Server Size" section, select the 48 GB option.

  10. In the "Additional Features" section, select any additional features you require.

  11. Click the Deploy Now button on the bottom right corner.

Installing the required packages

Now that you have set up a Vultr server as described above, this section guides you through installing the Python packages that OpenLLM depends on and verifying the installation.

  1. Install the required packages.

    pip3 install openllm scipy xformers einops
    

    Here's what each package provides:

    - openllm: the OpenLLM platform and command-line interface used to serve the model.
    - scipy: scientific computing routines required by several model dependencies.
    - xformers: memory-efficient transformer building blocks used during inference.
    - einops: concise tensor reshaping and rearrangement operations used by many model implementations.

  2. Verify the installation.
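
    For example, you can display the help information (assuming the openllm executable is on your PATH):

    openllm -h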

    If the installation is successful, the system will find and execute openllm, displaying its help information. This indicates that openllm is correctly installed and recognized by the system. If openllm is not installed properly, the command will likely return an error.

Creating an OpenLLM service

In this section, you'll learn how to create an OpenLLM service that starts the service automatically when the system boots up and runs the Falcon 7B model for inference.

  1. Get the openllm path.
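
    On most Linux systems you can locate the executable with which:

    which openllm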

  2. Copy the path to your clipboard. You'll use it in Step 4.

  3. Create an OpenLLM service file.

    sudo nano /etc/systemd/system/openllm.service
    
  4. Paste the following content into the service file. Make sure to replace the User and Group values with your actual values. Also replace WorkingDirectory with the OpenLLM path you copied earlier (without the trailing openllm) and ExecStart with the full OpenLLM path, including the executable binary.

     [Unit]
     Description=Daemon for OpenLLM Demo Application
     After=network.target
    
     [Service]
     User=example_user
     Group=example_user
     WorkingDirectory=/home/example_user/.local/bin/
     ExecStart=/home/example_user/.local/bin/openllm start tiiuae/falcon-7b --backend pt
    
     [Install]
     WantedBy=multi-user.target
    
  5. Start the service.
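
    On a systemd-based system, reload the unit files and start the new service:

    sudo systemctl daemon-reload
    sudo systemctl start openllm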

  6. Verify the status of the service.
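
    Check the unit status with systemctl:

    sudo systemctl status openllm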

    The output will look like this:

    ● openllm.service - Daemon for OpenLLM Demo Application
     Loaded: loaded (/etc/systemd/system/openllm.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-11-29 20:51:25 UTC; 12min ago
    Main PID: 3160 (openllm)
      Tasks: 257 (limit: 72213)
     Memory: 21.9G
     CGroup: /system.slice/openllm.service
             ├─3160 /usr/bin/python3 /usr/local/bin/openllm start tiiuae/falcon-7b --backend pt
    
  7. Enable the service to start automatically whenever the system boots up.
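
    With systemd, this is done with the enable command:

    sudo systemctl enable openllm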

Configuring Nginx as a reverse proxy server

Nginx acts as a reverse proxy between your application server and its clients, directing incoming requests according to your configuration.

In this section, you'll learn how to put your OpenLLM application behind an Nginx reverse proxy for efficient request handling and load balancing.

  1. Log in to the Vultr Customer Portal.

  2. Navigate to the Products page.

  3. From the side menu, expand the Network drop down, and select DNS.

  4. Click the Add Domain button in the center.

  5. Follow the setup procedure to add your domain name by selecting the IP address of your server.

  6. Set ns1.vultr.com and ns2.vultr.com as your domain's primary and secondary nameservers with your domain registrar.

  7. Install Nginx.
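
    On an Ubuntu-based image such as the Vultr GPU Stack, you can install Nginx with apt:

    sudo apt install nginx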

  8. Create a file named openllm.conf in the sites-available directory.

    sudo nano /etc/nginx/sites-available/openllm.conf
    
  9. Paste the following content into the openllm.conf file. Make sure to replace example.com with your actual domain name.

    server {
            listen 80;
            listen [::]:80;
            server_name example.com www.example.com;

            location / {
                proxy_pass http://127.0.0.1:3000/;
            }
    }
    

    The following directives are used in the above virtual host configuration:

    - listen: accepts incoming connections on port 80 for both IPv4 and IPv6.
    - server_name: specifies the domain names this server block responds to.
    - location / with proxy_pass: forwards all incoming requests to the OpenLLM service listening on port 3000.

  10. Save the file, and exit the editor.

  11. Activate the virtual host configuration by linking the openllm.conf file to the sites-enabled directory.

    sudo ln -s /etc/nginx/sites-available/openllm.conf /etc/nginx/sites-enabled/
    
  12. Test the configuration for errors.
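
    Nginx includes a built-in configuration check:

    sudo nginx -t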

    If the configuration has no errors, your output should look like this:

    nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
    nginx: configuration file /etc/nginx/nginx.conf test is successful
    
  13. Reload the Nginx server to apply the new configuration.

    sudo systemctl reload nginx
    
  14. Allow incoming connections on port 80.
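
    If you are using UFW, the default firewall on Ubuntu, allow HTTP traffic:

    sudo ufw allow 80/tcp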

Installing an SSL certificate using Certbot

Certbot allows you to acquire SSL certificates from "Let's Encrypt", a free certificate authority. These SSL certificates serve as cryptographic keys that enable secure communication between a user and a web server.

In this section, you'll learn how to request a free SSL certificate from "Let's Encrypt" for your domain and use it to serve your application over HTTPS. Because Let's Encrypt certificates expire after 90 days, you'll also learn how to have the certificate renew automatically.

  1. Allow incoming connections on port 443 for HTTPS.
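
    Again assuming UFW as the firewall:

    sudo ufw allow 443/tcp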

  2. Install the certbot package using the snap package manager.

    sudo snap install --classic certbot
    
  3. Request a new SSL certificate for your domain. Make sure to replace example.com with your actual domain name.

    sudo certbot --nginx -d example.com -d www.example.com
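
    Certbot installed through snap sets up automatic renewal for you; you can optionally confirm that renewal works with a dry run:

    sudo certbot renew --dry-run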
    
  4. You can now visit the OpenLLM API documentation in your browser at your domain (for example, https://example.com).

Generating an API response using OpenLLM

With Nginx and SSL configured, this section guides you through sending a POST request to the OpenLLM API endpoint responsible for generating a response from the given prompt.

Send a curl request to the API endpoint.

curl -X POST -H "Content-Type: application/json" -d '{
    "prompt": "What is the meaning of life?",
    "stop": ["\n"],
    "llm_config": {
        "max_new_tokens": 128,
        "min_length": 0,
        "early_stopping": false,
        "num_beams": 1,
        "num_beam_groups": 1,
        "use_cache": true,
        "temperature": 0.75,
        "top_k": 15,
        "top_p": 0.9,
        "typical_p": 1,
        "epsilon_cutoff": 0,
        "eta_cutoff": 0,
        "diversity_penalty": 0,
        "repetition_penalty": 1,
        "encoder_repetition_penalty": 1,
        "length_penalty": 1,
        "no_repeat_ngram_size": 0,
        "renormalize_logits": false,
        "remove_invalid_values": false,
        "num_return_sequences": 1,
        "output_attentions": false,
        "output_hidden_states": false,
        "output_scores": false,
        "encoder_no_repeat_ngram_size": 0,
        "logprobs": 0,
        "prompt_logprobs": 0,
        "n": 1,
        "presence_penalty": 0,
        "frequency_penalty": 0,
        "use_beam_search": false,
        "ignore_eos": false,
        "skip_special_tokens": true
    },
    "adapter_name": null
}' https://example.com/v1/generate

You can adjust the character of the response by changing the values of the llm_config parameters. Here's what the key parameters do: max_new_tokens limits the length of the generated text, temperature controls how random the output is (higher values produce more varied text), top_k and top_p restrict sampling to the most likely tokens, and repetition_penalty discourages the model from repeating itself.

This is a sample output of the curl request:

{
  "prompt": "What is the meaning of life?",
  "finished": true,
  "outputs": [
    {
      "index": 0,
      "text": " What is the meaning of the universe? How does the universe work?",
      "token_ids": [
        1634, 304, 248, 4113, 275, 248, 10314, 42, 1265, 960, 248, 10314, 633,
        42, 193, 1265, 960, 248, 10314, 633, 42, 193
      ],
      "cumulative_logprob": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "prompt_token_ids": [1562, 304, 248, 4113, 275, 1063, 42],
  "prompt_logprobs": null,
  "request_id": "openllm-e1b145f3e9614624975f76e7fae6050c"
}
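
To consume the API from an application rather than the command line, a minimal Python sketch using the requests library looks like the following. The URL and parameter values mirror the curl example above, and it assumes the endpoint fills in defaults for any llm_config fields you omit:

# Minimal sketch of calling the OpenLLM generate endpoint from Python.
# Replace example.com with your actual domain.
import requests

payload = {
    "prompt": "What is the meaning of life?",
    "stop": ["\n"],
    "llm_config": {
        "max_new_tokens": 128,
        "temperature": 0.75,
        "top_k": 15,
        "top_p": 0.9,
    },
    "adapter_name": None,
}

response = requests.post(
    "https://example.com/v1/generate",
    json=payload,
    timeout=120,
)
response.raise_for_status()

result = response.json()
# The generated text is returned in the "outputs" list, as in the sample output above.
print(result["outputs"][0]["text"])
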
Summary

In this article, you learned how to build API responses for AI-powered apps using OpenLLM and Vultr GPU stack. This tutorial guided you through the steps to create an OpenLLM service that initializes the required model and API endpoint necessary to generate responses. You also learned how to set up Nginx as a reverse proxy server for your OpenLLM service and secure it with an SSL certificate.

This is a sponsored article by Vultr. Vultr is the world's largest privately-held cloud computing platform. A favorite with developers, Vultr has served over 1.5 million customers across 185 countries with flexible, scalable, global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage solutions. Learn more about Vultr.
