To get started, make sure you have the following installed on your system:
Python 3.x (preferably 3.11) with pip
Note
Prefer a video guide? Watch the step-by-step tutorial on YouTube
Warning
CUDA and ROCm aren't prerequisites because torch can install them for you. However, if this doesn't work (e.g. DLL load failed errors), install the CUDA toolkit or ROCm on your system.
Warning
On Windows, you may sometimes see an error stating that VS build tools needs to be installed. This means a dependency doesn't ship a prebuilt wheel for your Python version. You can install VS build tools 17.8 and build the wheel locally. In addition, open an issue stating that a dependency is building a wheel.
Clone this repository to your machine: git clone https://github.com/theroyallab/tabbyAPI
Navigate to the project directory: cd tabbyAPI
Run the appropriate start script (start.bat for Windows, start.sh for Linux).
The API should start with no model loaded. Please read more to see how to download a model.
Create and activate a virtual environment, then install the project with the extra that matches your GPU:
python -m venv venv
.\venv\Scripts\activate (Windows)
source venv/bin/activate (Linux)
pip install -U .[cu121] (CUDA 12.x)
pip install -U .[amd] (ROCm 6.0)
Alternatively, run start.bat/sh directly. The script will check if you're in a conda environment and skip venv checks. You can also run python main.py to start the API, but this won't automatically upgrade your dependencies.

TabbyAPI includes a built-in Hugging Face downloader that works via both the API and the terminal. You can use the following command to download a repository with a specific branch revision:
.\Start.bat download <repo name> --revision <branch>
Example with Turboderp's Qwen2.5-VL 7B Instruct quants:
.\Start.bat download turboderp/Qwen2.5-VL-7B-Instruct-exl2 --revision 4.0bpw
If a model is gated, you can provide a HuggingFace access token (most exl2 quants aren't private):
.\Start.bat download meta-llama/Llama-3.1-8B --token <token>
Alternatively, running main.py directly can also trigger the downloader. For additional options, run .\Start.bat download --help.
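As a sketch of the direct invocation, assuming main.py accepts the same download subcommand and flags as the start scripts (verify with python main.py download --help):

```shell
# Assumed to mirror the start-script syntax; repo and revision are
# taken from the example above.
python main.py download turboderp/Qwen2.5-VL-7B-Instruct-exl2 --revision 4.0bpw
```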
Running the API alone may not be your optimal use case. Therefore, a config.yml exists to tune initial launch parameters and other configuration options.
A config.yml file is required for overriding project defaults. If you are okay with the defaults, you don't need a config file!
If you do want a config file, copy config_sample.yml to config.yml. All the fields are commented, so make sure to read the descriptions and comment out or remove fields that you don't need.
In addition, if you want to manually set the API keys, copy api_keys_sample.yml to api_keys.yml and fill in the fields. However, doing this is less secure; autogenerated keys should be used instead.
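The copy steps above can be sketched as follows, run from the repo root (file names are taken from the text):

```shell
# Copy the sample configs into place before editing them.
cp config_sample.yml config.yml
cp api_keys_sample.yml api_keys.yml
```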
You can also access the configuration parameters under 2. Configuration in this wiki!
There are a couple ways to update TabbyAPI:
update_deps: Updates dependencies to their latest versions.
update_deps_and_pull: Updates dependencies and pulls the latest commit of the GitHub repository.

These scripts exit after running their respective tasks. To start TabbyAPI, run start.bat or start.sh.
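A typical update-and-restart cycle might look like this; the .sh extensions are an assumption based on the script names above (use the .bat equivalents on Windows):

```shell
# Pull the latest commit and update dependencies; the update scripts
# exit after finishing, so the server must be started separately.
./update_deps_and_pull.sh
./start.sh
```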
pip install -U .[cu121] = CUDA 12.x
pip install -U .[amd] = ROCm 6.0

If you don't want to update dependencies that come from wheels (torch, exllamav2, and flash attention 2), use pip install . or pass the --nowheel flag when invoking the start scripts.
Warning
These instructions are meant for advanced users.
Important
If you're installing a custom Exllamav2 wheel, make sure to use pip install . when updating! Otherwise, each update will overwrite your custom exllamav2 version.
NOTE: To use a different exllamav2 version, edit pyproject.toml locally, create an issue or PR, or install your version of exllamav2 after upgrades.

Here are ways to install exllamav2:
1. From a prebuilt wheel: in the wheel's filename, cu121 and cp311 correspond to CUDA 12.1 and Python 3.11.
2. pip install exllamav2: this is a JIT compiled extension, which means that the initial launch of TabbyAPI will take some time. The build may also fail due to improper environment configuration.

These are short-form instructions for other methods that users can use to install TabbyAPI.
Warning
Using methods other than venv may not play nice with startup scripts. Using these methods indicates that you're an advanced user and know what you're doing.
conda create -n tabbyAPI python=3.11
conda activate tabbyAPI
conda install -c "nvidia/label/cuda-12.4.1" cuda
conda install -k git
git clone https://github.com/theroyallab/tabbyAPI
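After the conda steps above, a minimal sketch of the remaining launch steps, assuming the start scripts are used as described earlier (the script skips its venv checks inside a conda environment):

```shell
# Enter the cloned repo and launch via the start script
# (start.bat on Windows).
cd tabbyAPI
./start.sh
```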
Note
If you are planning to use custom versions of dependencies such as dev ExllamaV2, make sure to build the Docker image yourself!
git clone https://github.com/theroyallab/tabbyAPI
cd tabbyAPI
Update the volumes in the docker/docker-compose.yml file:

```yml
volumes:
  # - /path/to/models:/app/models # Change me
  # - /path/to/config.yml:/app/config.yml # Change me
  # - /path/to/api_tokens.yml:/app/api_tokens.yml # Change me
```

To build the image from source instead of pulling it, update the following in docker/docker-compose.yml:

```yml
# Uncomment this to build a docker image from source
#build:
#  context: ..
#  dockerfile: ./docker/Dockerfile

# Comment this to build a docker image from source
image: ghcr.io/theroyallab/tabbyapi:latest
```
Run docker compose -f docker/docker-compose.yml up to build the Dockerfile and start the server.