Pecca started as a Rust port of the excellent llama2.c by @karpathy, itself a minimalistic adaptation of llama.cpp.
Compared to other Rust ports, Pecca leverages ndarray, which has several advantages:
Going forward, Pecca will leverage Rust and its ecosystem whenever it makes sense, rather than avoiding dependencies above all else (as llama.cpp does).
git clone https://github.com/rahoua/pecca-rs.git
cd pecca-rs
wget -P ./models/stories/ https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
cargo run --release generate ./models/stories/stories15M.bin
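Pecca reads llama2.c's checkpoint format, so a model such as stories15M.bin keeps its hyperparameters in a small header at the start of the file. The sketch below is not Pecca's code. It assumes llama2.c's layout: seven little-endian 32-bit integers (dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len) followed by the weights. The names `Config` and `parse_header` are hypothetical.

```rust
use std::convert::{TryFrom, TryInto};
use std::fs::File;
use std::io::Read;

/// Hyperparameters from the header of a llama2.c-style checkpoint (assumed layout).
#[derive(Debug, PartialEq)]
struct Config {
    dim: usize,
    hidden_dim: usize,
    n_layers: usize,
    n_heads: usize,
    n_kv_heads: usize,
    vocab_size: usize,
    seq_len: usize,
    /// llama2.c stores a negative vocab_size when the output classifier
    /// has its own weights instead of sharing the token embedding table.
    shared_weights: bool,
}

/// Parses the first 28 bytes (seven little-endian i32 values) of a checkpoint.
fn parse_header(bytes: &[u8]) -> Option<Config> {
    let mut vals = [0i32; 7];
    for (i, v) in vals.iter_mut().enumerate() {
        // Returns None if the input is shorter than 28 bytes.
        let chunk: [u8; 4] = bytes.get(i * 4..i * 4 + 4)?.try_into().ok()?;
        *v = i32::from_le_bytes(chunk);
    }
    // Every field must be a positive size once the vocab_size sign is removed.
    let pos = |x: i32| usize::try_from(x).ok().filter(|&n| n > 0);
    Some(Config {
        dim: pos(vals[0])?,
        hidden_dim: pos(vals[1])?,
        n_layers: pos(vals[2])?,
        n_heads: pos(vals[3])?,
        n_kv_heads: pos(vals[4])?,
        vocab_size: pos(vals[5].checked_abs()?)?,
        seq_len: pos(vals[6])?,
        shared_weights: vals[5] > 0,
    })
}

fn main() {
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "./models/stories/stories15M.bin".to_string());
    // Only the header is read; the weights that follow are ignored here.
    let mut header = [0u8; 28];
    match File::open(&path).and_then(|mut f| f.read_exact(&mut header)) {
        Ok(()) => println!("{:?}", parse_header(&header)),
        Err(e) => eprintln!("could not read {}: {}", path, e),
    }
}
```

Run it with a checkpoint path as the only argument. Without an argument, it falls back to the stories15M.bin path used in the commands above.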
Pecca can be run the same way with larger TinyStories models (such as the 110M one) or with the llama2 models (only 7B is recommended so far). For a full list of command-line options, run:
To get the llama2 models, follow the instructions for llama2.c; Pecca supports the same model format. Because Pecca does not use memmap, loading and quantizing the model on the fly can take some time. To speed this up, models can also be saved in quantized form with the -f --write-model <path> command-line switch.
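The README does not say which quantization scheme Pecca applies at load time. As an illustration only, here is symmetric per-group int8 quantization, similar in spirit to the Q8_0 scheme in llama2.c's runq.c. Each group of weights stores one f32 scale (max |w| / 127) plus int8 values. The group size and the `QuantGroup` type are hypothetical, and Pecca's actual layout and the format written by --write-model may differ.

```rust
/// One group of int8 weights sharing a single f32 scale (hypothetical layout).
struct QuantGroup {
    scale: f32,
    values: Vec<i8>,
}

/// Symmetric int8 quantization: each group of `group_size` weights is divided by
/// (max |w| / 127) and rounded, so the largest magnitude maps to +/-127.
fn quantize(weights: &[f32], group_size: usize) -> Vec<QuantGroup> {
    assert!(group_size > 0, "group_size must be positive");
    weights
        .chunks(group_size)
        .map(|group| {
            let max_abs = group.iter().fold(0.0f32, |m, w| m.max(w.abs()));
            // An all-zero group gets scale 1.0 to avoid dividing by zero.
            let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
            let values = group
                .iter()
                .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
                .collect();
            QuantGroup { scale, values }
        })
        .collect()
}

/// Reconstructs approximate f32 weights; the error per weight is at most scale / 2.
fn dequantize(groups: &[QuantGroup]) -> Vec<f32> {
    groups
        .iter()
        .flat_map(|g| g.values.iter().map(move |&q| q as f32 * g.scale))
        .collect()
}

fn main() {
    let weights: Vec<f32> = (0..8).map(|i| (i as f32 - 3.5) * 0.1).collect();
    let groups = quantize(&weights, 4);
    let restored = dequantize(&groups);
    for (w, r) in weights.iter().zip(&restored) {
        println!("{:+.4} -> {:+.4}", w, r);
    }
}
```

The group size is the main trade-off. Smaller groups track local weight ranges more closely, but they store more scales and save less memory.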
For codellama, the instructions are similar except for the tokenizer, which is slightly different. To make the process easier, the updated tokenizer is provided. To override the default tokenizer, run pecca with the -k command-line option:
./target/release/pecca-rs generate /path/to/codellama-instr-7b.bin -k "./models/tokenizer-code.bin"
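The -k file supplies the vocabulary used to turn text into token ids and back. The sketch below loads such a file. It assumes the file follows llama2.c's tokenizer.bin layout: an i32 max_token_length, then for each token an f32 score, an i32 byte length, and the token bytes. As in llama2.c, the vocabulary size is not stored in the file and must come from the model header. `Tokenizer` and `parse_tokenizer` are hypothetical names, not Pecca's API.

```rust
use std::convert::{TryFrom, TryInto};

/// Vocabulary loaded from a llama2.c-style tokenizer file (assumed layout).
struct Tokenizer {
    max_token_length: usize,
    /// Raw token bytes; decode with String::from_utf8_lossy for display.
    vocab: Vec<Vec<u8>>,
    /// Per-token scores, used by llama2.c to rank BPE merges.
    scores: Vec<f32>,
}

/// Reads 4 bytes at `*pos` and advances the cursor; None if out of bounds.
fn take4(bytes: &[u8], pos: &mut usize) -> Option<[u8; 4]> {
    let chunk: [u8; 4] = bytes.get(*pos..*pos + 4)?.try_into().ok()?;
    *pos += 4;
    Some(chunk)
}

fn parse_tokenizer(bytes: &[u8], vocab_size: usize) -> Option<Tokenizer> {
    let mut pos = 0;
    let max_token_length = usize::try_from(i32::from_le_bytes(take4(bytes, &mut pos)?)).ok()?;
    let mut vocab = Vec::new();
    let mut scores = Vec::new();
    for _ in 0..vocab_size {
        scores.push(f32::from_le_bytes(take4(bytes, &mut pos)?));
        let len = usize::try_from(i32::from_le_bytes(take4(bytes, &mut pos)?)).ok()?;
        // Bounds-checked slice: returns None if the file is truncated.
        let token = bytes.get(pos..pos.checked_add(len)?)?;
        vocab.push(token.to_vec());
        pos += len;
    }
    Some(Tokenizer { max_token_length, vocab, scores })
}

fn main() {
    let mut args = std::env::args().skip(1);
    let (path, vocab_size) = match (args.next(), args.next().and_then(|s| s.parse::<usize>().ok())) {
        (Some(p), Some(n)) => (p, n),
        _ => {
            eprintln!("usage: <tokenizer.bin> <vocab_size from the model header>");
            return;
        }
    };
    match std::fs::read(&path).ok().and_then(|b| parse_tokenizer(&b, vocab_size)) {
        Some(t) => println!(
            "{} tokens, max length {}, first: {:?}",
            t.vocab.len(),
            t.max_token_length,
            t.vocab.first().map(|v| String::from_utf8_lossy(v))
        ),
        None => eprintln!("could not load {}", path),
    }
}
```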
At the moment there is no formal benchmark; we only provide rough estimates to give a ballpark of overall performance.
Llama2 7B model on a MacBook Pro M2 Max:
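Without a formal benchmark, a simple and comparable rough figure is generated tokens per second. The timer should cover only the generation loop, so that model loading is excluded; in Pecca, loading includes on-the-fly quantization. The `measure` helper is hypothetical, and its `step` closure stands in for one forward pass.

```rust
use std::time::{Duration, Instant};

/// Throughput in tokens per second; returns 0.0 for a zero duration.
fn tokens_per_second(n_tokens: usize, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        n_tokens as f64 / secs
    } else {
        0.0
    }
}

/// Times `n_tokens` calls to `step` (a stand-in for one forward pass at position `pos`).
/// Start the clock only after the model is loaded so load time is not counted.
fn measure<F: FnMut(usize)>(n_tokens: usize, mut step: F) -> f64 {
    let start = Instant::now();
    for pos in 0..n_tokens {
        step(pos);
    }
    tokens_per_second(n_tokens, start.elapsed())
}

fn main() {
    // Dummy workload in place of a real forward pass.
    let mut acc: u64 = 0;
    let tps = measure(1_000, |pos| {
        acc = (0..10_000u64).fold(acc, |a, i| a.wrapping_add(i ^ pos as u64));
    });
    println!("{:.1} tok/s (checksum {})", tps, acc);
}
```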
A list of possible future developments for the project:
dot operation to support cuBLAS or Metal.
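One reading of this item is to route every matrix-vector product through a single backend interface. A cuBLAS or Metal implementation could then replace the CPU path without touching the model code. The `DotBackend` trait and `CpuBackend` type below are a hypothetical sketch, not existing Pecca code. On the CPU side, Pecca would presumably keep using ndarray (whose arrays provide a `dot` method) rather than the plain loops shown here.

```rust
/// Hypothetical backend interface for the model's matrix-vector products.
/// A GPU backend (e.g. wrapping cuBLAS or Metal) would implement the same trait.
trait DotBackend {
    /// Computes out = W * x, where `w` is a row-major matrix of
    /// `out.len()` rows and `x.len()` columns.
    fn matvec(&self, w: &[f32], x: &[f32], out: &mut [f32]);
}

/// Plain CPU reference implementation.
struct CpuBackend;

impl DotBackend for CpuBackend {
    fn matvec(&self, w: &[f32], x: &[f32], out: &mut [f32]) {
        let cols = x.len();
        assert_eq!(w.len(), out.len() * cols, "weight shape does not match x/out");
        if cols == 0 {
            // An empty dot product is zero; also avoids chunks_exact(0), which panics.
            out.iter_mut().for_each(|o| *o = 0.0);
            return;
        }
        for (row, o) in w.chunks_exact(cols).zip(out.iter_mut()) {
            *o = row.iter().zip(x).map(|(a, b)| a * b).sum();
        }
    }
}

/// Model code depends only on the trait, so backends can be swapped.
fn project<B: DotBackend>(backend: &B, w: &[f32], x: &[f32], rows: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; rows];
    backend.matvec(w, x, &mut out);
    out
}

fn main() {
    let w: [f32; 6] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2 rows x 3 columns
    let x: [f32; 3] = [1.0, 0.0, -1.0];
    println!("{:?}", project(&CpuBackend, &w, &x, 2));
}
```

Static dispatch through a generic parameter keeps the CPU path free of virtual-call overhead. A `&dyn DotBackend` would also work if the backend is chosen at runtime from a command-line flag.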