

jrudolph/llama2.scala: Inference Llama 2 in Scala with AVX2 kernels in C (A port of llama2.c from Andrej Karpathy)

A Scala 2 port of Andrej Karpathy's llama2.c

This is a Scala port of Andrej Karpathy's llama2.c, a bare-bones implementation for running inference on LLMs with a Llama-like, transformer-based architecture.

The code expects tokenizer.bin and stories15M.bin in the current directory.
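The model files come from the llama2.c ecosystem. As a rough sketch, assuming the checkpoint uses llama2.c's original (version 0) layout, where the file starts with seven little-endian 32-bit integers and a negative vocabulary size signals that the classifier weights are not shared with the token embedding, the header could be read as follows. `Config` and `CheckpointHeader` are hypothetical names for illustration, not llama2.scala's actual API.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.channels.FileChannel
import java.nio.file.{Path, StandardOpenOption}

// Hypothetical config mirroring the seven int32 header fields of the
// legacy llama2.c checkpoint format (an assumption, not llama2.scala's type).
final case class Config(
    dim: Int,       // transformer embedding dimension
    hiddenDim: Int, // feed-forward hidden dimension
    nLayers: Int,
    nHeads: Int,
    nKvHeads: Int,
    vocabSize: Int, // stored negated in the file when weights are not shared
    seqLen: Int,
    sharedWeights: Boolean
)

object CheckpointHeader {
  val HeaderBytes: Int = 7 * 4

  // Parse the header from a buffer positioned at the start of the file.
  // duplicate() leaves the caller's buffer position untouched.
  def parse(buf: ByteBuffer): Config = {
    val b = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN)
    val f = Array.fill(7)(b.getInt())
    Config(f(0), f(1), f(2), f(3), f(4), math.abs(f(5)), f(6),
      sharedWeights = f(5) > 0)
  }

  // Read only the header bytes; the weights that follow are not loaded here.
  def read(path: Path): Config = {
    val ch = FileChannel.open(path, StandardOpenOption.READ)
    try {
      val buf = ByteBuffer.allocate(HeaderBytes)
      while (buf.hasRemaining && ch.read(buf) >= 0) {}
      require(!buf.hasRemaining, s"$path is shorter than the $HeaderBytes-byte header")
      buf.flip()
      parse(buf)
    } finally ch.close()
  }
}
```

For example, `CheckpointHeader.read(java.nio.file.Paths.get("stories15M.bin"))` would return the dimensions needed to size the weight arrays stored after the header.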

This started as a pure-Scala port of the original code. Later, higher-level abstractions were added, along with low-level C kernels that use AVX2 intrinsics to speed up matrix multiplication.
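In llama2.c, most of the inference time is spent in a matrix-vector product `W (d×n) · x (n) -> xout (d)`. A minimal pure-Scala sketch of that loop, i.e. the kind of hot spot the AVX2 kernels target, might look like this; `MatMul`, `matmul` and `matmulPar` are hypothetical names, and the parallel variant is only an illustration of why thread count matters, not how llama2.scala schedules work.

```scala
import java.util.stream.IntStream

object MatMul {
  // Dot product of row i of the row-major d x n matrix w with vector x.
  private def rowDot(w: Array[Float], x: Array[Float], n: Int, i: Int): Float = {
    val offset = i * n
    var sum = 0.0f
    var j = 0
    while (j < n) {
      sum += w(offset + j) * x(j)
      j += 1
    }
    sum
  }

  // Single-threaded: xout(i) = sum over j of w(i * n + j) * x(j), for i in 0 until d.
  def matmul(xout: Array[Float], x: Array[Float], w: Array[Float], n: Int, d: Int): Unit = {
    require(x.length >= n && xout.length >= d && w.length >= d * n, "array too small")
    var i = 0
    while (i < d) {
      xout(i) = rowDot(w, x, n, i)
      i += 1
    }
  }

  // Rows are independent, so they can be computed concurrently. This variant
  // uses the JDK's common fork-join pool instead of a fixed thread count.
  def matmulPar(xout: Array[Float], x: Array[Float], w: Array[Float], n: Int, d: Int): Unit = {
    require(x.length >= n && xout.length >= d && w.length >= d * n, "array too small")
    IntStream.range(0, d).parallel().forEach { i => xout(i) = rowDot(w, x, n, i) }
  }
}
```

Writing into a caller-provided `xout` avoids allocating a new array per call, which matters because this product runs many times for every generated token.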

The current numbers were measured with version 08c65d04 on my AMD Ryzen 7 4800H laptop using GraalVM JDK 17.

Implementations:

Notes:

| Model | Quantization | Implementation | Threads | tok/s |
|---|---|---|---|---|
| stories15M.bin | Q4 | native-avx2 | 1 | 494 |
| stories15M.bin | Q4 | native-avx2 | 6 | 931 |
| stories15M.bin | Q4 | Scala | 1 | 65 |
| stories15M.bin | Q8 | native-avx2 | 1 | 533 |
| stories15M.bin | Q8 | native-avx2 | 6 | 800 |
| stories15M.bin | Q8 | Scala | 1 | 57 |
| stories15M.bin | none | native-avx2 | 1 | 374 |
| stories15M.bin | none | native-avx2 | 6 | 677 |
| stories15M.bin | none | Scala | 1 | 66 |
| stories15M.bin | none | scala-native vanilla | 1 | 14 |
| stories15M.bin | none | scala-native (native mmaps) | 1 | 50 |
| stories42M.bin | Q4 | native-avx2 | 1 | 223 |
| stories42M.bin | Q4 | native-avx2 | 6 | 497 |
| stories42M.bin | Q4 | Scala | 1 | 24 |
| stories42M.bin | Q8 | native-avx2 | 1 | 229 |
| stories42M.bin | Q8 | native-avx2 | 6 | 407 |
| stories42M.bin | Q8 | Scala | 1 | 22 |
| stories42M.bin | none | native-avx2 | 1 | 137 |
| stories42M.bin | none | native-avx2 | 6 | 243 |
| stories42M.bin | none | Scala | 1 | 24 |
| stories42M.bin | none | llama2.c / run | 1 | 21 |
| stories42M.bin | none | llama2.c / runfast | 1 | 69 |
| stories42M.bin | none | llama2.c / runomp | 1 | 98 |
| stories42M.bin | none | llama2.c / runomp | 6 | 195 |
| stories110M.bin | Q4 | native-avx2 | 1 | 95 |
| stories110M.bin | Q4 | native-avx2 | 6 | 239 |
| stories110M.bin | Q4 | Scala | 1 | 9.6 |
| stories110M.bin | Q8 | native-avx2 | 1 | 99 |
| stories110M.bin | Q8 | native-avx2 | 6 | 183 |
| stories110M.bin | Q8 | Scala | 1 | 8.4 |
| stories110M.bin | none | native-avx2 | 1 | 50 |
| stories110M.bin | none | native-avx2 | 6 | 85 |
| stories110M.bin | none | Scala | 1 | 8.9 |
| stories110M.bin | none | llama2.c / runomp | 6 | 77 |
| llama2_7b.bin | Q4 | native-avx2 | 1 | 2.0 |
| llama2_7b.bin | Q4 | native-avx2 | 6 | 6.5 |
| llama2_7b.bin | Q4 | Scala | 1 | 0.16 |
| llama2_7b.bin | Q8 | native-avx2 | 1 | 1.9 |
| llama2_7b.bin | Q8 | native-avx2 | 6 | 4.46 |
| llama2_7b.bin | Q8 | Scala | 1 | 0.14 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | native-avx2 | 1 | 1.66 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | native-avx2 | 6 | 6.71 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | Scala | 1 | 0.13 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | llama.cpp | 1 | 2.0 |
| llama-2-7b.ggmlv3.q4_0.bin | as provided | llama.cpp | 6 | 8.1 |
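The Q8 and Q4 rows use quantized weights. To illustrate the general idea, here is a minimal sketch of group-wise symmetric 8-bit quantization modeled on the scheme in llama2.c's `runq.c`, where each group of `groupSize` weights shares one float scale `max|w| / 127`. Whether llama2.scala uses exactly this layout, group size, or rounding is an assumption, and `Q8`, `Quantized`, `quantize` and `dequantize` are hypothetical names.

```scala
object Q8 {
  // Hypothetical container: one int8 value per weight plus one float scale per group.
  final case class Quantized(q: Array[Byte], scales: Array[Float], groupSize: Int)

  def quantize(x: Array[Float], groupSize: Int): Quantized = {
    require(groupSize > 0 && x.length % groupSize == 0, "length must be a multiple of groupSize")
    val nGroups = x.length / groupSize
    val q = new Array[Byte](x.length)
    val scales = new Array[Float](nGroups)
    var g = 0
    while (g < nGroups) {
      val start = g * groupSize
      // The largest magnitude in the group maps to +/-127.
      var wmax = 0.0f
      var i = start
      while (i < start + groupSize) { wmax = math.max(wmax, math.abs(x(i))); i += 1 }
      val scale = wmax / 127.0f
      scales(g) = scale
      i = start
      while (i < start + groupSize) {
        // An all-zero group has scale 0; store zeros instead of dividing by 0.
        val v = if (scale == 0.0f) 0 else math.round(x(i) / scale)
        q(i) = v.toByte
        i += 1
      }
      g += 1
    }
    Quantized(q, scales, groupSize)
  }

  // Reconstruct approximate floats: each int8 value times its group's scale.
  def dequantize(qx: Quantized): Array[Float] =
    Array.tabulate(qx.q.length)(i => qx.q(i) * qx.scales(i / qx.groupSize))
}
```

Because every weight is rounded to the nearest multiple of its group's scale, the reconstruction error per weight is at most half that scale, while storage drops from 4 bytes to roughly 1 byte per weight plus one float per group.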

License: MIT

