Thank you so much for writing this. It is a great example, and we are now working on compiling it with LFortran.
Regarding performance on my Apple M1 Max with GFortran 11.3.0, I get about 240 tokens/s with the default gfortran options. With -O3 -march=native -ffast-math -funroll-loops
I get about 277 tokens/s. Finally, with gfortran -O3 -march=native -ffast-math -funroll-loops -fexternal-blas llama2.f90 -o llm -framework Accelerate
which should be the fastest, I still only get about 270 tokens/s. I think the model is too small; one would have to try a larger version to take advantage of the accelerated linear algebra.
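For anyone who wants to repeat the comparison, here is a rough sketch of the three builds as a small script. The flags are the ones quoted above. Everything else is my own assumption for illustration: the `SRC` and `RUN_BUILD` variables, the `build_cmd` helper, the separate output names (`llm_default`, `llm_opt`, `llm_blas`), and reading "default gfortran options" as a build with no extra flags. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: print (or run) the three gfortran builds compared in this post.
# SRC, RUN_BUILD, build_cmd and the output names are illustrative assumptions;
# only the compiler flags come from the post itself.
SRC="${SRC:-llama2.f90}"
OPT_FLAGS="-O3 -march=native -ffast-math -funroll-loops"

# Print the gfortran command line for one build variant.
build_cmd() {
    case "$1" in
        # Assumption: "default options" means no optimization flags at all.
        default) echo "gfortran $SRC -o llm_default" ;;
        opt)     echo "gfortran $OPT_FLAGS $SRC -o llm_opt" ;;
        blas)    echo "gfortran $OPT_FLAGS -fexternal-blas $SRC -o llm_blas -framework Accelerate" ;;
        *)       echo "unknown variant: $1" >&2; return 1 ;;
    esac
}

for variant in default opt blas; do
    cmd=$(build_cmd "$variant")
    echo "$cmd"
    # Build only when explicitly requested: -framework Accelerate is macOS-only.
    if [ "${RUN_BUILD:-0}" = "1" ]; then
        $cmd
    fi
done
```

Saved as, say, `build_variants.sh`, running `RUN_BUILD=1 sh build_variants.sh` on a Mac should produce the three binaries, which can then be timed on the same prompt.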