Modern ML frameworks like HF transformers are easy to use but extremely abstract. There are times the abstraction makes sense, particularly for model training. For inference, the "real" code is less complex than the abstraction, and it can be faster and more transparent to just write it yourself. That way you can plainly see what your model is doing under the hood and adapt it to your use case, rather than picking through layer after layer of abstraction over what is effectively a for-loop with some matrix multiplications inside.
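To make that point concrete, here is a deliberately oversimplified sketch (not the actual ferrite code; all dimensions and weights are random placeholders) of what that loop looks like: a handful of matmuls and elementwise operations repeated once per layer. A real transformer layer adds attention and layer norms, but the overall shape is the same.

```fortran
! Illustrative only: a stack of "layers" applied as plain matrix multiplications.
! Dimensions and weights are random placeholders, not a real model.
program loop_of_matmuls
  implicit none
  integer, parameter :: n_layers = 6, dim = 8, seq = 4
  real :: x(dim, seq), w(dim, dim, n_layers), b(dim, n_layers)
  integer :: l

  call random_number(x)
  call random_number(w)
  call random_number(b)

  do l = 1, n_layers
    ! each step is a matmul plus bias and a nonlinearity; a real transformer
    ! layer adds attention and layer norms, but it is still just this loop
    x = max(0.0, matmul(w(:, :, l), x) + spread(b(:, l), 2, seq))
  end do

  print *, x(:, 1)
end program loop_of_matmuls
```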
To that end, as a complement to the llama.f90 Fortran LLM, this project demonstrates a Sentence Transformer in "pure" Fortran with no dependencies (you still need Python to convert PyTorch models if you want to use them).

I plan to evolve this to make sure it can work with general transformer models, and add performance optimizations as required. That said, I don't want to add any abstraction, so I only want to add generalizations that don't obscure what is going on. The code can easily be adapted for architectural variations.
```
# clone the repo
git clone https://github.com/rbitr/ferrite
cd ferrite

# download a model
wget https://huggingface.co/SDFASDGA/llm/resolve/main/msmarco-distilbert-base-dot-prod-v3-f32.gguf

# compile
make

# run
./tx -m msmarco-distilbert-base-dot-prod-v3-f32.gguf -v -p "I alwas feel like somebody's watching me" # (sic)

GGUF Header Info
Magic number: 1179993927
Version: 3
Tensor Count: 101
Key-Value Pairs: 15
general.architecture distilbert
general.name DistilBert
distilbert.context_length 512
distilbert.embedding_length 768
distilbert.feed_forward_length 3072
distilbert.block_count 6
distilbert.attention.head_count 12
distilbert.attention.head_count_kv 1
general.file_type 0
tokenizer.ggml.model gpt2
tokenizer.ggml.tokens 30522
tokenizer.ggml.token_type 30522
tokenizer.ggml.unknown_token_id 100
tokenizer.ggml.seperator_token_id 102
tokenizer.ggml.padding_token_id 0
Position 573471
Deficit 30
data offset 573473
Embedding dimension: 768
Hidden dimension: 3072
Layers: 6
Heads: 12
kv Heads: 1
Vocabulary Size: 30522
Sequence Length: 512
head size 64
kv head Size 64
loaded word embedding weights: 23440896
loaded position embedding weights: 393216
loaded embedding layernorm weights: 768
loaded embedding layernorm bias: 768
loaded wq weights: 3538944
loaded wq bias: 4608
loaded wk weights: 3538944
loaded wk bias: 4608
loaded wv weights: 3538944
loaded wv bias: 4608
loaded wo weights: 3538944
loaded wo bias: 4608
loaded sa layernorm weights: 4608
loaded sa layernorm bias: 4608
loaded w1 weights: 14155776
loaded w1 bias: 18432
loaded w2 (down) weights: 14155776
loaded w2 (down) bias: 4608
loaded output norm weights: 4608
loaded output norm bias: 4608
loaded classifier weights: 589824
loading tokens
found 30522 tokens
maximum token length 18
Token 4081 is andrew
simple token: i
wordpiece tokens: i
simple token: alwas
wordpiece tokens: al ##was
simple token: feel
wordpiece tokens: feel
simple token: like
wordpiece tokens: like
simple token: somebody
wordpiece tokens: somebody
simple token: '
wordpiece tokens: '
simple token: s
wordpiece tokens: s
simple token: watching
wordpiece tokens: watching
simple token: me
wordpiece tokens: me
102 1046 2633 17312 2515 2067 8308 1006 1056 3667 2034 103
0.117702775 0.268108070 -0.412374288 -0.684159577 -0.272519588 -0.633238137
...
```
Right now I've only tested it with the msmarco-distilbert-base-dot-prod-v3
model from sbert.net. This is a DistilBERT transformer with a pooling and linear layer used for generating embeddings for semantic search. See https://www.sbert.net/docs/pretrained-models/msmarco-v3.html for more information.
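As a rough sketch of that final step, the sentence embedding is just a reduction over the per-token hidden states followed by a linear projection. The mean pooling below is an assumption for illustration (the actual pooling mode is set in the sbert model config), and all arrays are random placeholders:

```fortran
! Illustrative only: mean pooling over token hidden states plus a linear layer.
! All arrays are random placeholders; real values come from the model file.
program pool_and_project
  implicit none
  integer, parameter :: dim = 768, seq = 12
  real :: hidden(dim, seq), pooled(dim), w_out(dim, dim), b_out(dim), emb(dim)

  call random_number(hidden)
  call random_number(w_out)
  call random_number(b_out)

  ! pooling: collapse the sequence dimension (mean pooling assumed here;
  ! some sbert models use the CLS token instead)
  pooled = sum(hidden, dim=2) / real(seq)

  ! linear layer producing the embedding used for dot-product search
  emb = matmul(w_out, pooled) + b_out

  print *, emb(1:4)
end program pool_and_project
```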
Command line arguments are as follows:
```fortran
case ('-m', '--model')       ! path to model file
case ('-p', '--prompt')      ! prompt string
case ('-s', '--tokenizer')   ! path to custom tokenizer
case ('-t', '--temperature') ! temperature scaling (not used)
case ('-n', '--num_tokens')  ! number of tokens to generate, including prompt (not used)
case ('-v', '--verbose')     ! print additional information
case ('-1', '--single_line') ! print each element on a single line
case ('-q', '--quiet')       ! don't print embedding
```
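For context, a select case like this would typically sit inside a loop over command_argument_count(); a hypothetical, pared-down version of such a parser (only a few of the flags, variable names made up) looks like:

```fortran
! Hypothetical, pared-down argument parser in the style of the case block above.
program parse_args
  implicit none
  character(len=256) :: arg, model_path, prompt
  logical :: verbose
  integer :: i, n

  model_path = ''
  prompt = ''
  verbose = .false.

  n = command_argument_count()
  i = 1
  do while (i <= n)
    call get_command_argument(i, arg)
    select case (trim(arg))
      case ('-m', '--model')    ! next argument is the model path
        i = i + 1
        call get_command_argument(i, model_path)
      case ('-p', '--prompt')   ! next argument is the prompt string
        i = i + 1
        call get_command_argument(i, prompt)
      case ('-v', '--verbose')  ! flag with no value
        verbose = .true.
    end select
    i = i + 1
  end do

  if (verbose) print *, 'model: ', trim(model_path), '  prompt: ', trim(prompt)
end program parse_args
```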
Models are in GGUF format; see https://github.com/ggerganov/ggml/blob/master/docs/gguf.md
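The fixed part of the header is a 4-byte magic ('GGUF'), a uint32 version, and two uint64 counts, which is where the "GGUF Header Info" lines in the output above come from. A minimal stand-alone sketch of reading just those fields (the file name is a placeholder):

```fortran
! Minimal sketch: read the fixed GGUF header fields with stream access.
! The path is a placeholder; tensor info and key-value pairs follow these fields.
program gguf_header
  use iso_fortran_env, only: int32, int64
  implicit none
  integer(int32) :: magic, version
  integer(int64) :: tensor_count, kv_count
  integer :: u

  open(newunit=u, file='model.gguf', access='stream', form='unformatted', status='old')
  read(u) magic        ! bytes 'GGUF', printed as 1179993927 when read as a little-endian int32
  read(u) version      ! 3 in the example above
  read(u) tensor_count ! 101 in the example above
  read(u) kv_count     ! 15 in the example above
  close(u)

  print *, 'magic', magic, 'version', version
  print *, 'tensors', tensor_count, 'kv pairs', kv_count
end program gguf_header
```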
You can use the convert-hf-to-gguf.py file from https://github.com/rbitr/llama.cpp to convert HF model files, e.g.
```
git clone https://github.com/rbitr/llama.cpp
# get the model
git clone https://huggingface.co/sentence-transformers/msmarco-distilbert-base-dot-prod-v3
# convert
python ./llama.cpp/convert-hf-to-gguf.py msmarco-distilbert-base-dot-prod-v3 --outtype f32
```
Note that only DistilBERT models are supported, and this has not been extensively tested. Support is currently limited to the fork referenced above; it is not part of the original llama.cpp repo.
Examples (currently using the old model file format, adjust accordingly)

Included in the repo is a file sentence_ex made up of some sentences from Wikipedia about Europe and the saxophone. We save temporary embeddings for each sentence with a bash one-liner:
```
x=1; while read s; do echo $x $s; ./tx -m msmarco-distilbert-base-dot-prod-v3_converted_full.bin -1 -p "$s" > tmp/emb${x}.txt; x=$((x+1)); done < sentence_ex.txt
1 Europe is a continent located entirely in the Northern Hemisphere and mostly in the Eastern Hemisphere.
2 It comprises the westernmost part of Eurasia and is bordered by the Arctic Ocean to the north, the Atlantic Ocean to the west, the Mediterranean Sea to the south, and Asia to the east.
3 Europe is commonly considered to be separated from Asia by the watershed of the Ural Mountains, the Ural River, the Caspian Sea, the Greater Caucasus, the Black Sea, and the waterways of the Turkish Straits.
4 Although some of this border is over land, Europe is generally accorded the status of a full continent because of its great physical size and the weight of history and tradition.
5 Europe covers about 10,180,000 square kilometres (3,930,000 sq mi), or 2% of the Earth's surface (6.8% of land area), making it the second smallest continent.
6 Politically, Europe is divided into about fifty sovereign states, of which Russia is the largest and most populous, spanning 39% of the continent and comprising 15% of its population.
7 Europe had a total population of about 741 million (about 11% of the world population) as of 2018.
8 The European climate is largely affected by warm Atlantic currents that temper winters and summers on much of the continent, even at latitudes along which the climate in Asia and North America is severe.
9 Further from the sea, seasonal differences are more noticeable than close to the coast.
10 European culture is the root of Western civilization, which traces its lineage back to ancient Greece and ancient Rome.
11 The fall of the Western Roman Empire in 476 AD and the subsequent Migration Period marked the end of Europe's ancient history and the beginning of the Middle Ages.
12 A saxophone is a type of musical instrument in the woodwind family.
13 The saxophone uses a piece of wood, called a reed, to make sound.
14 The player blows air into the mouthpiece, which vibrates the reed.
15 The saxophone also uses keys to change pitch, and the player closes or opens holes to choose the note.
16 Commonly, saxophones have about 22 keys.
17 The saxophone is most commonly found in four voices: soprano, alto, tenor, and baritone saxophones.
18 However, uncommon saxophones include the bass and contrabass saxophones (lower than a baritone saxophone), the C-melody saxophone (between the tenor and alto saxophones), and the sopranino saxophone (higher than a soprano saxophone).
19 It was invented in 1840 by Adolphe Sax and is used in classical, jazz, and occasionally in rock, pop, and other styles.
20 The saxophone was originally created for military bands, but was commonly used in jazz big bands in the 1940s and 1950s.
21 Famous saxophone players include Marcel Mule (classical music), John Coltrane (jazz music), and Charlie Parker (jazz music).
```
Then we can look up queries by making an embedding of the query and finding the entry with the largest dot product (computed here in awk):
```
./tx -m msmarco-distilbert-base-dot-prod-v3_converted_full.bin -1 -p "What bodies of water are in europe?" > tmp/embq.txt
for x in {1..21}; do echo $x; paste tmp/emb${x}.txt tmp/embq.txt | awk '{dp+=$1*$2} END {print dp}'; done
1 35.505
2 33.9245
3 37.5551
4 29.1835
5 36.0957
6 31.6795
7 29.0034
8 31.7701
9 17.0193
10 26.859
11 20.4201
12 10.0551
13 9.95383
14 14.5428
15 8.84668
16 10.5251
17 10.2478
18 8.90325
19 12.1863
20 6.15891
21 7.62652
```
The question was about Europe, so the scores are higher for the first 11 entries, and the maximum is #3, which talks about waterways.
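The awk one-liner is just a dot product over two single-column files. For reference, a hypothetical Fortran equivalent that scores one pair of embeddings (assuming the 768-element output shown above) would be:

```fortran
! Hypothetical Fortran equivalent of the awk dot product: reads two embeddings
! written with -1 (one value per line) and prints their score.
program dot_score
  implicit none
  integer, parameter :: dim = 768   ! embedding length of the distilbert model above
  real :: a(dim), b(dim)
  integer :: u, i

  open(newunit=u, file='tmp/emb3.txt', status='old')
  read(u, *) (a(i), i = 1, dim)
  close(u)

  open(newunit=u, file='tmp/embq.txt', status='old')
  read(u, *) (b(i), i = 1, dim)
  close(u)

  print *, dot_product(a, b)   ! higher score means a better match for the query
end program dot_score
```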
Below we ask who invented the saxophone and get the highest score at sentence 19, which contains the answer. (Note that I misspelled saxophone in the query and it still works.)
```
./tx -m msmarco-distilbert-base-dot-prod-v3_converted_full.bin -1 -p "Who invented the saxaphone?" > tmp/embq.txt
for x in {1..21}; do echo $x; paste tmp/emb${x}.txt tmp/embq.txt | awk '{dp+=$1*$2} END {print dp}'; done
1 8.78761
2 11.0401
3 11.4972
4 5.93544
5 3.17357
6 6.38081
7 9.9643
8 16.3048
9 12.8389
10 22.387
11 22.8647
12 31.1579
13 32.949
14 24.6756
15 28.0059
16 24.3043
17 25.446
18 29.0274
19 42.3414
20 30.9246
21 33.6924
```