
avble/llama.vim: Vim plugin for LLM-assisted code/text completion

Local LLM-assisted text completion.

You can customize llama.vim by setting the g:llama_config variable.

Examples:

  1. Disable the inline info:

    " put before llama.vim loads
    let g:llama_config = { 'show_info': 0 }
  2. Same thing, but setting the option directly:

    let g:llama_config.show_info = v:false
  3. Disable auto FIM (Fill-In-the-Middle) completion with lazy.nvim

    {
        'ggml-org/llama.vim',
        init = function()
            vim.g.llama_config = {
                auto_fim = false,
            }
        end,
    }
  4. Change the keymap for accepting the full suggestion:

    let g:llama_config.keymap_accept_full = "<C-S>"

Please refer to :help llama_config or the source for the full list of options.

The plugin requires a llama.cpp server instance to be running at g:llama_config.endpoint.
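If your server runs on a non-default host or port, point the plugin at it. The endpoint below (port 8012, path /infill) reflects the upstream defaults as best I can tell; verify the actual default with :help llama_config:

    " Assumed default shown for illustration; llama-server serves FIM
    " requests on its /infill route.
    let g:llama_config.endpoint = 'http://127.0.0.1:8012/infill'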

Either build from source or use the latest binaries: https://github.com/ggml-org/llama.cpp/releases

Here are recommended settings, depending on the amount of VRAM that you have:
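The concrete commands appear to have been lost in extraction. The upstream README pairs the VRAM tiers with llama-server preset flags; the flag names below are a best-effort reconstruction of those presets, so confirm them against llama-server --help for your build:

    # More than 16GB VRAM:
    llama-server --fim-qwen-7b-default

    # Less than 16GB VRAM:
    llama-server --fim-qwen-3b-default

    # Less than 8GB VRAM:
    llama-server --fim-qwen-1.5b-default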

Use :help llama for more details.

The plugin requires FIM-compatible models (see the linked HF collection).

Using llama.vim on M1 Pro (2021) with Qwen2.5-Coder 1.5B Q8_0:

The orange text is the generated suggestion. The green text contains performance stats for the FIM request: the currently used context is 15186 tokens and the maximum is 32768. There are 30 chunks in the ring buffer with extra context (out of 64). So far, 1 chunk has been evicted in the current session and there are 0 chunks in queue. The newly computed prompt tokens for this request were 260 and the generated tokens were 24. It took 1245 ms to generate this suggestion after entering the letter c on the current line.

Using llama.vim on M2 Ultra with Qwen2.5-Coder 7B Q8_0:

This demonstrates that the global context is accumulated and maintained across different files, and showcases the overall latency when working in a large codebase.

Another example, on a small Swift codebase:

The plugin aims to be very simple and lightweight while providing high-quality, performant local FIM completions, even on consumer-grade hardware. Read more on how this is achieved in the following links:

