VJHack (Vinesh Janarthanan) · GitHub
Fast, hermetic builds with Bazel · LLM inference & optimisation
I'm a software engineer passionate about builds and AI.
With Bazel, there's no reason not to have lightning-fast, cross-platform builds.
I believe everyone should be able to run language models on consumer hardware, and I'm deeply interested in inference and performance optimization.
- #11223 – Top‑nσ sampler | Paper – Implemented the Top‑nσ sampling algorithm from the paper Top-nσ: Not All Logits Are You Need, a novel alternative to Top‑k/Top‑p for LLM decoding that maintains a stable sampling space even at high temperatures.
- #11180 #11116 – Restructures the gguf PyPI package so it no longer installs multiple top-level packages, preventing conflicts with an existing `scripts` directory.
- – Fixed memory alignment issues in quantized KV-cache allocations, improving stability for int4 models.
- #9527 – Updates `response_format` to match OpenAI's new structured output schema.
- #9484 – Added the option to disable context shift during infinite text generation via the command-line argument `--no-context-shift`.
- #15 – Adds a local cache for FIM completions to reduce server calls. Uses a SHA-256 hash of the prompt state as the key. Default size is 250 (configurable), with a random eviction policy.
- #18 – Optimizes FIM cache by retaining suggestions when the user continues typing the same text.
- #21 – Updates the info message to show cache-specific metrics on cache hits (`C: current/size | t: total time`). Also reduces cache size by storing only the completion content.
- #24 – Minimizes server-client payloads by filtering out unused response fields. Applies to both `ring_update()` and main FIM calls, keeping only essential fields like content and timings.
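The Top‑nσ rule from #11223 can be sketched in a few lines: keep only the tokens whose temperature-scaled logit lies within n standard deviations of the maximum logit, then softmax-sample from the survivors. This is an illustrative pure-Python sketch of the paper's rule, not the llama.cpp implementation; the function name and signature are hypothetical.

```python
import math
import random

def top_n_sigma_sample(logits, n=1.0, temperature=1.0, rng=random):
    """Sketch of top-nσ sampling (hypothetical helper, not llama.cpp code):
    keep tokens whose logit is within n standard deviations of the max
    logit, then sample from a softmax over the survivors."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    mean = sum(scaled) / len(scaled)
    sigma = math.sqrt(sum((l - mean) ** 2 for l in scaled) / len(scaled))
    # statistical threshold: only logits near the peak survive
    threshold = m - n * sigma
    candidates = [(i, l) for i, l in enumerate(scaled) if l >= threshold]
    # softmax over the surviving logits only (shift by m for stability)
    exps = [math.exp(l - m) for _, l in candidates]
    total = sum(exps)
    r = rng.random() * total
    acc = 0.0
    for (i, _), e in zip(candidates, exps):
        acc += e
        if r <= acc:
            return i
    return candidates[-1][0]
```

Because the cutoff is defined in logit space relative to the peak, the size of the candidate set adapts to how peaked the distribution is, which is what keeps sampling stable at high temperatures.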
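A minimal sketch of the FIM cache described in #15: a dict keyed by a SHA-256 hash of the prompt state, with a configurable capacity (default 250) and random eviction. The class and method names here are hypothetical illustrations, not the plugin's actual API.

```python
import hashlib
import random

class FIMCache:
    """Hypothetical sketch of a prompt-keyed completion cache with a
    random eviction policy, loosely modelled on the llama.vim cache."""

    def __init__(self, max_size=250, rng=random):
        self.max_size = max_size
        self.rng = rng
        self.store = {}

    @staticmethod
    def _key(prefix: str, suffix: str) -> str:
        # SHA-256 of the FIM prompt state (prefix + suffix) as the key
        return hashlib.sha256((prefix + "\x00" + suffix).encode()).hexdigest()

    def get(self, prefix: str, suffix: str):
        # None on a miss; the caller then falls back to a server call
        return self.store.get(self._key(prefix, suffix))

    def put(self, prefix: str, suffix: str, completion: str):
        if len(self.store) >= self.max_size:
            # random eviction: drop an arbitrary existing entry
            victim = self.rng.choice(list(self.store))
            del self.store[victim]
        self.store[self._key(prefix, suffix)] = completion
```

Random eviction is cheap and needs no bookkeeping; for an editor-local cache of a few hundred entries the hit-rate loss versus LRU is usually negligible.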
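The payload trimming in #24 amounts to keeping an allow-list of response fields before anything crosses the server-client boundary. `slim_response` and its defaults below are hypothetical illustrations of the idea, not the plugin's code.

```python
def slim_response(resp: dict, keep=("content", "timings")) -> dict:
    """Drop every response field not in the allow-list (hypothetical
    helper); only essentials like content and timings survive."""
    return {k: v for k, v in resp.items() if k in keep}
```

Applying the same filter to both the ring-buffer update path and the main FIM calls keeps the two payload shapes consistent.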
Pinned
- Forked from ggml-org/llama.vim – Vim plugin for LLM-assisted code/text completion (Vim Script)