VJHack (Vinesh Janarthanan) · GitHub
Fast, hermetic builds with Bazel · LLM inference & optimisation
I'm a software engineer passionate about builds and AI.
With Bazel, there's no reason not to have lightning-fast, cross-platform builds.
I believe everyone should be able to run language models on consumer hardware, and I'm deeply interested in inference and performance optimization.
- #11223 – Top‑nσ sampler | Paper – Implemented the Top‑nσ sampling algorithm from the paper Top-nσ: Not All Logits Are You Need, a novel alternative to Top‑k/Top‑p for LLM decoding that maintains a stable sampling space even at high temperatures.
- #11180 #11116 – Restructures the gguf PyPI package so it no longer installs multiple top-level packages, preventing conflicts with an existing `scripts` directory.
- – Fixed memory alignment issues in quantized KV-cache allocations, improving stability for int4 models.
- #9527 – Updates `response_format` to match OpenAI's new structured output schema.
- #9484 – Added the option to disable context shift during infinite text generation via the command-line argument `--no-context-shift`.
- #15 – Adds a local cache for FIM completions to reduce server calls. Uses a SHA-256 hash of the prompt state as the key. Default size is 250 (configurable), with a random eviction policy.
- #18 – Optimizes FIM cache by retaining suggestions when the user continues typing the same text.
- #21 – Updates the info message to show cache-specific metrics on cache hits (`C: current/size | t: total time`). Also reduces cache size by storing only the completion content.
- #24 – Minimizes server-client payloads by filtering out unused response fields. Applies to both `ring_update()` and main FIM calls, keeping only essential fields like content and timings.
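The Top‑nσ rule from #11223 can be sketched in a few lines: keep only the tokens whose temperature-scaled logit lies within n standard deviations of the maximum logit, then softmax-sample from the survivors. This is an illustrative pure-Python sketch of the paper's rule, not the llama.cpp implementation; the function name and signature are hypothetical.

```python
import math
import random

def top_n_sigma_sample(logits, n=1.0, temperature=1.0, rng=random):
    """Sketch of top-nσ sampling (hypothetical helper, not llama.cpp code):
    keep tokens whose logit is within n standard deviations of the max
    logit, then sample from a softmax over the survivors."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    mean = sum(scaled) / len(scaled)
    sigma = math.sqrt(sum((l - mean) ** 2 for l in scaled) / len(scaled))
    # statistical threshold: only logits near the peak survive
    threshold = m - n * sigma
    candidates = [(i, l) for i, l in enumerate(scaled) if l >= threshold]
    # softmax over the surviving logits only (shift by m for stability)
    exps = [math.exp(l - m) for _, l in candidates]
    total = sum(exps)
    r = rng.random() * total
    acc = 0.0
    for (i, _), e in zip(candidates, exps):
        acc += e
        if r <= acc:
            return i
    return candidates[-1][0]
```

Because the cutoff is defined in logit space relative to the peak, the size of the candidate set adapts to how peaked the distribution is, which is what keeps sampling stable at high temperatures.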
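A minimal sketch of the FIM cache described in #15: a dict keyed by a SHA-256 hash of the prompt state, with a configurable capacity (default 250) and random eviction. The class and method names here are hypothetical illustrations, not the plugin's actual API.

```python
import hashlib
import random

class FIMCache:
    """Hypothetical sketch of a prompt-keyed completion cache with a
    random eviction policy, loosely modelled on the llama.vim cache."""

    def __init__(self, max_size=250, rng=random):
        self.max_size = max_size
        self.rng = rng
        self.store = {}

    @staticmethod
    def _key(prefix: str, suffix: str) -> str:
        # SHA-256 of the FIM prompt state (prefix + suffix) as the key
        return hashlib.sha256((prefix + "\x00" + suffix).encode()).hexdigest()

    def get(self, prefix: str, suffix: str):
        # None on a miss; the caller then falls back to a server call
        return self.store.get(self._key(prefix, suffix))

    def put(self, prefix: str, suffix: str, completion: str):
        if len(self.store) >= self.max_size:
            # random eviction: drop an arbitrary existing entry
            victim = self.rng.choice(list(self.store))
            del self.store[victim]
        self.store[self._key(prefix, suffix)] = completion
```

Random eviction is cheap and needs no bookkeeping; for an editor-local cache of a few hundred entries the hit-rate loss versus LRU is usually negligible.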
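The payload trimming in #24 amounts to keeping an allow-list of response fields before anything crosses the server-client boundary. `slim_response` and its defaults below are hypothetical illustrations of the idea, not the plugin's code.

```python
def slim_response(resp: dict, keep=("content", "timings")) -> dict:
    """Drop every response field not in the allow-list (hypothetical
    helper); only essentials like content and timings survive."""
    return {k: v for k, v in resp.items() if k in keep}
```

Applying the same filter to both the ring-buffer update path and the main FIM calls keeps the two payload shapes consistent.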
Pinned
- Forked from ggml-org/llama.vim – Vim plugin for LLM-assisted code/text completion (Vim Script)