LangChain provides an optional caching layer for LLMs. This is useful for two reasons: it can save you money by reducing the number of API calls you make to the LLM provider if you often request the same completion multiple times, and it can speed up your application by cutting out those repeated round trips.
%pip install -qU langchain_openai langchain_community
import os
from getpass import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass()
from langchain_core.globals import set_llm_cache
from langchain_openai import OpenAI
# Use a slower, older completion model so the caching speedup is easy to see
llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2)
from langchain_core.caches import InMemoryCache

set_llm_cache(InMemoryCache())

%%time
# The first time, the completion is not yet in the cache, so it takes longer
llm.invoke("Tell me a joke")
CPU times: user 546 ms, sys: 379 ms, total: 925 ms
Wall time: 1.11 s
"\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second time it is cached, so the response comes back much faster
llm.invoke("Tell me a joke")
CPU times: user 192 µs, sys: 77 µs, total: 269 µs
Wall time: 270 µs
"\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
SQLite Cache
We can do the same thing with a SQLite cache, which stores cached completions on disk.
from langchain_community.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
%%time
# The first call is not yet cached, so it still hits the API
llm.invoke("Tell me a joke")
CPU times: user 10.6 ms, sys: 4.21 ms, total: 14.8 ms
Wall time: 851 ms
"\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second call is served from the SQLite cache on disk
llm.invoke("Tell me a joke")
CPU times: user 59.7 ms, sys: 63.6 ms, total: 123 ms
Wall time: 134 ms
"\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"