LangChain provides an optional caching layer for LLMs. This is useful for two reasons: it can save you money by reducing the number of API calls you make to the LLM provider if you often request the same completion multiple times, and it can speed up your application by cutting out those repeated round trips.
%pip install -qU langchain_openai langchain_community
import os
from getpass import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass()
from langchain_core.globals import set_llm_cache
from langchain_openai import OpenAI
# Use a slower, older completion model so the caching speedup is easy to see
llm = OpenAI(model="gpt-3.5-turbo-instruct", n=2, best_of=2)
from langchain_core.caches import InMemoryCache

set_llm_cache(InMemoryCache())

%%time
# The first time, the completion is not yet in the cache, so it takes longer
llm.invoke("Tell me a joke")
CPU times: user 546 ms, sys: 379 ms, total: 925 ms
Wall time: 1.11 s
"\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second time it is cached, so the response comes back much faster
llm.invoke("Tell me a joke")
CPU times: user 192 µs, sys: 77 µs, total: 269 µs
Wall time: 270 µs
"\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
SQLite Cache
We can do the same thing with a SQLite cache, which stores cached completions on disk.
from langchain_community.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
%%time
# The first call is not yet cached, so it still hits the API
llm.invoke("Tell me a joke")
CPU times: user 10.6 ms, sys: 4.21 ms, total: 14.8 ms
Wall time: 851 ms
"\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second call is served from the SQLite cache on disk
llm.invoke("Tell me a joke")
CPU times: user 59.7 ms, sys: 63.6 ms, total: 123 ms
Wall time: 134 ms
"\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"