Get started with RunPod LLMs.
Overview
This guide covers how to use the LangChain RunPod LLM class to interact with text generation models hosted on RunPod Serverless.
Setup
Install the langchain-runpod integration package:
pip install -qU langchain-runpod
Ensure the RUNPOD_API_KEY and RUNPOD_ENDPOINT_ID environment variables are set; the snippet below prompts for them if they are missing.
import getpass
import os

if "RUNPOD_API_KEY" not in os.environ:
    os.environ["RUNPOD_API_KEY"] = getpass.getpass("Enter your RunPod API Key: ")

if "RUNPOD_ENDPOINT_ID" not in os.environ:
    os.environ["RUNPOD_ENDPOINT_ID"] = input("Enter your RunPod Endpoint ID: ")
Instantiation
Initialize the RunPod class. You can pass model-specific parameters via model_kwargs and configure polling behavior.
from langchain_runpod import RunPod

llm = RunPod(
    # endpoint_id and api_key are picked up from the RUNPOD_* environment
    # variables set in the Setup step above.
    model_kwargs={
        "max_new_tokens": 256,
        "temperature": 0.6,
        "top_k": 50,
    },
)
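As a hedged sketch of the polling configuration mentioned above, the example below uses assumed parameter names (poll_interval, max_polling_attempts); verify the exact field names against the source code linked at the end of this page.
# Illustrative only: poll_interval and max_polling_attempts are assumed
# names for the polling knobs -- check langchain_runpod.llms before use.
llm_tuned = RunPod(
    model_kwargs={"max_new_tokens": 256},
    poll_interval=1.0,         # assumed: seconds between job status checks
    max_polling_attempts=120,  # assumed: stop polling after ~2 minutes
)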
Invocation
Use the standard LangChain .invoke() and .ainvoke() methods to call the model. Streaming is also supported via .stream() and .astream() (simulated by polling the RunPod /stream endpoint).
prompt = "Write a tagline for an ice cream shop on the moon."

try:
    response = llm.invoke(prompt)
    print("--- Sync Invoke Response ---")
    print(response)
except Exception as e:
    print(
        f"Error invoking LLM: {e}. Ensure endpoint ID/API key are correct and endpoint is active/compatible."
    )

print("\n--- Sync Stream Response ---")
try:
    for chunk in llm.stream(prompt):
        print(chunk, end="", flush=True)
    print()  # newline after the streamed output
except Exception as e:
    print(
        f"\nError streaming LLM: {e}. Ensure endpoint handler supports streaming output format."
    )
Async Usage
try:
    async_response = await llm.ainvoke(prompt)
    print("--- Async Invoke Response ---")
    print(async_response)
except Exception as e:
    print(f"Error invoking LLM asynchronously: {e}.")

print("\n--- Async Stream Response ---")
try:
    async for chunk in llm.astream(prompt):
        print(chunk, end="", flush=True)
    print()  # newline after the streamed output
except Exception as e:
    print(
        f"\nError streaming LLM asynchronously: {e}. Ensure endpoint handler supports streaming output format."
    )
Chaining
The LLM integrates seamlessly with LangChain Expression Language (LCEL) chains.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template("Tell me a joke about {topic}")
parser = StrOutputParser()
chain = prompt_template | llm | parser

try:
    chain_response = chain.invoke({"topic": "bears"})
    print("--- Chain Response ---")
    print(chain_response)
except Exception as e:
    print(f"Error running chain: {e}")

try:
    async_chain_response = await chain.ainvoke({"topic": "robots"})
    print("--- Async Chain Response ---")
    print(async_chain_response)
except Exception as e:
    print(f"Error running async chain: {e}")
Endpoint Considerations
- Input format: the target RunPod endpoint handler must accept a payload shaped like {"input": {"prompt": "...", ...}}.
- Output format: the handler should return the generated text under the "output" key of the final status response (e.g., {"output": "Generated text..."} or {"output": {"text": "..."}}).
- Streaming: for simulated streaming via the /stream endpoint, the handler must populate the "stream" key in the status response with a list of chunk dictionaries, like [{"output": "token1"}, {"output": "token2"}].
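To make this contract concrete, here is a minimal sketch of a compatible serverless worker. It uses the runpod Python SDK; generate_text() is a hypothetical stand-in for your actual model call, and the overall worker layout is an assumption to adapt rather than a prescribed implementation.
import runpod


def generate_text(prompt: str, **params) -> str:
    # Hypothetical stand-in for the real model call (e.g., a transformers pipeline).
    return f"(generated text for: {prompt})"


def handler(event):
    # RunPod passes the request's "input" object to the handler.
    job_input = event.get("input", {})
    prompt = job_input.get("prompt", "")
    params = {k: v for k, v in job_input.items() if k != "prompt"}
    # Returning a string surfaces it as {"output": "..."} in the status
    # response; returning {"text": "..."} yields {"output": {"text": "..."}}.
    return generate_text(prompt, **params)


runpod.serverless.start({"handler": handler})
For streaming, the handler can instead be written as a generator that yields chunks, which RunPod exposes through the "stream" key and the /stream endpoint; confirm this against RunPod's serverless handler documentation for your SDK version.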
For detailed documentation of the RunPod LLM class, parameters, and methods, refer to the source code or the generated API reference (if available).
Link to source code: https://github.com/runpod/langchain-runpod/blob/main/langchain_runpod/llms.py