Get reliable JSON from any LLM. Built on Pydantic for validation, type safety, and IDE support.
```python
import instructor
from pydantic import BaseModel


# Define what you want
class User(BaseModel):
    name: str
    age: int


# Extract it from natural language
client = instructor.from_provider("openai/gpt-4o-mini")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "John is 25 years old"}],
)
print(user)  # User(name='John', age=25)
```
That's it. No manual JSON parsing, error handling, or retry logic: define a model and get validated, structured data back.
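Getting structured data from LLMs is hard. Without help, you have to write a JSON schema by hand, parse the raw response, validate the result, and retry when the output doesn't fit.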
Instructor handles all of this with one simple interface:
Without Instructor:

```python
import json

import openai

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "..."}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "extract_user",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "integer"},
                    },
                },
            },
        }
    ],
)

# Parse response
tool_call = response.choices[0].message.tool_calls[0]
user_data = json.loads(tool_call.function.arguments)

# Validate manually
if "name" not in user_data:
    # Handle error...
    pass
```

With Instructor:

```python
client = instructor.from_provider("openai/gpt-4")
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
)
# That's it! user is validated and typed
```
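To make the manual steps in the "Without Instructor" column concrete, here is a minimal standard-library sketch of what you would otherwise write yourself: describe the fields as a JSON schema, then parse the tool-call arguments and check them. It is an illustration only, not how Instructor or Pydantic work internally; it assumes a flat model, uses a plain `dataclass` instead of a Pydantic model, and the helpers `json_schema_for` and `parse_arguments` are hypothetical.

```python
import json
from dataclasses import dataclass, fields

# Map Python field types to JSON Schema type names (flat models only).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class User:
    name: str
    age: int


def json_schema_for(cls) -> dict:
    """Hypothetical helper: build the tool 'parameters' schema from a dataclass."""
    return {
        "type": "object",
        "properties": {f.name: {"type": JSON_TYPES[f.type]} for f in fields(cls)},
        "required": [f.name for f in fields(cls)],
    }


def parse_arguments(cls, raw: str):
    """Hypothetical helper: parse tool-call arguments and check them field by field."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for f in fields(cls):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        # bool is a subclass of int, so reject it explicitly for int fields
        if not isinstance(value, f.type) or (f.type is int and isinstance(value, bool)):
            raise ValueError(f"{f.name}: expected {JSON_TYPES[f.type]}, got {value!r}")
    return cls(**{f.name: data[f.name] for f in fields(cls)})


print(json_schema_for(User))
print(parse_arguments(User, '{"name": "John", "age": 25}'))  # User(name='John', age=25)
```

Every new field, nested object, or validation rule means touching this code again, which is the bookkeeping a `response_model` is meant to replace.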
Install with pip (`pip install instructor`) or with your package manager:
```shell
uv add instructor
poetry add instructor
```

Works with every major provider
Use the same code with any LLM provider:
```python
# OpenAI
client = instructor.from_provider("openai/gpt-4o")

# Anthropic
client = instructor.from_provider("anthropic/claude-3-5-sonnet")

# Google
client = instructor.from_provider("google/gemini-pro")

# Ollama (local)
client = instructor.from_provider("ollama/llama3.2")

# With API keys directly (no environment variables needed)
client = instructor.from_provider("openai/gpt-4o", api_key="sk-...")
client = instructor.from_provider("anthropic/claude-3-5-sonnet", api_key="sk-ant-...")
client = instructor.from_provider("groq/llama-3.1-8b-instant", api_key="gsk_...")

# All use the same API!
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
)
```

Production-ready features
When validation fails, Instructor automatically retries, sending the validation error back to the model:
```python
from pydantic import BaseModel, field_validator


class User(BaseModel):
    name: str
    age: int

    @field_validator('age')
    @classmethod
    def validate_age(cls, v: int) -> int:
        if v < 0:
            raise ValueError('Age must be non-negative')
        return v


# Instructor automatically retries when validation fails
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
    max_retries=3,
)
```
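The idea behind retrying with the error message can be sketched without any LLM: validate the reply, and on failure append both the bad reply and the error to the conversation before asking again. This is a conceptual sketch, not Instructor's implementation; `extract_with_retries`, `validate_user`, and the stand-in `fake_model` are hypothetical, `max_attempts` here counts total calls (an assumption, not necessarily what `max_retries` means), and the feedback wording is made up for illustration.

```python
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def validate_user(raw: str) -> dict:
    """Parse a JSON reply and enforce the same rule as the validator above."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("age"), int) or data["age"] < 0:
        raise ValueError("age must be a non-negative integer")
    return data


def extract_with_retries(call_model: Callable[[List[Message]], str],
                         messages: List[Message], max_attempts: int = 3) -> dict:
    """Hypothetical retry loop that feeds each validation error back to the model."""
    history = list(messages)
    for _ in range(max_attempts):
        reply = call_model(history)
        try:
            return validate_user(reply)
        except ValueError as err:
            history.append({"role": "assistant", "content": reply})
            history.append({"role": "user", "content": f"Validation failed: {err}. Please fix it."})
    raise RuntimeError(f"no valid response after {max_attempts} attempts")


# Usage with a stand-in "model" that first returns an invalid age, then a valid one.
replies = iter(['{"name": "John", "age": -5}', '{"name": "John", "age": 25}'])
calls = []


def fake_model(history: List[Message]) -> str:
    calls.append(len(history))  # record how many messages the model saw
    return next(replies)


user = extract_with_retries(fake_model, [{"role": "user", "content": "John is 25 years old"}])
print(user)  # {'name': 'John', 'age': 25}
```

The second call sees three messages (the original request, the rejected reply, and the error), which is what lets the model correct itself instead of guessing again blindly.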
Stream partial objects as they're generated:
```python
from instructor import Partial

for partial_user in client.chat.completions.create(
    response_model=Partial[User],
    messages=[{"role": "user", "content": "..."}],
    stream=True,
):
    print(partial_user)
    # User(name=None, age=None)
    # User(name="John", age=None)
    # User(name="John", age=25)
```
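Streaming partial objects depends on reading JSON that is still arriving. Here is a deliberately simplified sketch for flat objects: after each chunk, keep only the key/value pairs that are already complete and report the remaining fields as `None`. It assumes a flat object and returns plain dicts instead of models; the helpers `completed_fields` and `stream_partials` are hypothetical, and this is not how Instructor's `Partial` is implemented.

```python
import json
from typing import Iterable, Iterator

FIELDS = ("name", "age")


def completed_fields(buffer: str) -> dict:
    """Return the key/value pairs of a streamed flat JSON object that are already complete."""
    try:
        return json.loads(buffer)  # the whole object has arrived
    except json.JSONDecodeError:
        pass
    # A pair followed by a comma is finished: close the object there and try to parse.
    cut = buffer.rfind(",")
    while cut != -1:
        try:
            return json.loads(buffer[:cut] + "}")
        except json.JSONDecodeError:
            cut = buffer.rfind(",", 0, cut)  # that comma sat inside a string; try an earlier one
    return {}


def stream_partials(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield a snapshot after every chunk, with unfinished fields set to None."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        done = completed_fields(buffer)
        yield {name: done.get(name) for name in FIELDS}


chunks = ['{"name": ', '"John", ', '"age": 2', '5}']
for snapshot in stream_partials(chunks):
    print(snapshot)
# {'name': None, 'age': None}
# {'name': 'John', 'age': None}
# {'name': 'John', 'age': None}
# {'name': 'John', 'age': 25}
```

Holding back a value until it is complete avoids reporting a truncated number such as `2` while `25` is still streaming in.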
Extract complex, nested data structures:
```python
from typing import List


class Address(BaseModel):
    street: str
    city: str
    country: str


class User(BaseModel):
    name: str
    age: int
    addresses: List[Address]


# Instructor handles nested objects automatically
user = client.chat.completions.create(
    response_model=User,
    messages=[{"role": "user", "content": "..."}],
)
```
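Turning parsed JSON into nested objects means recursing into every field whose type is itself a model, including lists of models. The sketch below shows that recursion with standard-library dataclasses. It is an illustration, not Pydantic's or Instructor's implementation: the `build` helper is hypothetical, and unlike Pydantic it performs no type checking on scalar fields.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import List, get_args, get_origin, get_type_hints


@dataclass
class Address:
    street: str
    city: str
    country: str


@dataclass
class User:
    name: str
    age: int
    addresses: List[Address]


def build(cls, data: dict):
    """Hypothetical helper: recursively turn parsed JSON into nested dataclasses."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        hint, value = hints[f.name], data[f.name]
        if is_dataclass(hint):
            kwargs[f.name] = build(hint, value)  # a single nested object
        elif get_origin(hint) is list and is_dataclass(get_args(hint)[0]):
            item_cls = get_args(hint)[0]
            kwargs[f.name] = [build(item_cls, item) for item in value]  # a list of nested objects
        else:
            kwargs[f.name] = value  # a scalar, copied as-is
    return cls(**kwargs)


raw = ('{"name": "John", "age": 25, "addresses": '
       '[{"street": "1 Main St", "city": "Springfield", "country": "USA"}]}')
user = build(User, json.loads(raw))
print(user.addresses[0].city)  # Springfield
```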
Trusted by over 100,000 developers and companies building AI applications.
Companies using Instructor include teams at OpenAI, Google, Microsoft, AWS, and many YC startups.
Extract structured data from any text:
```python
from pydantic import BaseModel
import instructor

client = instructor.from_provider("openai/gpt-4o-mini")


class Product(BaseModel):
    name: str
    price: float
    in_stock: bool


product = client.chat.completions.create(
    response_model=Product,
    messages=[{"role": "user", "content": "iPhone 15 Pro, $999, available now"}],
)
print(product)
# Product(name='iPhone 15 Pro', price=999.0, in_stock=True)
```
Instructor's simple API is available in many languages.
vs Raw JSON mode: Instructor provides automatic validation, retries, streaming, and nested object support. No manual schema writing.
vs LangChain/LlamaIndex: Instructor focuses on one thing, structured extraction, which makes it lighter, faster, and easier to debug.
vs Custom solutions: Battle-tested by thousands of developers. Handles edge cases you haven't thought of yet.
We welcome contributions! Check out our good first issues to get started.
MIT License - see LICENSE for details.
Built by the Instructor community. Special thanks to Jason Liu and all contributors.