Applying RL: Fixing Structured Outputs

Published on May 29, 2025

A significant portion of AI use cases revolve around structured outputs, i.e. using a model to ingest unstructured text and generate a structured output, typically in JSON format. However, structured output mode hurts performance on tasks that involve more than just formatting changes, since enforcing a schema stops the model from thinking ‘freely’.

So rather than rely on structured output mode, we used reinforcement learning to train an ultra-small model (Qwen3-0.6B) to do the structuring instead! All you have to do is feed in the unstructured data along with the desired output schema.

Download Osmosis-Structure-0.6B here: Ollama | Hugging Face
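
To give a feel for the usage pattern, here is a minimal sketch using the Ollama Python client. The model tag and the prompt layout (schema followed by raw text) are illustrative assumptions, not a documented input format; check the model page for the exact tag.

```python
# Minimal sketch: structuring free-form text with Osmosis-Structure-0.6B via Ollama.
# Assumes the model has been pulled locally and `pip install ollama` (client >= 0.4,
# which accepts a JSON schema in `format`). Tag and prompt layout are illustrative.
import json
import ollama

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

text = "Maria turned 34 last week and just moved to Lisbon."

response = ollama.chat(
    model="Osmosis/Osmosis-Structure-0.6B",  # hypothetical tag; check the Ollama page
    messages=[{
        "role": "user",
        "content": f"Extract data matching this JSON schema:\n{json.dumps(schema)}\n\nText:\n{text}",
    }],
    format=schema,  # constrains decoding to the schema
)

print(json.loads(response["message"]["content"]))  # e.g. {'name': 'Maria', 'age': 34}
```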

We tested the most recent Anthropic and OpenAI models on math questions (1983-2024 AIME and DAPO-Math-17k-Processed), comparing accuracy between structured output mode and unstructured responses (with Osmosis producing the same structure afterward).

(For the Anthropic models, we used assistant prefill as a proxy for structured output mode.)
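
Assistant prefill works by seeding the assistant turn with the opening of a JSON object, so the model must continue from it. A rough sketch of that proxy using the `anthropic` Python SDK (the model tag and prompt are illustrative):

```python
# Sketch of the assistant-prefill proxy: seeding the assistant turn with "{"
# forces the completion to continue as JSON. Assumes `pip install anthropic`
# and ANTHROPIC_API_KEY set; the model tag is illustrative.
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative tag
    max_tokens=1024,
    messages=[
        {"role": "user", "content": 'Solve the problem, then reply as JSON {"answer": <int>}: ...'},
        {"role": "assistant", "content": "{"},  # prefill: the reply continues this JSON
    ],
)

print("{" + message.content[0].text)  # re-attach the prefilled opening brace
```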

Using Osmosis-Structure-0.6B to structure outputs significantly improved performance for Sonnet 4, Opus 4, and GPT-4.1. Interestingly, o3 performed well even with structured output mode. We speculate that this may be due to a double pass - i.e. o3 generates an output, and then 4o-mini (or another small model) validates and structures it, similar to Osmosis-Structure-0.6B. We came to this hypothesis because GPT-4.1's time to completion for structured outputs is significantly faster than for its unstructured completions (>5% time required), whereas o3's time to completion for structured and unstructured calls was similar - sometimes even longer for structured outputs.

In production environments, we've also observed users opting to feed unstructured outputs from more expensive models into cheaper models (e.g. GPT-4o-mini) to structure the response. Osmosis-Structure-0.6B acts as an open-source, smaller replacement for that second model.
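
A sketch of that two-stage pattern, with a larger model answering freely and a local Osmosis-Structure-0.6B doing the formatting pass; the model tags and prompt wording are assumptions:

```python
# Sketch of the two-stage pattern: an expensive model answers free-form, then a
# small local model structures the text. Assumes `pip install openai ollama`,
# OPENAI_API_KEY set, and the Osmosis model pulled; tags and prompts illustrative.
import json
import ollama
from openai import OpenAI

# Stage 1: let the strong model reason without schema constraints.
draft = OpenAI().chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What is the sum of the first 20 primes? Think step by step."}],
).choices[0].message.content

# Stage 2: a cheap structuring pass over the free-form answer.
schema = {"type": "object", "properties": {"answer": {"type": "integer"}}, "required": ["answer"]}
structured = ollama.chat(
    model="Osmosis/Osmosis-Structure-0.6B",  # hypothetical tag
    messages=[{"role": "user", "content": f"Schema:\n{json.dumps(schema)}\n\nText:\n{draft}"}],
    format=schema,
)

print(json.loads(structured["message"]["content"]))  # e.g. {'answer': 639}
```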

We trained Osmosis-Structure-0.6B using GRPO on a synthetic dataset of 500K input/output examples where the prompt calls for a structured output (e.g. reasoning traces of math solutions with structured responses, data extraction and multi-nested JSON formatting from complex unstructured text, etc.). The model was rewarded based on the number of correct value fields it recalled from the input text.
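
As a rough illustration of what such a recall-based reward might look like (the exact reward used in training is not spelled out here, so this is an assumption), the sketch below scores a predicted JSON object by the fraction of ground-truth leaf values it reproduces at the right paths:

```python
# Minimal sketch of a recall-based reward: the fraction of ground-truth leaf
# (path, value) pairs that the predicted JSON reproduces. The exact reward used
# for Osmosis-Structure-0.6B is an assumption here.
def leaves(obj, path=()):
    """Yield (path, value) pairs for every leaf of a nested JSON structure."""
    if isinstance(obj, dict):
        for key, val in obj.items():
            yield from leaves(val, path + (key,))
    elif isinstance(obj, list):
        for i, val in enumerate(obj):
            yield from leaves(val, path + (i,))
    else:
        yield path, obj

_MISSING = object()

def recall_reward(pred, truth) -> float:
    """Reward = fraction of ground-truth leaf values recalled at the right path."""
    truth_leaves = dict(leaves(truth))
    pred_leaves = dict(leaves(pred))
    if not truth_leaves:
        return 0.0
    hits = sum(1 for p, v in truth_leaves.items() if pred_leaves.get(p, _MISSING) == v)
    return hits / len(truth_leaves)

# Two of three ground-truth fields recalled -> reward 2/3.
truth = {"answer": 42, "steps": [{"op": "add"}, {"op": "mul"}]}
pred = {"answer": 42, "steps": [{"op": "add"}, {"op": "sub"}]}
print(recall_reward(pred, truth))  # 0.666...
```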

If you're interested in learning more about reinforcement learning, reach out!

