Use the GitHub Models service from the CLI!
This repository implements the GitHub Models CLI extension (`gh models`), enabling users to interact with AI models via the `gh` CLI. The extension supports inference, prompt evaluation, model listing, and test generation.
The extension requires the `gh` CLI to be installed and available in the `PATH`. It also requires that you have authenticated via `gh auth`.
After installing the `gh` CLI, run the following from a command line:
gh extension install https://github.com/github/gh-models
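To verify the installation, you can ask the extension for its help text (assuming the usual `--help` behavior of `gh` extensions):

gh models --help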
If you've previously installed the `gh models` extension and want to update to the latest version, run:
gh extension upgrade github/gh-models
To see which models are available, run:

gh models list

Example output:
ID                              DISPLAY NAME
ai21-labs/ai21-jamba-1.5-large  AI21 Jamba 1.5 Large
openai/gpt-4.1                  OpenAI GPT-4.1
openai/gpt-4o-mini              OpenAI GPT-4o mini
cohere/cohere-command-r         Cohere Command R
deepseek/deepseek-v3-0324       Deepseek-V3-0324
Use the value in the "ID" column when specifying the model on the command-line.
Run the extension in REPL mode. This will prompt you for which model to use:

gh models run
In REPL mode, use `/help` to list available commands. Otherwise, just type your prompt and hit ENTER to send it to the model.
Run the extension in single-shot mode. This will print the model output and exit.
gh models run openai/gpt-4o-mini "why is the sky blue?"
Run the extension with output from a command. This uses single-shot mode.
cat README.md | gh models run openai/gpt-4o-mini "summarize this text"
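Equivalently, you can use shell input redirection instead of `cat`:

gh models run openai/gpt-4o-mini "summarize this text" < README.md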
Run evaluation tests against a model using a `.prompt.yml` file:
gh models eval my_prompt.prompt.yml
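If you don't yet have a prompt file, the sketch below shows roughly what one can look like. The shape follows the `.prompt.yml` format described in the docs linked at the end of this section; treat the specific fields, test data, and evaluator as illustrative rather than authoritative:

cat > my_prompt.prompt.yml <<'EOF'
name: Text summarizer
description: Summarizes the provided text in one sentence
model: openai/gpt-4o-mini
messages:
  - role: system
    content: You are a concise technical summarizer.
  - role: user
    content: "Summarize this text: {{input}}"
# Test cases: each entry binds template variables like {{input}}
testData:
  - input: "The quick brown fox jumps over the lazy dog."
evaluators:
  # A simple string check; LLM-based evaluators are also supported
  - name: mentions-fox
    string:
      contains: fox
EOF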
The evaluation will run test cases defined in the prompt file and display results in a human-readable format. For programmatic use, you can output results in JSON format:
gh models eval my_prompt.prompt.yml --json
The JSON output includes detailed test results, evaluation scores, and summary statistics that can be processed by other tools or CI/CD pipelines.
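For example, a CI step might capture the JSON and pull out the summary with `jq`. The `.summary` field name here is an illustrative assumption; inspect your own `--json` output for the exact structure:

# Run the evals, save the JSON, and extract the summary (field name assumed)
gh models eval my_prompt.prompt.yml --json > results.json
jq '.summary' results.json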
Here's a sample GitHub Action that uses the `eval` command to automatically run the evals in any PR that updates a prompt file: evals_action.yml.
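As a rough sketch, such a workflow might look like the following. The trigger, permissions, and file names are assumptions on my part; see the linked evals_action.yml for the maintained version:

cat > .github/workflows/prompt-evals.yml <<'EOF'
name: Prompt evals
on:
  pull_request:
    paths:
      - '**/*.prompt.yml'
permissions:
  contents: read
  models: read
jobs:
  evals:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install the gh-models extension
        run: gh extension install github/gh-models
        env:
          GH_TOKEN: ${{ github.token }}
      - name: Run evals
        run: gh models eval my_prompt.prompt.yml
        env:
          GH_TOKEN: ${{ github.token }}
EOF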
Learn more about `.prompt.yml` files here: Storing prompts in GitHub repositories.
Generate comprehensive test cases for your prompts using the PromptPex methodology:
gh models generate my_prompt.prompt.yml
The `generate` command analyzes your prompt file and automatically creates test cases to evaluate the prompt's behavior across different scenarios and edge cases. This helps ensure your prompts are robust and perform as expected.
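Assuming the generated test cases are written back into the prompt file (an assumption; check the file after running), a typical loop is to generate tests and then evaluate them:

# Generate test cases, then run them with eval
gh models generate my_prompt.prompt.yml
gh models eval my_prompt.prompt.yml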
The `generate` command is based on PromptPex, a Microsoft Research framework for systematic prompt testing. PromptPex follows a structured approach to generate comprehensive test cases by:
graph TD
    PUT(["Prompt Under Test (PUT)"])
    I["Intent (I)"]
    IS["Input Specification (IS)"]
    OR["Output Rules (OR)"]
    IOR["Inverse Output Rules (IOR)"]
    PPT["PromptPex Tests (PPT)"]
    PUT --> IS
    PUT --> I
    PUT --> OR
    OR --> IOR
    I ==> PPT
    IS ==> PPT
    OR ==> PPT
    PUT ==> PPT
    IOR ==> PPT
You can customize the test generation process with various options:
# Specify effort level (min, low, medium, high)
gh models generate --effort high my_prompt.prompt.yml

# Use a specific model for groundtruth generation
gh models generate --groundtruth-model "openai/gpt-4.1" my_prompt.prompt.yml

# Disable groundtruth generation
gh models generate --groundtruth-model "none" my_prompt.prompt.yml

# Load from an existing session file (or create a new one if needed)
gh models generate --session-file my_prompt.session.json my_prompt.prompt.yml

# Custom instructions for specific generation phases
gh models generate --instruction-intent "Focus on edge cases" my_prompt.prompt.yml
The `effort` flag controls several settings in the test generation engine and trades off how many tests you want generated against how many tokens (and how much time) you are willing to spend; see the examples after this list.

- `min` is just enough to generate a few tests and make sure things are properly configured.
- `low` should be used for a quick try of the test generation. It limits the number of rules to 3.
- `medium` provides much better coverage.
- `high` spends more tokens per rule to generate tests, which typically leads to longer, more complex inputs.
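For example, a cheap pass while iterating versus a thorough pass before shipping:

# Quick, low-cost pass while iterating on a prompt
gh models generate --effort low my_prompt.prompt.yml

# Thorough pass once the prompt has stabilized
gh models generate --effort high my_prompt.prompt.yml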
The command supports custom instructions for different phases of test generation (a combined example follows this list):

- `--instruction-intent`: custom system instruction for intent generation
- `--instruction-inputspec`: custom system instruction for input specification generation
- `--instruction-outputrules`: custom system instruction for output rules generation
- `--instruction-inverseoutputrules`: custom system instruction for inverse output rules generation
- `--instruction-tests`: custom system instruction for tests generation
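These flags can be combined in a single run (assuming the flags compose, as CLI flags normally do; the instruction strings are illustrative):

gh models generate \
  --instruction-intent "Focus on developer-facing use cases" \
  --instruction-tests "Prefer short, adversarial inputs" \
  my_prompt.prompt.yml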
Remember that when interacting with a model you are experimenting with AI, so content mistakes are possible. The feature is subject to various limits (including requests per minute, requests per day, tokens per request, and concurrent requests) and is not designed for production use cases.

GitHub Models uses Azure AI Content Safety. These filters cannot be turned off as part of the GitHub Models experience. If you decide to employ models through a paid service, please configure your content filters to meet your requirements.

This service is under GitHub's Pre-release Terms. Your use of GitHub Models is subject to the following Product Terms and Privacy Statement. Content within this repository may be subject to additional license terms.