Documentation Index
Fetch the complete documentation index at: https://docs.openserv.ai/llms.txt
Use this file to discover all available pages before exploring further.
All models available through SERV mirror the Playground picker. Use any model ID below with the SERV endpoint and your existing OpenAI or Anthropic SDK.
Every model supports reasoning effort. OpenAI-shape models use the reasoning_effort parameter (none | low | medium | high). Claude models use the Anthropic thinking parameter.
Endpoint compatibility: most models route through /v1/chat/completions. Some models also route through /v1/responses or /v1/messages. For the full table see the compatibility matrix on the SDK Integration page.
Prices are per million tokens. All values include SERV Reasoning.
Try it in the Playground
The Playground is the fastest way to evaluate any model in the catalog without writing code. Pick a model on each side, set the reasoning effort (Low / Medium / High), and run the same prompt through Raw mode and SERV Reasoning to see the difference instantly.
→ Open the Playground at console.openserv.ai/playground
A few things to do there before you commit to an integration:
- Validate the cost-down hypothesis. Pin
gpt-5.4 (Raw) on one side and gpt-5.4-mini (SERV Reasoning) on the other. If outputs match on your real prompts, you’ve just found a cheaper production model.
- Iterate on your system prompt. The System Prompt field at the top lets you test task-specific prompting before you bake it into your application.
- Stress-test with sample or custom prompts. The provided samples give you a baseline; bring your own real workload after that.
OpenAI
| Model | API ID | Input | Output | Context |
|---|
| GPT-5.5 | gpt-5.5 | $6.50 | $39.00 | 1M |
| GPT-5.4 | gpt-5.4 | $3.25 | $20.00 | 1M |
| GPT-5.4 Mini | gpt-5.4-mini | $1.00 | $6.00 | 400K |
| GPT-5.4 Nano | gpt-5.4-nano | $0.250 | $1.60 | 400K |
| o3 | o3 | $2.50 | $10.00 | 200K |
| o3 Mini | o3-mini | $1.40 | $5.50 | 200K |
| o3 Pro | o3-pro | $26.00 | $104.00 | 200K |
| o4 Mini | o4-mini | $1.40 | $5.50 | 200K |
Anthropic
| Model | API ID | Input | Output | Context |
|---|
| Claude Opus 4.6 | claude-opus-4.6 | $6.50 | $32.00 | 1M |
| Claude Sonnet 4.6 | claude-sonnet-4.6 | $4.00 | $20.00 | 1M |
| Claude Haiku 4.5 | claude-haiku-4.5 | $1.25 | $6.50 | 200K |
Google
| Model | API ID | Input | Output | Context |
|---|
| Gemini Flash Latest | gemini-flash-latest | $0.650 | $4.00 | 1M |
| Gemini Pro Latest | gemini-pro-latest | $2.50 | $16.00 | 1M |
| Gemma 4 26B A4B | gemma-4-26b-a4b-it | $0.080 | $0.430 | 262K |
| Gemma 4 31B | gemma-4-31b-it | $0.170 | $0.500 | 262K |
Gemma: the Playground displays these as “Gemma 4 31B” and “Gemma 4 26B A4B”, but the API requires the -it (instruction-tuned) suffix. Without it you get 404 The model 'gemma-...' does not exist.
Gemini and google-genai SDK: SERV doesn’t support Gemini’s native generateContent wire format. Use the OpenAI SDK with /v1/chat/completions to call any Gemini or Gemma model. Gemini specifically returns 502 on /v1/messages, so stick to /v1/chat/completions for it.
xAI
| Model | API ID | Input | Output | Context |
|---|
| Grok 4.3 | grok-4.3 | $1.60 | $3.25 | 1M |
| Grok 4.20 | grok-4.20 | $1.60 | $3.25 | 2M |
Qwen
| Model | API ID | Input | Output | Context |
|---|
| Qwen3.6 Flash | qwen3.6-flash | $0.320 | $2.00 | 1M |
| Qwen3.6 Max Preview | qwen3.6-max-preview | $1.40 | $8.00 | 262K |
DeepSeek
| Model | API ID | Input | Output | Context |
|---|
| DeepSeek V4 Pro | deepseek-v4-pro | $0.550 | $1.10 | 1M |
| DeepSeek V4 Flash | deepseek-v4-flash | $0.180 | $0.350 | 1M |
See also