Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.openserv.ai/llms.txt

Use this file to discover all available pages before exploring further.

All models available through SERV mirror the Playground picker. Use any model ID below with the SERV endpoint and your existing OpenAI or Anthropic SDK. Every model supports reasoning effort. OpenAI-shape models use the reasoning_effort parameter (none | low | medium | high). Claude models use the Anthropic thinking parameter. Endpoint compatibility: most models route through /v1/chat/completions. Some models also route through /v1/responses or /v1/messages. For the full table see the compatibility matrix on the SDK Integration page. Prices are per million tokens. All values include SERV Reasoning.

Try it in the Playground

The Playground is the fastest way to evaluate any model in the catalog without writing code. Pick a model on each side, set the reasoning effort (Low / Medium / High), and run the same prompt through Raw mode and SERV Reasoning to see the difference instantly. → Open the Playground at console.openserv.ai/playground A few things to do there before you commit to an integration:
  • Validate the cost-down hypothesis. Pin gpt-5.4 (Raw) on one side and gpt-5.4-mini (SERV Reasoning) on the other. If outputs match on your real prompts, you’ve just found a cheaper production model.
  • Iterate on your system prompt. The System Prompt field at the top lets you test task-specific prompting before you bake it into your application.
  • Stress-test with sample or custom prompts. The provided samples give you a baseline; bring your own real workload after that.

OpenAI

ModelAPI IDInputOutputContext
GPT-5.5gpt-5.5$6.50$39.001M
GPT-5.4gpt-5.4$3.25$20.001M
GPT-5.4 Minigpt-5.4-mini$1.00$6.00400K
GPT-5.4 Nanogpt-5.4-nano$0.250$1.60400K
o3o3$2.50$10.00200K
o3 Minio3-mini$1.40$5.50200K
o3 Proo3-pro$26.00$104.00200K
o4 Minio4-mini$1.40$5.50200K

Anthropic

ModelAPI IDInputOutputContext
Claude Opus 4.6claude-opus-4.6$6.50$32.001M
Claude Sonnet 4.6claude-sonnet-4.6$4.00$20.001M
Claude Haiku 4.5claude-haiku-4.5$1.25$6.50200K

Google

ModelAPI IDInputOutputContext
Gemini Flash Latestgemini-flash-latest$0.650$4.001M
Gemini Pro Latestgemini-pro-latest$2.50$16.001M
Gemma 4 26B A4Bgemma-4-26b-a4b-it$0.080$0.430262K
Gemma 4 31Bgemma-4-31b-it$0.170$0.500262K
Gemma: the Playground displays these as “Gemma 4 31B” and “Gemma 4 26B A4B”, but the API requires the -it (instruction-tuned) suffix. Without it you get 404 The model 'gemma-...' does not exist. Gemini and google-genai SDK: SERV doesn’t support Gemini’s native generateContent wire format. Use the OpenAI SDK with /v1/chat/completions to call any Gemini or Gemma model. Gemini specifically returns 502 on /v1/messages, so stick to /v1/chat/completions for it.

xAI

ModelAPI IDInputOutputContext
Grok 4.3grok-4.3$1.60$3.251M
Grok 4.20grok-4.20$1.60$3.252M

Qwen

ModelAPI IDInputOutputContext
Qwen3.6 Flashqwen3.6-flash$0.320$2.001M
Qwen3.6 Max Previewqwen3.6-max-preview$1.40$8.00262K

DeepSeek

ModelAPI IDInputOutputContext
DeepSeek V4 Prodeepseek-v4-pro$0.550$1.101M
DeepSeek V4 Flashdeepseek-v4-flash$0.180$0.3501M

See also