Point your existing OpenAI or Anthropic SDK at SERV in minutes. You only need to update two constructor fields: the base URL and the API key. The rest of your application (prompts, tool definitions, and business logic) stays as it is, apart from the SERV-specific behaviors listed under Gotchas.
SERV exposes three wire-compatible HTTP endpoints under a single base URL:
| Endpoint | Shape | Use it for |
|---|---|---|
| POST /v1/chat/completions | OpenAI | Universal. Works with every model in the catalog. |
| POST /v1/responses | OpenAI | OpenAI-family models when you want streamed reasoning summaries. |
| POST /v1/messages | Anthropic | Claude family and several others. See the compatibility matrix below. |
Base URL: https://inference-api.openserv.ai
Quickstart
OpenAI SDK → SERV (/v1/chat/completions)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference-api.openserv.ai/v1", // /v1 IS required here
  apiKey: process.env.SERV_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "gpt-5.4-mini",
  messages: [
    // System prompt is required, see Gotcha #3 below
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "What is a CPU register?" },
  ],
});

console.log(completion.choices[0].message.content);
OpenAI SDK → SERV (/v1/responses)
Use the Responses API when you want the reasoning trace alongside the answer. OpenAI-family models only.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference-api.openserv.ai/v1",
  apiKey: process.env.SERV_API_KEY,
});

const response = await client.responses.create({
  model: "gpt-5.4",
  instructions: "You are a careful reasoner.", // top-level, counts as the system prompt
  input: "What is the integral of x^2 from 0 to 3?",
});

console.log(response.output_text);
Anthropic SDK → SERV (/v1/messages)
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://inference-api.openserv.ai", // NO /v1, the SDK adds it
  authToken: process.env.SERV_API_KEY, // authToken, not apiKey
});

const message = await client.messages.create({
  model: "claude-haiku-4.5",
  max_tokens: 1024, // required by the Anthropic API
  system: "You answer in one sentence.", // top-level, required by SERV
  messages: [{ role: "user", content: "What is a CPU register?" }],
});

// content is an array of blocks; ?. avoids a crash if no text block is present
console.log(message.content.find((b) => b.type === "text")?.text);
Gotchas
These four behaviors are what trip up most integrations.
Gotcha #1: baseURL suffix differs by SDK
| SDK | baseURL value | Why |
|---|---|---|
| OpenAI SDK | https://inference-api.openserv.ai/v1 | OpenAI SDK expects the /v1 to live in the base URL |
| Anthropic SDK | https://inference-api.openserv.ai | Anthropic SDK’s resource methods (client.messages.create) internally hit /v1/messages. Including /v1 yourself would call /v1/v1/messages and return a 404 |
Gotcha #2: auth field name differs
| SDK | Field | Notes |
|---|---|---|
| OpenAI SDK | apiKey | Standard. |
| Anthropic SDK | authToken | apiKey also works, but authToken is preferred for third-party gateways. It keeps ANTHROPIC_API_KEY free if you ever want to fall back to direct Anthropic. |
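The constructor differences from Gotchas #1 and #2 can be captured in a pair of small helpers. This is a minimal sketch: `SERV_ORIGIN`, `openAIOptionsForServ`, and `anthropicOptionsForServ` are hypothetical names, not part of either SDK. Only the `baseURL`, `apiKey`, and `authToken` fields come from the tables above.

```typescript
// Hypothetical helpers (not part of either SDK) that encode Gotchas #1 and #2.
const SERV_ORIGIN = "https://inference-api.openserv.ai";

interface OpenAIClientOptions {
  baseURL: string;
  apiKey: string;
}

interface AnthropicClientOptions {
  baseURL: string;
  authToken: string;
}

// OpenAI SDK: /v1 belongs in the base URL, and the key goes in apiKey.
function openAIOptionsForServ(servApiKey: string): OpenAIClientOptions {
  return { baseURL: SERV_ORIGIN + "/v1", apiKey: servApiKey };
}

// Anthropic SDK: no /v1 (the SDK adds it), and the key goes in authToken.
function anthropicOptionsForServ(servApiKey: string): AnthropicClientOptions {
  return { baseURL: SERV_ORIGIN, authToken: servApiKey };
}
```

The returned objects are meant to be passed straight to `new OpenAI(...)` and `new Anthropic(...)`, so the suffix and field-name rules live in one place.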
Gotcha #3: SERV requires a system prompt
Every request needs a system prompt: a system or developer message, a top-level instructions field, or a top-level system field, depending on the endpoint. A request without one returns:
400 A system prompt is required. Please include a system or developer message in your request.
This is a SERV-specific constraint, not part of either upstream API. Where the system prompt goes by endpoint:
| Endpoint | Where the system prompt goes |
|---|---|
| /v1/chat/completions | messages: [{ role: "system", content: "..." }, ...] |
| /v1/responses | top-level instructions: "..." |
| /v1/messages | top-level system: "..." |
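The placement rules in the table can be sketched as one function that builds a minimal request body per endpoint. `buildServRequestBody` and `ServEndpoint` are hypothetical names used for illustration, and the `max_tokens` default of 1024 is an arbitrary assumption borrowed from the Quickstart example.

```typescript
type ServEndpoint = "/v1/chat/completions" | "/v1/responses" | "/v1/messages";

// Hypothetical helper: puts the system prompt where each endpoint expects it.
function buildServRequestBody(
  endpoint: ServEndpoint,
  model: string,
  systemPrompt: string,
  userText: string
): Record<string, unknown> {
  if (systemPrompt.trim() === "") {
    // Fail locally instead of waiting for SERV's 400 response.
    throw new Error("SERV requires a system prompt");
  }
  if (endpoint === "/v1/chat/completions") {
    return {
      model: model,
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: userText },
      ],
    };
  }
  if (endpoint === "/v1/responses") {
    return { model: model, instructions: systemPrompt, input: userText };
  }
  // /v1/messages: max_tokens is required by the Anthropic API.
  // 1024 is an assumed default, not a SERV recommendation.
  return {
    model: model,
    max_tokens: 1024,
    system: systemPrompt,
    messages: [{ role: "user", content: userText }],
  };
}
```

Throwing on an empty prompt is a design choice in this sketch: it surfaces the missing prompt before the request leaves your process.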
Gotcha #4: reasoning_effort allowed values are SERV-specific
The Playground’s Low / Medium / High selector maps to the reasoning_effort request parameter.
| Side | Allowed values |
|---|---|
| Upstream OpenAI | minimal, low, medium, high |
| SERV | none, low, medium, high |
Passing 'minimal' to SERV returns 400. If you’re porting an OpenAI integration that hard-codes 'minimal', change it to 'none' or 'low'.
For Claude models, the equivalent control is the Anthropic thinking parameter: { type: "enabled", budget_tokens: <int ≥ 1024> }.
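When porting code that may still pass 'minimal', a small normalizer keeps the change in one place. This is a sketch under assumptions: `toServEffort` is a hypothetical name, and defaulting 'minimal' to 'low' rather than 'none' is an arbitrary choice you may want to flip.

```typescript
type OpenAIEffort = "minimal" | "low" | "medium" | "high";
type ServEffort = "none" | "low" | "medium" | "high";

// Hypothetical helper: maps an upstream OpenAI reasoning_effort value to one
// that SERV accepts. Only "minimal" needs translating.
function toServEffort(
  effort: OpenAIEffort,
  minimalFallback: "none" | "low" = "low" // assumed default; "none" is also valid
): ServEffort {
  return effort === "minimal" ? minimalFallback : effort;
}
```

For example, `toServEffort("minimal")` yields "low", `toServEffort("minimal", "none")` yields "none", and the other three values pass through unchanged.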
Parameter map: OpenAI ↔ Anthropic ↔ SERV
If you’re moving an integration across SDKs, this is what you need.
| Concept | OpenAI Chat | OpenAI Responses | Anthropic Messages |
|---|---|---|---|
| HTTP path | /v1/chat/completions | /v1/responses | /v1/messages |
| Auth field (SDK constructor) | apiKey | apiKey | authToken |
| baseURL suffix to use with SERV | /v1 | /v1 | (none) |
| Token cap field | max_completion_tokens | max_output_tokens | max_tokens (required) |
| System prompt | message with role:"system" | top-level instructions | top-level system |
| User message shape | {role, content} in messages[] | top-level input (string or array) | {role, content} in messages[] |
| Reasoning-effort control | reasoning_effort: low, medium, or high | reasoning: { effort, summary } | thinking: { type:"enabled", budget_tokens } |
| Streaming | stream: true | stream: true | stream: true |
| Stop sequences | stop | n/a | stop_sequences |
| Tool schema | tools: [{type:"function", function:{name, parameters}}] | tools: [{type:"function", ...}] | tools: [{name, input_schema}] (no nested function:) |
| Tool-choice | tool_choice: "auto" or {type:"function", function:{name}} | tool_choice: ... | tool_choice: {type:"auto"}, {type:"any"}, or {type:"tool", name} |
| Response text | choices[0].message.content | output_text (convenience) or output[] blocks | content[] array, find block with type === "text" |
| Token usage fields | usage.prompt_tokens / completion_tokens / total_tokens | usage.input_tokens / output_tokens / total_tokens | usage.input_tokens / output_tokens |
| Reasoning tokens | usage.completion_tokens_details.reasoning_tokens | reasoning items inside output[] | thinking blocks inside content[] (when thinking enabled) |
| Cache metrics | usage.prompt_tokens_details.cached_tokens | usage.input_tokens_details.cached_tokens | usage.cache_read_input_tokens, usage.cache_creation_input_tokens |
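The tool-schema row is the one most likely to need code when you move between SDKs. The sketch below converts an OpenAI Chat tool definition into the Anthropic Messages shape. `toAnthropicTool` and both interfaces are hypothetical names that cover only the fields in the table plus the optional `description`; this is not SERV's own translation logic.

```typescript
// Minimal shapes for illustration; the real SDK types carry more fields.
interface OpenAIChatTool {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters: Record<string, unknown>; // JSON Schema object
  };
}

interface AnthropicTool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>; // same JSON Schema, different key
}

// Hypothetical helper: flattens the nested `function` object and renames
// `parameters` to `input_schema`.
function toAnthropicTool(tool: OpenAIChatTool): AnthropicTool {
  const converted: AnthropicTool = {
    name: tool.function.name,
    input_schema: tool.function.parameters,
  };
  if (tool.function.description !== undefined) {
    converted.description = tool.function.description;
  }
  return converted;
}
```

The JSON Schema itself is passed through untouched; only its wrapper changes.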
Available models
The live catalog mirrors the Playground picker.
| Provider | Model IDs |
|---|---|
| OpenAI | gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o3-mini, o3-pro, o4-mini |
| Anthropic | claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5 |
| Google | gemini-flash-latest, gemini-pro-latest, gemma-4-26b-a4b-it, gemma-4-31b-it |
| xAI | grok-4.3, grok-4.20 |
| Qwen | qwen3.6-flash, qwen3.6-max-preview |
| DeepSeek | deepseek-v4-pro, deepseek-v4-flash |
Gemma: the Playground displays these as “Gemma 4 31B” and “Gemma 4 26B A4B”, but the API requires the -it (instruction-tuned) suffix. Use gemma-4-31b-it and gemma-4-26b-a4b-it. Without the suffix you get 404 The model 'gemma-...' does not exist.
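If model IDs reach your code from user input or from names copied out of the Playground, a guard like the following can add the missing suffix before the request is sent. `withGemmaSuffix` is a hypothetical helper, and it assumes every Gemma ID in the catalog ends in -it, which holds for the two listed above.

```typescript
// Hypothetical helper: appends "-it" to Gemma model IDs that lack it and
// leaves every other model ID unchanged.
function withGemmaSuffix(modelId: string): string {
  const isGemma = /^gemma-/.test(modelId);
  const hasSuffix = /-it$/.test(modelId);
  return isGemma && !hasSuffix ? modelId + "-it" : modelId;
}
```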
For pricing and context windows, see Models.
Model × endpoint compatibility matrix
Not every model works on every endpoint. The rule is not “Claude → /messages, everyone else → /chat/completions.” The real behavior:
| Provider | /v1/chat/completions | /v1/responses | /v1/messages |
|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ |
| Anthropic | ✅ | ❌ | ✅ |
| Google (Gemini) | ✅ | ❌ | ⚠️ (502 upstream) |
| Google (Gemma) | ✅ | ❌ | ✅ |
| xAI | ✅ | ❌ | ✅ |
| Qwen | ✅ | ❌ | ✅ |
| DeepSeek | ✅ | ❌ | ✅ |
Three rules of thumb fall out of this:
- /v1/chat/completions is the universal endpoint. If you want one code path that works across every provider in the catalog, including Claude, this is it. You can use the OpenAI SDK to call Claude through SERV, no Anthropic SDK required.
- /v1/responses is OpenAI-family only. Sending a non-OpenAI model returns: 400 The Responses API is not supported with model X. Use the Chat Completions API (POST /v1/chat/completions) instead.
- /v1/messages is broad, not Anthropic-exclusive. It accepts OpenAI, Anthropic, xAI, Qwen, DeepSeek, and Gemma models. This is useful if you have an existing Anthropic-SDK integration you don’t want to rewrite: swap the model ID to, for example, gpt-5.4-mini and keep using @anthropic-ai/sdk.
Two non-obvious gotchas worth knowing:
- Gemini on /v1/messages returns 502. Use /v1/chat/completions for Gemini specifically. Gemma is fine on either.
- Google’s google-genai SDK is not compatible with SERV. It speaks Gemini’s generateContent wire format, which SERV doesn’t expose. Use the OpenAI SDK against /v1/chat/completions for any Gemini or Gemma model.
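The matrix can be encoded as a lookup so that a mismatched model and endpoint is caught before the request is sent. This is a sketch under assumptions: `familyOf` guesses the provider family from the model ID prefixes in the catalog above, and all names here are hypothetical. A new model family would need a new entry.

```typescript
type Endpoint = "/v1/chat/completions" | "/v1/responses" | "/v1/messages";
type Family = "openai" | "anthropic" | "gemini" | "gemma" | "xai" | "qwen" | "deepseek";

// Transcribed from the compatibility matrix. Gemini omits /v1/messages
// because that combination returns 502.
const SUPPORTED: Record<Family, Endpoint[]> = {
  openai: ["/v1/chat/completions", "/v1/responses", "/v1/messages"],
  anthropic: ["/v1/chat/completions", "/v1/messages"],
  gemini: ["/v1/chat/completions"],
  gemma: ["/v1/chat/completions", "/v1/messages"],
  xai: ["/v1/chat/completions", "/v1/messages"],
  qwen: ["/v1/chat/completions", "/v1/messages"],
  deepseek: ["/v1/chat/completions", "/v1/messages"],
};

// Assumption: the family can be inferred from the model ID prefix.
function familyOf(modelId: string): Family | undefined {
  if (/^(gpt-|o\d)/.test(modelId)) return "openai";
  if (/^claude-/.test(modelId)) return "anthropic";
  if (/^gemini-/.test(modelId)) return "gemini";
  if (/^gemma-/.test(modelId)) return "gemma";
  if (/^grok-/.test(modelId)) return "xai";
  if (/^qwen/.test(modelId)) return "qwen";
  if (/^deepseek-/.test(modelId)) return "deepseek";
  return undefined;
}

// Returns false for unknown model IDs rather than guessing.
function supportsEndpoint(modelId: string, endpoint: Endpoint): boolean {
  const family = familyOf(modelId);
  return family !== undefined && SUPPORTED[family].indexOf(endpoint) !== -1;
}
```

Because /v1/chat/completions appears in every row, falling back to it when `supportsEndpoint` returns false for a known model is always safe.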
Verifying your setup
The simplest smoke test is a single round-trip per SDK. If you hit a 400 on a fresh integration, re-read Gotcha #3. It’s almost always a missing system prompt.
The official SERV starter repo includes a test harness that exercises all three endpoints plus reasoning_effort scaling end-to-end. Mirror its patterns if you want a smoke test in your own CI.
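When a smoke test fails, the status code and message usually point back to one of the sections above. The following sketch maps the errors documented on this page to a hint. `diagnoseServError` is a hypothetical helper, the substring checks are heuristics based on the error strings quoted here, and it is not taken from the starter repo.

```typescript
// Hypothetical helper: turns a failed response into a pointer to the
// relevant section of this page. Matching is case-insensitive.
function diagnoseServError(status: number, message: string): string {
  const text = message.toLowerCase();
  if (status === 400 && text.indexOf("system prompt") !== -1) {
    return "Missing system prompt: see Gotcha #3.";
  }
  if (status === 400 && text.indexOf("responses api is not supported") !== -1) {
    return "Non-OpenAI model on /v1/responses: use /v1/chat/completions.";
  }
  if (status === 400) {
    return "Bad request: check reasoning_effort (Gotcha #4) and the request shape.";
  }
  if (status === 404 && text.indexOf("does not exist") !== -1) {
    return "Unknown model ID: check the catalog, including the -it suffix for Gemma.";
  }
  if (status === 404) {
    return "Path not found: check the baseURL suffix (Gotcha #1).";
  }
  if (status === 502) {
    return "Upstream failure: if this is Gemini on /v1/messages, switch to /v1/chat/completions.";
  }
  return "No known SERV-specific cause for status " + status + ".";
}
```

In a CI smoke test, logging this hint next to the raw error body makes a red build faster to triage.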
See also
- SDK Migration Guide for users on Python, Vercel AI SDK, LangChain, Mastra, or raw fetch.
- Models for the full pricing and context-window catalog.
- Playground to compare models and effort levels side by side.