Integrate your existing OpenAI or Anthropic SDK with SERV in minutes. In most cases you only need to update two fields: the base URL and the API key. The rest of your application (prompts, tool definitions, and business logic) stays as it is, apart from the SERV-specific behaviors listed under Gotchas.

SERV exposes three wire-compatible HTTP endpoints under a single base URL:
| Endpoint | Shape | Use it for |
| --- | --- | --- |
| POST /v1/chat/completions | OpenAI | Universal. Works with every model in the catalog. |
| POST /v1/responses | OpenAI | OpenAI-family models when you want streamed reasoning summaries. |
| POST /v1/messages | Anthropic | Claude family and several others. See the compatibility matrix below. |

Base URL: https://inference-api.openserv.ai

Quickstart

OpenAI SDK → SERV (/v1/chat/completions)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference-api.openserv.ai/v1",  // /v1 IS required here
  apiKey: process.env.SERV_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "gpt-5.4-mini",
  messages: [
    // System prompt is required, see Gotcha #3 below
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "What is a CPU register?" },
  ],
});

console.log(completion.choices[0].message.content);

OpenAI SDK → SERV (/v1/responses)

Use the Responses API when you want the reasoning trace alongside the answer. OpenAI-family models only.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference-api.openserv.ai/v1",
  apiKey: process.env.SERV_API_KEY,
});

const response = await client.responses.create({
  model: "gpt-5.4",
  instructions: "You are a careful reasoner.",   // top-level, counts as the system prompt
  input: "What is the integral of x^2 from 0 to 3?",
});

console.log(response.output_text);

Anthropic SDK → SERV (/v1/messages)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://inference-api.openserv.ai",  // NO /v1, the SDK adds it
  authToken: process.env.SERV_API_KEY,           // authToken, not apiKey
});

const message = await client.messages.create({
  model: "claude-haiku-4.5",
  max_tokens: 1024,                              // required by the Anthropic API
  system: "You answer in one sentence.",         // top-level, required by SERV
  messages: [{ role: "user", content: "What is a CPU register?" }],
});

console.log(message.content.find(b => b.type === "text").text);
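
The one-liner above assumes the response contains a text block and throws if it does not. As a slightly more defensive sketch, the hypothetical helper below (extractText is not part of @anthropic-ai/sdk) joins every text block and returns an empty string when there are none, for example when a response holds only thinking or tool_use blocks. The ContentBlock type is a simplified stand-in for the SDK's own types.

```typescript
// Simplified stand-in for the SDK's content block union (assumption: only
// the `type` and optional `text` fields matter for this helper).
interface ContentBlock {
  type: string;
  text?: string;
}

// Hypothetical helper: concatenates all text blocks in order,
// or returns "" when the response has no text block at all.
function extractText(content: ContentBlock[]): string {
  return content
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text ?? "")
    .join("");
}
```

With the quickstart above, this would presumably be called as extractText(message.content).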

Gotchas

These four behaviors are what trip up most integrations.

Gotcha #1: baseURL suffix differs by SDK

| SDK | baseURL value | Why |
| --- | --- | --- |
| OpenAI SDK | https://inference-api.openserv.ai/v1 | OpenAI SDK expects the /v1 to live in the base URL |
| Anthropic SDK | https://inference-api.openserv.ai | Anthropic SDK’s resource methods (client.messages.create) internally hit /v1/messages. Including /v1 yourself would call /v1/v1/messages and 404 |
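
One way to keep the suffix rule in a single place is a small lookup keyed by SDK. This is a minimal sketch: SERV_ORIGIN, servBaseURL, and resolvedURL are illustrative names, not part of either SDK, and the appended paths simply restate the behavior described in the table above.

```typescript
// Illustrative constant: the base URL given earlier in this page.
const SERV_ORIGIN = "https://inference-api.openserv.ai";

type SdkKind = "openai" | "anthropic";

// Returns the value to pass as `baseURL` to each SDK's constructor.
function servBaseURL(sdk: SdkKind): string {
  // OpenAI SDK: /v1 belongs in the base URL.
  // Anthropic SDK: the SDK adds /v1/messages itself, so pass the bare origin.
  return sdk === "openai" ? `${SERV_ORIGIN}/v1` : SERV_ORIGIN;
}

// Shows the final URL each SDK would end up calling for its main method,
// assuming the path-joining behavior described in the table above.
function resolvedURL(sdk: SdkKind): string {
  const appended = sdk === "openai" ? "/chat/completions" : "/v1/messages";
  return servBaseURL(sdk) + appended;
}
```

Both SDKs then resolve to a single /v1/... path, which is the property to check if you see unexpected 404s.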

Gotcha #2: auth field name differs

| SDK | Field | Notes |
| --- | --- | --- |
| OpenAI SDK | apiKey | Standard. |
| Anthropic SDK | authToken | apiKey also works, but authToken is preferred for third-party gateways. It keeps ANTHROPIC_API_KEY free if you ever want to fall back to direct Anthropic. |
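
For context, the two Anthropic SDK options differ in the header they produce. The sketch below models that difference as understood from the Anthropic SDK's documented behavior (apiKey is sent as an X-Api-Key header, authToken as a Bearer token); it is not SDK code, and anthropicAuthHeaders is a hypothetical name. That SERV accepts both is taken from the table above.

```typescript
// Exactly one of the two constructor options is modeled here.
type AnthropicAuth = { apiKey: string } | { authToken: string };

// Hypothetical helper: returns the auth header each option is expected to
// produce on the wire (an assumption about SDK behavior, not SDK source).
function anthropicAuthHeaders(auth: AnthropicAuth): Record<string, string> {
  if ("authToken" in auth) {
    return { Authorization: `Bearer ${auth.authToken}` };
  }
  return { "X-Api-Key": auth.apiKey };
}
```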

Gotcha #3: SERV requires a system prompt

Every request needs a system, developer, or instructions message. Sending without one returns:
400 A system prompt is required. Please include a system or developer message in your request.
This is a SERV-specific constraint, not part of either upstream API. Where the system prompt goes by endpoint:
| Endpoint | Where the system prompt goes |
| --- | --- |
| /v1/chat/completions | messages: [{ role: "system", content: "..." }, ...] |
| /v1/responses | top-level instructions: "..." |
| /v1/messages | top-level system: "..." |
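
If some call sites in an existing /v1/chat/completions integration never set a system prompt, a small guard can add a default before the request is sent. This is a minimal sketch: ensureSystemPrompt and the simplified ChatMessage type are hypothetical, and whether SERV accepts an empty system prompt is not covered here, so pass a non-empty fallback.

```typescript
type ChatRole = "system" | "developer" | "user" | "assistant" | "tool";

// Simplified message shape (assumption: string content only).
interface ChatMessage {
  role: ChatRole;
  content: string;
}

// Hypothetical guard: prepends a fallback system message when the list has
// neither a system nor a developer message. The input array is not mutated.
function ensureSystemPrompt(messages: ChatMessage[], fallback: string): ChatMessage[] {
  const hasSystem = messages.some(
    (m) => m.role === "system" || m.role === "developer",
  );
  return hasSystem ? messages : [{ role: "system", content: fallback }, ...messages];
}
```

For /v1/responses and /v1/messages the equivalent check is simply that the top-level instructions or system field is set.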

Gotcha #4: reasoning_effort allowed values are SERV-specific

The Playground’s Low / Medium / High selector maps to the reasoning_effort request parameter.
| Side | Allowed values |
| --- | --- |
| Upstream OpenAI | minimal, low, medium, high |
| SERV | none, low, medium, high |

Passing 'minimal' to SERV returns 400. If you’re porting an OpenAI integration that hard-codes 'minimal', change it to 'none' or 'low'. For Claude models, the equivalent control is the Anthropic thinking parameter: { type: "enabled", budget_tokens: <int ≥ 1024> }.
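
When porting, the value mapping can be centralized rather than patched at each call site. The sketch below is one possible shape, assuming the allowed values listed above; toServEffort and claudeThinking are hypothetical helpers, and the default of mapping 'minimal' to 'low' is an arbitrary choice for illustration.

```typescript
type UpstreamEffort = "minimal" | "low" | "medium" | "high";
type ServEffort = "none" | "low" | "medium" | "high";

// Hypothetical helper: maps an upstream OpenAI effort value to one SERV
// accepts. `minimalAs` chooses the replacement for "minimal".
function toServEffort(
  effort: UpstreamEffort,
  minimalAs: "none" | "low" = "low",
): ServEffort {
  return effort === "minimal" ? minimalAs : effort;
}

// Hypothetical helper: builds the Anthropic `thinking` parameter and
// enforces the "integer >= 1024" budget stated above.
function claudeThinking(budgetTokens: number): { type: "enabled"; budget_tokens: number } {
  if (!Number.isInteger(budgetTokens) || budgetTokens < 1024) {
    throw new RangeError("budget_tokens must be an integer >= 1024");
  }
  return { type: "enabled", budget_tokens: budgetTokens };
}
```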

Parameter map: OpenAI ↔ Anthropic ↔ SERV

If you’re moving an integration across SDKs, this table maps each concept to its field or shape on each endpoint.
| Concept | OpenAI Chat | OpenAI Responses | Anthropic Messages |
| --- | --- | --- | --- |
| HTTP path | /v1/chat/completions | /v1/responses | /v1/messages |
| Auth field (SDK constructor) | apiKey | apiKey | authToken |
| baseURL suffix to use with SERV | /v1 | /v1 | (none) |
| Token cap field | max_completion_tokens | max_output_tokens | max_tokens (required) |
| System prompt | message with role:"system" | top-level instructions | top-level system |
| User message shape | {role, content} in messages[] | top-level input (string or array) | {role, content} in messages[] |
| Reasoning-effort control | reasoning_effort: low, medium, or high | reasoning: { effort, summary } | thinking: { type:"enabled", budget_tokens } |
| Streaming | stream: true | stream: true | stream: true |
| Stop sequences | stop | n/a | stop_sequences |
| Tool schema | tools: [{type:"function", function:{name, parameters}}] | tools: [{type:"function", ...}] | tools: [{name, input_schema}] (no nested function:) |
| Tool-choice | tool_choice: "auto" or {type:"function", function:{name}} | tool_choice: ... | tool_choice: "auto", "any", or {type:"tool", name} |
| Response text | choices[0].message.content | output_text (convenience) or output[] blocks | content[] array, find block with type === "text" |
| Token usage fields | usage.prompt_tokens / completion_tokens / total_tokens | usage.input_tokens / output_tokens / total_tokens | usage.input_tokens / output_tokens |
| Reasoning tokens | usage.completion_tokens_details.reasoning_tokens | reasoning items inside output[] | thinking blocks inside content[] (when thinking enabled) |
| Cache metrics | usage.prompt_tokens_details.cached_tokens | usage.input_tokens_details.cached_tokens | usage.cache_read_input_tokens, usage.cache_creation_input_tokens |
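
Two rows of this map tend to need actual code when porting: the tool schema and the usage fields. The sketch below converts an OpenAI Chat tool definition to the Anthropic shape and folds the usage objects into one form. All type and function names are hypothetical, the types are trimmed to the fields shown in the table (plus the optional description both schemas allow), and summing input and output tokens when total_tokens is absent is a choice made for this sketch.

```typescript
interface OpenAIChatTool {
  type: "function";
  function: { name: string; description?: string; parameters: Record<string, unknown> };
}

interface AnthropicTool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
}

// Flattens the nested `function` object and renames parameters -> input_schema.
function toAnthropicTool(tool: OpenAIChatTool): AnthropicTool {
  const { name, description, parameters } = tool.function;
  const converted: AnthropicTool = { name, input_schema: parameters };
  if (description !== undefined) converted.description = description;
  return converted;
}

interface ChatUsage { prompt_tokens: number; completion_tokens: number; total_tokens: number }
interface InputOutputUsage { input_tokens: number; output_tokens: number; total_tokens?: number }
interface NormalizedUsage { input: number; output: number; total: number }

// Accepts the Chat shape or the Responses/Messages shape and returns one form.
function normalizeUsage(usage: ChatUsage | InputOutputUsage): NormalizedUsage {
  if ("prompt_tokens" in usage) {
    return {
      input: usage.prompt_tokens,
      output: usage.completion_tokens,
      total: usage.total_tokens,
    };
  }
  // The Messages shape has no total_tokens, so fall back to the sum.
  const total = usage.total_tokens ?? (usage.input_tokens + usage.output_tokens);
  return { input: usage.input_tokens, output: usage.output_tokens, total };
}
```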

Available models

The live catalog mirrors the Playground picker.
| Provider | Model IDs |
| --- | --- |
| OpenAI | gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, o3, o3-mini, o3-pro, o4-mini |
| Anthropic | claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5 |
| Google | gemini-flash-latest, gemini-pro-latest, gemma-4-26b-a4b-it, gemma-4-31b-it |
| xAI | grok-4.3, grok-4.20 |
| Qwen | qwen3.6-flash, qwen3.6-max-preview |
| DeepSeek | deepseek-v4-pro, deepseek-v4-flash |

Gemma: the Playground displays these as “Gemma 4 31B” and “Gemma 4 26B A4B”, but the API requires the -it (instruction-tuned) suffix. Use gemma-4-31b-it and gemma-4-26b-a4b-it. Without the suffix you get 404 The model 'gemma-...' does not exist.
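
If model IDs reach your code from user input or config, a small normalizer can guard against the missing suffix. This is a sketch only: normalizeGemmaId is a hypothetical helper, and it assumes every Gemma ID in the catalog ends in -it, which holds for the two IDs listed above but may not hold for future models.

```typescript
// Hypothetical helper: appends the -it suffix to Gemma IDs that lack it
// and leaves every other model ID untouched.
function normalizeGemmaId(model: string): string {
  if (model.startsWith("gemma-") && !model.endsWith("-it")) {
    return `${model}-it`;
  }
  return model;
}
```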
For pricing and context windows, see Models.

Model × endpoint compatibility matrix

Not every model works on every endpoint. The rule is not “Claude → /messages, everyone else → /chat/completions.” The real behavior:
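| Provider | /v1/chat/completions | /v1/responses | /v1/messages |
| --- | --- | --- | --- |
| OpenAI | Yes | Yes | Yes |
| Anthropic | Yes | No | Yes |
| Google (Gemini) | Yes | No | ⚠️ (502 upstream) |
| Google (Gemma) | Yes | No | Yes |
| xAI | Yes | No | Yes |
| Qwen | Yes | No | Yes |
| DeepSeek | Yes | No | Yes |
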
Three rules of thumb fall out of this:
  1. /v1/chat/completions is the universal endpoint. If you want one code path that works across every provider in the catalog including Claude, this is it. You can use the OpenAI SDK to call Claude through SERV, no Anthropic SDK required.
  2. /v1/responses is OpenAI-family only. Sending a non-OpenAI model returns: 400 The Responses API is not supported with model X. Use the Chat Completions API (POST /v1/chat/completions) instead.
  3. /v1/messages is broad, not Anthropic-exclusive. It accepts OpenAI, Anthropic, xAI, Qwen, DeepSeek, and Gemma models. This is useful if you have an existing Anthropic-SDK integration you don’t want to rewrite: swap the model ID to, for example, gpt-5.4-mini and keep using @anthropic-ai/sdk.
Two non-obvious gotchas worth knowing:
  • Gemini on /v1/messages returns 502. Use /v1/chat/completions for Gemini specifically. Gemma is fine on either.
  • Google’s google-genai SDK is not compatible with SERV. It speaks Gemini’s generateContent wire format, which SERV doesn’t expose. Use the OpenAI SDK against /v1/chat/completions for any Gemini or Gemma model.
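
The rules above can be encoded as a lookup so that routing mistakes surface before a request is sent. This is a minimal sketch: providerOf and supportedEndpoints are hypothetical helpers, and inferring the provider from the model-ID prefix is an assumption based on the IDs in the catalog above, not a documented SERV convention.

```typescript
type Endpoint = "/v1/chat/completions" | "/v1/responses" | "/v1/messages";
type Provider = "openai" | "anthropic" | "gemini" | "gemma" | "xai" | "qwen" | "deepseek";

// Assumption: provider is guessed from the ID prefixes in the catalog.
function providerOf(model: string): Provider | undefined {
  if (model.startsWith("gpt-") || /^o\d/.test(model)) return "openai";
  if (model.startsWith("claude-")) return "anthropic";
  if (model.startsWith("gemini-")) return "gemini";
  if (model.startsWith("gemma-")) return "gemma";
  if (model.startsWith("grok-")) return "xai";
  if (model.startsWith("qwen")) return "qwen";
  if (model.startsWith("deepseek-")) return "deepseek";
  return undefined;
}

// Returns the endpoints expected to work for a model, per the rules above.
// Unknown models yield an empty list rather than a guess.
function supportedEndpoints(model: string): Endpoint[] {
  const provider = providerOf(model);
  if (provider === undefined) return [];
  const endpoints: Endpoint[] = ["/v1/chat/completions"]; // universal
  if (provider === "openai") endpoints.push("/v1/responses"); // OpenAI-family only
  if (provider !== "gemini") endpoints.push("/v1/messages"); // Gemini returns 502 here
  return endpoints;
}
```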

Verifying your setup

The simplest smoke test is a single round-trip per SDK. If you hit a 400 on a fresh integration, re-read Gotcha #3. It’s almost always a missing system prompt. The official SERV starter repo includes a test harness that exercises all three endpoints plus reasoning_effort scaling end-to-end. Mirror its patterns if you want a smoke test in your own CI.
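
As a starting point for such a smoke test, the sketch below builds one minimal request per endpoint without sending anything. It is not the starter repo's harness. The names are hypothetical, Bearer authentication is assumed because that is what the SDK configurations above produce, and the anthropic-version header mirrors what the Anthropic SDK sends (whether SERV requires it is not stated here).

```typescript
interface SmokeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

const SERV_BASE = "https://inference-api.openserv.ai";

// Builds one request per endpoint. Each payload carries a system prompt in
// the position that endpoint expects (see Gotcha #3).
function buildSmokeRequests(apiKey: string): SmokeRequest[] {
  const baseHeaders: Record<string, string> = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`, // assumption: Bearer auth, as the SDKs send
  };
  const post = (
    path: string,
    payload: unknown,
    extraHeaders: Record<string, string> = {},
  ): SmokeRequest => ({
    url: SERV_BASE + path,
    method: "POST",
    headers: { ...baseHeaders, ...extraHeaders },
    body: JSON.stringify(payload),
  });
  const system = "Reply with one word.";
  return [
    post("/v1/chat/completions", {
      model: "gpt-5.4-mini",
      messages: [
        { role: "system", content: system },
        { role: "user", content: "ping" },
      ],
    }),
    post("/v1/responses", { model: "gpt-5.4-mini", instructions: system, input: "ping" }),
    post(
      "/v1/messages",
      {
        model: "claude-haiku-4.5",
        max_tokens: 16,
        system,
        messages: [{ role: "user", content: "ping" }],
      },
      { "anthropic-version": "2023-06-01" }, // assumption: mirrors the SDK default
    ),
  ];
}
```

Each entry can then be passed to fetch(request.url, request) in your CI script, treating any non-2xx status as a failure.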

See also

  • SDK Migration Guide for users on Python, Vercel AI SDK, LangChain, Mastra, or raw fetch.
  • Models for the full pricing and context-window catalog.
  • Playground to compare models and effort levels side by side.