If you are using the official OpenAI or Anthropic Node SDKs, see the SDK Integration guide instead; it’s only a two-field change.

SERV is wire-compatible with the OpenAI HTTP API and the Anthropic HTTP API. Anything that speaks one of those wire protocols can talk to SERV. That covers, among others:
  • Python: openai, anthropic
  • Vercel AI SDK: @ai-sdk/openai, @ai-sdk/anthropic
  • LangChain: langchain-openai, langchain-anthropic (Python or JS)
  • LlamaIndex: OpenAI and Anthropic LLM classes
  • Mastra, AutoGen, CrewAI, Instructor, LiteLLM, etc.
  • Raw fetch / curl / any HTTP client
What we’ve actually run end-to-end: We run integration tests against the official openai and anthropic Node and Python SDKs, @ai-sdk/openai, @ai-sdk/anthropic, and LangChain. The other tools listed above should work because they speak the same wire protocols, but they are documented patterns, not tested paths.
The migration is always the same shape:
  1. Change the base URL to SERV.
  2. Change the API key to your SERV_API_KEY.
  3. Use any model ID from the SERV catalog. Frontier names you already know (e.g. gpt-5.4-mini, claude-haiku-4.5) work as-is.
  4. Make sure you’re sending a system prompt. SERV requires one; see Gotcha #3 in the integration doc.
That’s it. No other code changes.
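Step 4 is the one most likely to be missed in an existing codebase. As a minimal sketch for OpenAI-shape message lists, the helper below prepends a default system message when none is present. The name ensure_system_prompt is hypothetical, and treating both the system and developer roles as satisfying the requirement is an assumption. Anthropic-shape requests carry the prompt in the top-level system field instead, so this helper does not apply to them.

```python
from typing import Dict, List

# Hypothetical helper, not part of any SDK.
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."
SYSTEM_ROLES = {"system", "developer"}  # assumption: either role counts


def ensure_system_prompt(
    messages: List[Dict[str, str]],
    default: str = DEFAULT_SYSTEM_PROMPT,
) -> List[Dict[str, str]]:
    """Return a copy of `messages` that contains a system-style message."""
    if any(m.get("role") in SYSTEM_ROLES for m in messages):
        return list(messages)
    # No system-style message found: put the default one first.
    return [{"role": "system", "content": default}, *messages]
```

Calling it just before client.chat.completions.create(...) leaves call sites that already send a system prompt unchanged.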
One SDK can’t reach every model. Google’s google-genai SDK does not work against SERV (it speaks Gemini’s native generateContent format, not the OpenAI or Anthropic wire formats). To use Gemini or Gemma models, switch to the OpenAI SDK and call /v1/chat/completions. See the compatibility matrix for the full picture.

The three universal rules

These apply to every SDK below. If you remember nothing else from this doc:
| | OpenAI-shape SDK | Anthropic-shape SDK |
|---|---|---|
| Base URL | https://inference-api.openserv.ai/v1 (with /v1) | https://inference-api.openserv.ai (no /v1) |
| Auth | Authorization: Bearer <SERV_API_KEY> | Authorization: Bearer <SERV_API_KEY>. SDK constructor field is usually authToken or auth_token. |
| Model id | Any OpenAI / Google / xAI / Qwen / DeepSeek model from the catalog | Most of the catalog routes here too. See the compatibility matrix. |
For full parameter mapping (system prompts, reasoning effort, tool schema, response shape), see the Parameter map in the integration doc.
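The base URL and auth rules can be captured in one place so call sites cannot drift. The sketch below is illustrative only: serv_settings is a hypothetical helper, and the "openai" / "anthropic" shape labels are names chosen for this example. It follows the official-SDK rule; @ai-sdk/anthropic is the documented exception that needs the /v1 suffix.

```python
from typing import Dict

SERV_HOST = "https://inference-api.openserv.ai"


def serv_settings(shape: str, api_key: str) -> Dict[str, object]:
    """Return the base URL and auth header for an SDK shape.

    `shape` is "openai" or "anthropic" (labels invented for this sketch).
    """
    if shape == "openai":
        base_url = SERV_HOST + "/v1"  # OpenAI-shape SDKs expect /v1
    elif shape == "anthropic":
        base_url = SERV_HOST          # official Anthropic SDKs add /v1 themselves
    else:
        raise ValueError(f"unknown SDK shape: {shape!r}")
    # Both shapes authenticate with the same bearer header.
    return {
        "base_url": base_url,
        "headers": {"Authorization": f"Bearer {api_key}"},
    }
```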

Python: openai

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference-api.openserv.ai/v1",
    api_key=os.environ["SERV_API_KEY"],
)

resp = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is a CPU register?"},
    ],
)
print(resp.choices[0].message.content)

Python: anthropic

from anthropic import Anthropic
import os

client = Anthropic(
    base_url="https://inference-api.openserv.ai",   # no /v1
    auth_token=os.environ["SERV_API_KEY"],          # not api_key
)

message = client.messages.create(
    model="claude-haiku-4.5",
    max_tokens=1024,
    system="You answer in one sentence.",
    messages=[{"role": "user", "content": "What is a CPU register?"}],
)
print(message.content[0].text)

Vercel AI SDK

The AI SDK exposes per-provider factories that accept custom base URLs.

@ai-sdk/openai

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const serv = createOpenAI({
  baseURL: "https://inference-api.openserv.ai/v1",
  apiKey: process.env.SERV_API_KEY!,
});

const { text } = await generateText({
  model: serv("gpt-5.4-mini"),
  system: "You are a concise assistant.",
  prompt: "What is a CPU register?",
});

@ai-sdk/anthropic

import { createAnthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";

const serv = createAnthropic({
  baseURL: "https://inference-api.openserv.ai/v1",
  authToken: process.env.SERV_API_KEY!,
});

const { text } = await generateText({
  model: serv("claude-haiku-4.5"),
  system: "You answer in one sentence.",
  prompt: "What is a CPU register?",
});
Vercel AI SDK note: Unlike the official Anthropic Node SDK, @ai-sdk/anthropic requires the /v1 suffix in the baseURL. The “no /v1” rule only applies to the official @anthropic-ai/sdk package.

LangChain (JS)

ChatOpenAI

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  apiKey: process.env.SERV_API_KEY,
  model: "gpt-5.4-mini",
  configuration: { baseURL: "https://inference-api.openserv.ai/v1" },
});

const res = await llm.invoke([
  { role: "system", content: "You are a concise assistant." },
  { role: "user", content: "What is a CPU register?" },
]);

ChatAnthropic

import { ChatAnthropic } from "@langchain/anthropic";

const llm = new ChatAnthropic({
  model: "claude-haiku-4.5",
  clientOptions: {
    baseURL: "https://inference-api.openserv.ai",
    authToken: process.env.SERV_API_KEY,
  },
});

LangChain (Python)

ChatOpenAI

from langchain_openai import ChatOpenAI
import os

llm = ChatOpenAI(
    model="gpt-5.4-mini",
    api_key=os.environ["SERV_API_KEY"],
    base_url="https://inference-api.openserv.ai/v1",
)

ChatAnthropic

from langchain_anthropic import ChatAnthropic
import os

llm = ChatAnthropic(
    model="claude-haiku-4.5",
    client_options={
        "base_url": "https://inference-api.openserv.ai",
        "auth_token": os.environ["SERV_API_KEY"],
    }
)

Raw fetch / curl

If you have no SDK at all, the wire format is plain JSON. Two examples that work today:

OpenAI shape

curl https://inference-api.openserv.ai/v1/chat/completions \
  -H "Authorization: Bearer $SERV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user",   "content": "What is a CPU register?"}
    ]
  }'

Anthropic shape

curl https://inference-api.openserv.ai/v1/messages \
  -H "Authorization: Bearer $SERV_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-haiku-4.5",
    "max_tokens": 1024,
    "system": "You answer in one sentence.",
    "messages": [{"role": "user", "content": "What is a CPU register?"}]
  }'
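For scripts that cannot shell out to curl, the same two requests can be assembled with Python’s standard library. This is a sketch under stated assumptions: build_request is a hypothetical helper, and the requests are only constructed here, not sent. Pass one to urllib.request.urlopen to execute it.

```python
import json
import os
import urllib.request
from typing import Dict, Optional

SERV_HOST = "https://inference-api.openserv.ai"


def build_request(
    path: str,
    payload: dict,
    api_key: str,
    extra_headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to SERV."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    headers.update(extra_headers or {})
    return urllib.request.Request(
        SERV_HOST + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


api_key = os.environ.get("SERV_API_KEY", "")

# OpenAI shape: system prompt travels inside the messages array.
chat_request = build_request(
    "/v1/chat/completions",
    {
        "model": "gpt-5.4-mini",
        "messages": [
            {"role": "system", "content": "You are concise."},
            {"role": "user", "content": "What is a CPU register?"},
        ],
    },
    api_key,
)

# Anthropic shape: top-level system field plus the anthropic-version header.
messages_request = build_request(
    "/v1/messages",
    {
        "model": "claude-haiku-4.5",
        "max_tokens": 1024,
        "system": "You answer in one sentence.",
        "messages": [{"role": "user", "content": "What is a CPU register?"}],
    },
    api_key,
    extra_headers={"anthropic-version": "2023-06-01"},
)
```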

Prompt template

If you’re using Claude Code, Cursor, Copilot, or any other coding agent and want to migrate an existing integration to SERV, paste this into the chat:
Migrate this codebase from <CURRENT_PROVIDER> to SERV Reasoning. SERV is wire-compatible
with the OpenAI Chat Completions API and the Anthropic Messages API. The migration is
a four-field change per call site:

1. Base URL:
   • If the file uses an OpenAI-shape client (openai / @ai-sdk/openai / ChatOpenAI / etc.)
     -> set baseURL to "https://inference-api.openserv.ai/v1"
   • If the file uses an Anthropic-shape client (anthropic / @anthropic-ai/sdk / ChatAnthropic)
     -> set baseURL to "https://inference-api.openserv.ai"  (NO /v1 suffix, SDK adds it)
   • Exception: if the file uses @ai-sdk/anthropic
     -> set baseURL to "https://inference-api.openserv.ai/v1"  (this package requires /v1)

2. API key:
   • Read from process.env.SERV_API_KEY (or os.environ["SERV_API_KEY"] in Python).
   • For @anthropic-ai/sdk specifically, use the `authToken` constructor field, not `apiKey`
     (`auth_token`, not `api_key`, in the Python anthropic SDK).

3. Model id: keep existing frontier model IDs if they match the SERV catalog:
   • OpenAI-shape:    gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano,
                      o3, o3-mini, o3-pro, o4-mini,
                      gemini-flash-latest, gemini-pro-latest,
                      gemma-4-26b-a4b-it, gemma-4-31b-it,  (Gemma needs the -it suffix)
                      grok-4.3, grok-4.20,
                      qwen3.6-flash, qwen3.6-max-preview,
                      deepseek-v4-pro, deepseek-v4-flash
   • Anthropic-shape: claude-haiku-4.5, claude-sonnet-4.6, claude-opus-4.6

   If the codebase uses an ID not in the catalog, flag it for me to pick a replacement.

4. SERV REQUIRES a system prompt. Audit every call site. If any request lacks a
   system / instructions / developer message, add one ("You are a helpful assistant."
   is fine as a default). Requests without one return 400.

If the codebase pins reasoning_effort to "minimal", change it to "low". SERV's allowed
values are "none" | "low" | "medium" | "high".

If the codebase uses the google-genai SDK for Gemini, replace it with the openai SDK
pointed at /v1/chat/completions. SERV doesn't support Gemini's native generateContent
wire format.

Leave all other parameters (messages array, tool definitions, response handling,
streaming, temperature, etc.) untouched. SERV honors them as-is.
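The reasoning_effort rule in the template is mechanical enough to apply in code. A minimal sketch, assuming OpenAI-shape request payloads held as plain dicts; normalize_reasoning_effort is a hypothetical name, and raising on unknown values rather than guessing a replacement is a design choice made for this example.

```python
ALLOWED_EFFORTS = {"none", "low", "medium", "high"}


def normalize_reasoning_effort(payload: dict) -> dict:
    """Return a copy of `payload` with reasoning_effort mapped to an allowed value."""
    out = dict(payload)
    effort = out.get("reasoning_effort")
    if effort is None:
        return out  # parameter not set, nothing to change
    if effort == "minimal":
        out["reasoning_effort"] = "low"  # replacement named in the template above
    elif effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning_effort: {effort!r}")
    return out
```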

Troubleshooting cheat sheet

| Symptom | Cause | Fix |
|---|---|---|
| 400 A system prompt is required | Missing system / instructions / developer message | Add one. SERV requires it on every request. |
| 400 Unsupported value: 'reasoning_effort' does not support 'minimal' | Pinned to OpenAI’s minimal | Change to 'low' or 'none' |
| 400 The Responses API is not supported with model X | /v1/responses is OpenAI-family only | Switch to /v1/chat/completions for Claude, Gemini, Gemma, Grok, Qwen, DeepSeek. See the compatibility matrix. |
| 404 on Anthropic Node SDK | Included /v1 in baseURL | Drop the /v1. The official Anthropic SDK adds it itself. |
| 404 on a raw fetch to /messages | Missing /v1 in path | Use /v1/messages |
| 404 The model 'gemma-...' does not exist | Gemma IDs need the -it suffix that the Playground display drops | Use gemma-4-31b-it / gemma-4-26b-a4b-it, not gemma-4-31b. |
| 401 on Anthropic SDK | Ambient ANTHROPIC_API_KEY env conflicts with SERV key | Pass authToken: SERV_API_KEY explicitly |
| 502 Upstream provider error on Gemini via /v1/messages | Gemini doesn’t currently route through the Anthropic-shape endpoint | Use /v1/chat/completions for Gemini. |
| Empty content on a reasoning model | Reasoning budget exhausted the token cap | Raise max_completion_tokens / max_tokens |
| Want to use Google’s google-genai SDK | SERV doesn’t expose Gemini’s generateContent wire format | Use the OpenAI SDK against /v1/chat/completions with the Gemini or Gemma model id. |
| Model ID rejected with not_found_error | Unsupported ID | Pick one from the catalog. |
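
Several of these failures can be caught before a request is sent. The sketch below encodes four of the rows as a pre-flight check. It is illustrative only: preflight is a hypothetical helper, and matching model families by ID prefix is an assumption based on the example IDs in this doc, not a rule published by SERV.

```python
from typing import List

# Families the table lists as unsupported on /v1/responses (prefixes are assumed).
NON_RESPONSES_PREFIXES = ("claude-", "gemini-", "gemma-", "grok-", "qwen", "deepseek-")


def preflight(model: str, path: str) -> List[str]:
    """Return human-readable warnings for known model/path mismatches."""
    problems = []
    if path == "/messages":
        problems.append("Missing /v1 in path: use /v1/messages")
    if model.startswith("gemma-") and not model.endswith("-it"):
        problems.append("Gemma ids need the -it suffix, e.g. gemma-4-31b-it")
    if model.startswith("gemini-") and path == "/v1/messages":
        problems.append("Gemini does not route via /v1/messages: use /v1/chat/completions")
    if path == "/v1/responses" and model.startswith(NON_RESPONSES_PREFIXES):
        problems.append("/v1/responses is OpenAI-family only: use /v1/chat/completions")
    return problems
```

An empty list means none of these four checks fired; it does not guarantee the request will succeed.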

See also

  • SDK Integration for OpenAI / Anthropic Node SDK users.
  • Models for the full pricing and context-window catalog.
  • Playground to compare models side by side before you migrate.