(If you are using the official OpenAI or Anthropic Node SDKs, please see the SDK Integration guide instead - it’s only a two-field change!)
SERV is wire-compatible with the OpenAI HTTP API and the Anthropic HTTP API. Anything that speaks one of those wire protocols can talk to SERV. That covers, among others:
- Python: openai, anthropic
- Vercel AI SDK: @ai-sdk/openai, @ai-sdk/anthropic
- LangChain: langchain-openai, langchain-anthropic (Python or JS)
- LlamaIndex: OpenAI and Anthropic LLM classes
- Mastra, AutoGen, CrewAI, Instructor, LiteLLM, etc.
- Raw fetch / curl / any HTTP client
What we’ve actually run end-to-end: We actively run integration tests against the official openai and anthropic Node/Python SDKs, as well as @ai-sdk/openai, @ai-sdk/anthropic, and LangChain. Other tools listed above should work seamlessly due to wire compatibility, but they are documented patterns rather than explicitly tested paths.
The migration is always the same shape:
- Change the base URL to SERV.
- Change the API key to your SERV_API_KEY.
- Use any model ID from the SERV catalog. Frontier names you already know (e.g. gpt-5.4-mini, claude-haiku-4.5) work as-is.
- Make sure you’re sending a system prompt (SERV requires one; see Gotcha #3 in the integration doc).
That’s it. No other code changes.
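The system-prompt step is the easiest one to miss when call sites are scattered. A minimal sketch of a pre-flight check for OpenAI-shape message lists follows. `ensure_system_prompt` is a hypothetical helper name, not part of any SDK, and it assumes a `developer` message satisfies the requirement the same way a `system` message does.

```python
# Default text suggested in the prompt template later in this doc.
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."

# Assumption: either role counts as "a system prompt" for chat-completions messages.
SYSTEM_ROLES = {"system", "developer"}


def ensure_system_prompt(messages, default=DEFAULT_SYSTEM_PROMPT):
    """Return a copy of `messages` that starts with a system message if none is present."""
    if any(m.get("role") in SYSTEM_ROLES for m in messages):
        return list(messages)
    return [{"role": "system", "content": default}, *messages]
```

Calling it on every outgoing `messages` list before the request leaves the existing prompts untouched and only fills the gap where one is missing.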
One SDK can’t reach every model. Google’s google-genai SDK does not work against SERV (it speaks Gemini’s native generateContent format, not the OpenAI or Anthropic wire formats). To use Gemini or Gemma models, switch to the OpenAI SDK and call /v1/chat/completions. See the compatibility matrix for the full picture.
The three universal rules
These apply to every SDK below. If you remember nothing else from this doc:
| | OpenAI-shape SDK | Anthropic-shape SDK |
|---|---|---|
| Base URL | https://inference-api.openserv.ai/v1 (with /v1) | https://inference-api.openserv.ai (no /v1) |
| Auth | Authorization: Bearer <SERV_API_KEY> | Authorization: Bearer <SERV_API_KEY>. SDK constructor field is usually authToken or auth_token. |
| Model id | Any OpenAI / Google / xAI / Qwen / DeepSeek model from the catalog | Most of the catalog routes here too. See the compatibility matrix. |
For full parameter mapping (system prompts, reasoning effort, tool schema, response shape), see the Parameter map in the integration doc.
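The base URL and auth rows of the table can be captured in one place so call sites do not each repeat them. This is a sketch only: `serv_client_kwargs` is a hypothetical helper, and the keyword names it returns (`base_url`, `api_key`, `auth_token`) are the ones used by the official Python SDK examples in this doc.

```python
import os

SERV_HOST = "https://inference-api.openserv.ai"


def serv_client_kwargs(shape, api_key=None):
    """Constructor kwargs for the official Python SDKs, keyed by wire shape."""
    key = api_key if api_key is not None else os.environ["SERV_API_KEY"]
    if shape == "openai":
        # OpenAI-shape clients: base URL includes /v1, key goes in api_key.
        return {"base_url": f"{SERV_HOST}/v1", "api_key": key}
    if shape == "anthropic":
        # Official Anthropic SDK: no /v1 (the SDK adds it), key goes in auth_token.
        return {"base_url": SERV_HOST, "auth_token": key}
    raise ValueError(f"unknown shape: {shape!r}")
```

Usage would look like `OpenAI(**serv_client_kwargs("openai"))` or `Anthropic(**serv_client_kwargs("anthropic"))`.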
Python: openai
from openai import OpenAI
import os
client = OpenAI(
    base_url="https://inference-api.openserv.ai/v1",
    api_key=os.environ["SERV_API_KEY"],
)
resp = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is a CPU register?"},
    ],
)
print(resp.choices[0].message.content)
Python: anthropic
from anthropic import Anthropic
import os
client = Anthropic(
    base_url="https://inference-api.openserv.ai",  # no /v1
    auth_token=os.environ["SERV_API_KEY"],  # not api_key
)
message = client.messages.create(
    model="claude-haiku-4.5",
    max_tokens=1024,
    system="You answer in one sentence.",
    messages=[{"role": "user", "content": "What is a CPU register?"}],
)
print(message.content[0].text)
Vercel AI SDK
The AI SDK exposes per-provider factories that accept custom base URLs.
@ai-sdk/openai
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";
const serv = createOpenAI({
  baseURL: "https://inference-api.openserv.ai/v1",
  apiKey: process.env.SERV_API_KEY!,
});
const { text } = await generateText({
  model: serv("gpt-5.4-mini"),
  system: "You are a concise assistant.",
  prompt: "What is a CPU register?",
});
@ai-sdk/anthropic
import { createAnthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";
const serv = createAnthropic({
  baseURL: "https://inference-api.openserv.ai/v1",
  authToken: process.env.SERV_API_KEY!,
});
const { text } = await generateText({
  model: serv("claude-haiku-4.5"),
  system: "You answer in one sentence.",
  prompt: "What is a CPU register?",
});
Vercel AI SDK note: Unlike the official Anthropic Node SDK, @ai-sdk/anthropic requires the /v1 suffix in the baseURL. The “no /v1” rule only applies to the official @anthropic-ai/sdk package.
LangChain (JS)
ChatOpenAI
import { ChatOpenAI } from "@langchain/openai";
const llm = new ChatOpenAI({
  apiKey: process.env.SERV_API_KEY,
  model: "gpt-5.4-mini",
  configuration: { baseURL: "https://inference-api.openserv.ai/v1" },
});
const res = await llm.invoke([
  { role: "system", content: "You are a concise assistant." },
  { role: "user", content: "What is a CPU register?" },
]);
ChatAnthropic
import { ChatAnthropic } from "@langchain/anthropic";
const llm = new ChatAnthropic({
  model: "claude-haiku-4.5",
  clientOptions: {
    baseURL: "https://inference-api.openserv.ai",
    authToken: process.env.SERV_API_KEY,
  },
});
LangChain (Python)
ChatOpenAI
from langchain_openai import ChatOpenAI
import os
llm = ChatOpenAI(
    model="gpt-5.4-mini",
    api_key=os.environ["SERV_API_KEY"],
    base_url="https://inference-api.openserv.ai/v1",
)
ChatAnthropic
from langchain_anthropic import ChatAnthropic
import os
llm = ChatAnthropic(
    model="claude-haiku-4.5",
    client_options={
        "base_url": "https://inference-api.openserv.ai",
        "auth_token": os.environ["SERV_API_KEY"],
    },
)
Raw fetch / curl
If you have no SDK at all, the wire format is plain JSON. Two examples that work today:
OpenAI shape
curl https://inference-api.openserv.ai/v1/chat/completions \
  -H "Authorization: Bearer $SERV_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "What is a CPU register?"}
    ]
  }'
Anthropic shape
curl https://inference-api.openserv.ai/v1/messages \
  -H "Authorization: Bearer $SERV_API_KEY" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-haiku-4.5",
    "max_tokens": 1024,
    "system": "You answer in one sentence.",
    "messages": [{"role": "user", "content": "What is a CPU register?"}]
  }'
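The same two requests can be assembled in Python with only the standard library. The sketch below mirrors the paths and headers of the curl commands above; `build_request` is a hypothetical helper name, and it only builds the request object without sending it.

```python
import json
import urllib.request

SERV_HOST = "https://inference-api.openserv.ai"


def build_request(shape, body, api_key):
    """Build (but do not send) a urllib Request for either wire shape."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if shape == "openai":
        path = "/v1/chat/completions"
    elif shape == "anthropic":
        # Raw HTTP needs /v1 in the path, plus the version header from the curl example.
        path = "/v1/messages"
        headers["anthropic-version"] = "2023-06-01"
    else:
        raise ValueError(f"unknown shape: {shape!r}")
    return urllib.request.Request(
        SERV_HOST + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the response body with `json.load`.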
Prompt Template
If you’re using Claude Code, Cursor, Copilot, or any other coding agent and want to migrate an existing integration to SERV, paste this into the chat:
Migrate this codebase from <CURRENT_PROVIDER> to SERV Reasoning. SERV is wire-compatible
with the OpenAI Chat Completions API and the Anthropic Messages API. The migration is
a four-field change per call site:
1. Base URL:
   • If the file uses an OpenAI-shape client (openai / @ai-sdk/openai / ChatOpenAI / etc.)
     -> set baseURL to "https://inference-api.openserv.ai/v1"
   • If the file uses an Anthropic-shape client (anthropic / @anthropic-ai/sdk)
     -> set baseURL to "https://inference-api.openserv.ai" (NO /v1 suffix, SDK adds it)
   • Exception: if the file uses @ai-sdk/anthropic (Vercel AI SDK)
     -> set baseURL to "https://inference-api.openserv.ai/v1" (this package needs the /v1 suffix)
2. API key:
   • Read from process.env.SERV_API_KEY (or os.environ["SERV_API_KEY"] in Python).
   • For @anthropic-ai/sdk specifically, use the `authToken` constructor field, not `apiKey`
     (`auth_token`, not `api_key`, in the Python anthropic SDK).
3. Model id: keep existing frontier model IDs if they match the SERV catalog:
   • OpenAI-shape: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano,
     o3, o3-mini, o3-pro, o4-mini,
     gemini-flash-latest, gemini-pro-latest,
     gemma-4-26b-a4b-it, gemma-4-31b-it (Gemma needs the -it suffix),
     grok-4.3, grok-4.20,
     qwen3.6-flash, qwen3.6-max-preview,
     deepseek-v4-pro, deepseek-v4-flash
   • Anthropic-shape: claude-haiku-4.5, claude-sonnet-4.6, claude-opus-4.6
   If the codebase uses an ID not in the catalog, flag it for me to pick a replacement.
4. SERV REQUIRES a system prompt. Audit every call site. If any request lacks a
   system / instructions / developer message, add one ("You are a helpful assistant."
   is fine as a default). Requests without one return 400.
If the codebase pins reasoning_effort to "minimal", change it to "low". SERV's allowed
values are "none" | "low" | "medium" | "high".
If the codebase uses the google-genai SDK for Gemini, replace it with the openai SDK
pointed at /v1/chat/completions. SERV doesn't support Gemini's native generateContent
wire format.
Leave all other parameters (messages array, tool definitions, response handling,
streaming, temperature, etc.) untouched. SERV honors them as-is.
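The reasoning_effort rule in the prompt can also be enforced in code rather than by hand. A minimal sketch, assuming request parameters are held in a plain dict; `normalize_reasoning_effort` is a hypothetical helper, and the allowed values are the four listed in the prompt.

```python
ALLOWED_EFFORTS = {"none", "low", "medium", "high"}


def normalize_reasoning_effort(params):
    """Return a copy of request params with reasoning_effort mapped to a SERV-accepted value."""
    out = dict(params)
    effort = out.get("reasoning_effort")
    if effort is None:
        # Parameter not set: nothing to change.
        return out
    if effort == "minimal":
        # "minimal" is not accepted by SERV; "low" is the replacement this doc recommends.
        out["reasoning_effort"] = "low"
    elif effort not in ALLOWED_EFFORTS:
        raise ValueError(f"unsupported reasoning_effort: {effort!r}")
    return out
```

Raising on unknown values, instead of passing them through, surfaces the problem before the request turns into a 400.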
Troubleshooting cheat sheet
| Symptom | Cause | Fix |
|---|---|---|
| 400 A system prompt is required | Missing system / instructions / developer message | Add one. SERV requires it on every request. |
| 400 Unsupported value: 'reasoning_effort' does not support 'minimal' | Pinned to OpenAI’s minimal | Change to 'low' or 'none' |
| 400 The Responses API is not supported with model X | /v1/responses is OpenAI-family only | Switch to /v1/chat/completions for Claude, Gemini, Gemma, Grok, Qwen, DeepSeek. See the compatibility matrix. |
| 404 on Anthropic Node SDK | Included /v1 in baseURL | Drop the /v1. The official Anthropic SDK adds it itself. |
| 404 on a raw fetch to /messages | Missing /v1 in path | Use /v1/messages |
| 404 The model 'gemma-...' does not exist | Gemma IDs need the -it suffix that the Playground display drops | Use gemma-4-31b-it / gemma-4-26b-a4b-it, not gemma-4-31b. |
| 401 on Anthropic SDK | Ambient ANTHROPIC_API_KEY env conflicts with SERV key | Pass authToken: SERV_API_KEY explicitly |
| 502 Upstream provider error on Gemini via /v1/messages | Gemini doesn’t currently route through the Anthropic-shape endpoint | Use /v1/chat/completions for Gemini. |
| Empty content on a reasoning model | Reasoning budget exhausted the token cap | Raise max_completion_tokens / max_tokens |
| Want to use Google’s google-genai SDK | SERV doesn’t expose Gemini’s generateContent wire format | Use the OpenAI SDK against /v1/chat/completions with the Gemini or Gemma model id. |
| Model ID rejected with not_found_error | Unsupported ID | Pick one from the catalog. |
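Some of the rows above can be turned into hints that an application logs alongside a failed request. The sketch below is illustrative only: `suggest_fix` is a hypothetical helper, and the substring matching assumes the error messages contain the wording shown in the table, which may differ in practice.

```python
def suggest_fix(status, message=""):
    """Map an HTTP status and error message to a hint from the cheat sheet."""
    text = message.lower()
    if status == 400 and "system prompt" in text:
        return "Add a system / instructions / developer message."
    if status == 400 and "reasoning_effort" in text:
        return "Change reasoning_effort to 'low' or 'none'."
    if status == 400 and "responses api" in text:
        return "Switch to /v1/chat/completions."
    if status == 404 and "gemma" in text:
        return "Use a Gemma ID with the -it suffix, e.g. gemma-4-31b-it."
    if status == 404:
        return "Check /v1: required in raw HTTP paths, omitted from the official Anthropic SDK base URL."
    if status == 401:
        return "Pass the SERV key explicitly as authToken / auth_token."
    if status == 502:
        return "Use /v1/chat/completions for Gemini."
    return "No match in the cheat sheet."
```

The more specific message checks run before the bare status checks, so a Gemma 404 is not reported as a base URL problem.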
See also
- SDK Integration for OpenAI / Anthropic Node SDK users.
- Models for the full pricing and context-window catalog.
- Playground to compare models side by side before you migrate.