SERV Reasoning is a reasoning enhancement layer over the frontier models you already use. Same model name, same SDK, same prompts. Change the endpoint and you get cheaper inference and more consistent agent behavior.
Private Beta: access is currently limited to selected partners and teams. Join the waitlist.

Try it first in the Playground

The fastest way to feel the difference: open the Playground at console.openserv.ai/playground, pick any model, and compare Raw mode against SERV Reasoning side by side on the same prompt. You get two model pickers, the same input, and an instant comparison. Use the Low / Medium / High selector on each side to tune reasoning depth. When you’re ready to integrate, come back here.

Integrate in two steps

  1. Generate an API key at console.openserv.ai.
  2. Point your existing OpenAI or Anthropic SDK at the SERV endpoint and use the same model name you would normally pass.

OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference-api.openserv.ai/v1",
  apiKey: process.env.SERV_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-5.4-mini",
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Hello!" },
  ],
});

console.log(response.choices[0].message.content);

Anthropic SDK

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "https://inference-api.openserv.ai",
  authToken: process.env.SERV_API_KEY,
});

const message = await client.messages.create({
  model: "claude-haiku-4.5",
  max_tokens: 1024,
  system: "You answer in one sentence.",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(message.content[0].text);

That’s the entire migration. Your prompts, tool definitions, and application logic stay identical. For the full list of supported model IDs and pricing, see Models.

Before you build: four things to know

These four behaviors trip up most integrations. Details and workarounds live in SDK Integration.
  1. The baseURL suffix differs by SDK. The OpenAI SDK takes /v1 in the base URL. The Anthropic SDK does not (it appends /v1/messages internally).
  2. The auth field name differs by SDK. The OpenAI SDK uses apiKey. The Anthropic SDK prefers authToken for third-party gateways.
  3. A system prompt is required. Every request needs a system message, an instructions field, or a developer message. Requests without one return a 400 error.
  4. reasoning_effort accepts none | low | medium | high. OpenAI’s minimal value is not accepted; if your code hard-codes it, change it to low or none.
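
The four behaviors above can be folded into two small helpers. This is a minimal sketch, not part of either SDK: `servClientOptions` and `normalizeChatRequest` are hypothetical names, the request type is a simplified Chat Completions shape, and the default system prompt is a placeholder to refine per call site.

```typescript
// Hypothetical helpers for illustration; neither SDK ships these names.
type SdkShape = "openai" | "anthropic";

interface ClientOptions {
  baseURL: string;
  apiKey?: string;
  authToken?: string;
}

interface ChatMessage {
  role: string;
  content: string;
}

// Simplified Chat Completions-style request (assumption: only the fields used here).
interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  reasoning_effort?: string;
}

const SERV_HOST = "https://inference-api.openserv.ai";
const ALLOWED_EFFORTS: readonly string[] = ["none", "low", "medium", "high"];

// Gotchas 1 and 2: the /v1 suffix and the auth field name depend on the SDK.
function servClientOptions(shape: SdkShape, key: string): ClientOptions {
  return shape === "openai"
    ? { baseURL: SERV_HOST + "/v1", apiKey: key }
    : { baseURL: SERV_HOST, authToken: key };
}

// Gotchas 3 and 4: guarantee a system prompt and an accepted reasoning_effort.
function normalizeChatRequest(
  req: ChatRequest,
  defaultSystem: string = "You are a helpful assistant."
): ChatRequest {
  const hasSystem = req.messages.some(
    (m) => m.role === "system" || m.role === "developer"
  );
  const messages: ChatMessage[] = hasSystem
    ? req.messages
    : [{ role: "system", content: defaultSystem }, ...req.messages];

  const out: ChatRequest = { ...req, messages };
  if (req.reasoning_effort === "minimal") {
    // "minimal" is not accepted; "low" is the closest allowed value.
    out.reasoning_effort = "low";
  } else if (
    req.reasoning_effort !== undefined &&
    ALLOWED_EFFORTS.indexOf(req.reasoning_effort) === -1
  ) {
    throw new Error("Unsupported reasoning_effort: " + req.reasoning_effort);
  }
  return out;
}
```

The options object can be passed to the matching SDK constructor. Requests that use Anthropic's `system` field or an `instructions` field would need an analogous check, which this sketch does not cover.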

Migrating an existing project? Hand this to your AI

Paste this into Claude Code, Cursor, or any coding assistant. It accounts for all four gotchas above.
I'm migrating my project from [OpenAI / Anthropic / other provider] to SERV Reasoning.
SERV is wire-compatible with the OpenAI Chat Completions API and the Anthropic
Messages API. Please refactor as follows:

1. Set the client baseURL:
   • OpenAI-shape clients   -> "https://inference-api.openserv.ai/v1"  (WITH /v1)
   • Anthropic-shape client -> "https://inference-api.openserv.ai"    (WITHOUT /v1)

2. Read the API key from SERV_API_KEY.
   • OpenAI SDK: use the `apiKey` constructor field.
   • Anthropic SDK: use the `authToken` constructor field, not `apiKey`.

3. Leave model IDs as-is if they are already SERV-supported. The current public
   catalog is:
     OpenAI-shape: gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano,
                   o3, o3-mini, o3-pro, o4-mini,
                   gemini-flash-latest, gemini-pro-latest,
                   gemma-4-26b-a4b, gemma-4-31b,
                   grok-4.3, grok-4.20,
                   qwen3.6-flash, qwen3.6-max-preview,
                   deepseek-v4-pro, deepseek-v4-flash
     Anthropic-shape: claude-opus-4.6, claude-sonnet-4.6, claude-haiku-4.5
   If you encounter a non-supported ID, flag it for me to pick a replacement.

4. SERV requires a system prompt on every request. Audit every call site. If any
   request lacks a system message / instructions / developer message, add a
   default ("You are a helpful assistant." is fine) and flag it for me to refine.

5. If any code passes `reasoning_effort: "minimal"`, change to `"low"`. SERV's
   allowed values are: none | low | medium | high.

6. Leave all prompts, tool/function definitions, message structures, streaming
   logic, and business logic UNCHANGED.

Produce a diff for every call site, and flag any provider-specific features
(prompt caching, assistants API, etc.) for manual review.
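
Outside the prompt itself, step 3 can also be checked mechanically. The sketch below assumes the catalog listed in the prompt is still current (the Models page is the source of truth), and `classifyModel` and `findUnsupported` are hypothetical helper names.

```typescript
// Snapshot of the catalog quoted in the migration prompt; it may change over time.
const OPENAI_SHAPE_MODELS: readonly string[] = [
  "gpt-5.5", "gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano",
  "o3", "o3-mini", "o3-pro", "o4-mini",
  "gemini-flash-latest", "gemini-pro-latest",
  "gemma-4-26b-a4b", "gemma-4-31b",
  "grok-4.3", "grok-4.20",
  "qwen3.6-flash", "qwen3.6-max-preview",
  "deepseek-v4-pro", "deepseek-v4-flash",
];

const ANTHROPIC_SHAPE_MODELS: readonly string[] = [
  "claude-opus-4.6", "claude-sonnet-4.6", "claude-haiku-4.5",
];

type ModelShape = "openai" | "anthropic" | "unsupported";

// Map a model ID to the API shape it is served under, per the snapshot above.
function classifyModel(id: string): ModelShape {
  if (OPENAI_SHAPE_MODELS.indexOf(id) !== -1) return "openai";
  if (ANTHROPIC_SHAPE_MODELS.indexOf(id) !== -1) return "anthropic";
  return "unsupported";
}

// Return the IDs that need a manual replacement decision.
function findUnsupported(ids: readonly string[]): string[] {
  return ids.filter((id) => classifyModel(id) === "unsupported");
}
```

Running `findUnsupported` over the model IDs found in a codebase gives the list to flag for replacement before switching the endpoint.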

SERV Reasoning shines on bounded, repetitive work. That’s exactly the kind of workload that drives most agent costs in production today.
  • Agent loops with tool calls. Trading agents, research agents, ops agents, anything that operates inside a known decision space. The reasoning structure is built once and applied across thousands of iterations.
  • Classification, extraction, routing. High-volume, narrow-output tasks where consistency matters more than creative variance.
  • Repeated workflows. Invoice processing, support triage, content moderation, intent detection, summarization pipelines.
  • Structured generation. JSON outputs, schema-conforming responses, function calling.
  • Plan-then-execute pipelines. One creative step at the top, many mechanical steps below. Use a stronger model at higher effort for planning, and a cheaper model at lower effort for execution.
The unlock across all of these is the same: a cheaper model paired with a system prompt tailored to your specific task (your domain, your data shape, your output format) plus SERV Reasoning will routinely match a frontier model’s output at a fraction of the cost. The narrower and more repetitive the work, the more the gap closes. For enterprise workloads that run the same shape of request thousands of times a day, this is where most of the savings live.
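
The plan-then-execute split described above might be configured along these lines. This is a sketch under assumptions: the model choices (`gpt-5.5` for planning, `gpt-5.4-mini` for execution), the effort levels, and the system prompts are illustrative picks, and `buildStageRequest` is a hypothetical helper that only builds a Chat Completions-shaped request body.

```typescript
type Stage = "plan" | "execute";
type Effort = "none" | "low" | "medium" | "high";

interface StageConfig {
  model: string;
  reasoning_effort: Effort;
  system: string;
}

// Illustrative pairing: stronger model and higher effort for the one creative
// step, cheaper model and lower effort for the many mechanical steps.
const STAGES: Record<Stage, StageConfig> = {
  plan: {
    model: "gpt-5.5",
    reasoning_effort: "high",
    system: "You break a task into numbered, independently executable steps.",
  },
  execute: {
    model: "gpt-5.4-mini",
    reasoning_effort: "low",
    system: "You carry out exactly one step and reply with JSON only.",
  },
};

// Build a request body for the given stage; every request carries a system prompt.
function buildStageRequest(stage: Stage, userInput: string) {
  const cfg = STAGES[stage];
  return {
    model: cfg.model,
    reasoning_effort: cfg.reasoning_effort,
    messages: [
      { role: "system", content: cfg.system },
      { role: "user", content: userInput },
    ],
  };
}
```

Keeping the per-stage system prompt narrow (domain, data shape, output format) is the part that matters most; the model and effort values are the knobs to tune against your own evaluations.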
Need help picking a model or migrating a non-trivial codebase? Reach out via the waitlist form and the team will get you set up.