Last updated: May 2026
Verified against official DeepSeek API Docs on May 3, 2026
The DeepSeek API lets developers access DeepSeek’s language models through OpenAI-compatible and Anthropic-compatible API formats. That means you can use familiar SDKs, change the base URL, choose a DeepSeek model, and start sending chat-completion requests without rebuilding your entire integration stack. The current official V4 API model names are deepseek-v4-flash and deepseek-v4-pro; the older deepseek-chat and deepseek-reasoner names are scheduled for retirement on July 24, 2026.
Quick Answer: What Is the DeepSeek API?
The DeepSeek API is a developer API for sending prompts, chat messages, tool definitions, and structured-output requests to DeepSeek models. It is designed to work with OpenAI-style Chat Completions and also supports an Anthropic-compatible endpoint, which makes migration easier for teams that already use common LLM SDKs.
In practical terms, most developers start by creating an API key, setting https://api.deepseek.com as the OpenAI-compatible base URL, and calling /chat/completions with either deepseek-v4-flash or deepseek-v4-pro.
| Item | Current Details |
|---|---|
| API type | Chat Completions API |
| OpenAI base URL | https://api.deepseek.com |
| Anthropic base URL | https://api.deepseek.com/anthropic |
| Authentication | Bearer token API key |
| Current models | deepseek-v4-flash, deepseek-v4-pro |
| OpenAI SDK compatibility | Supported by changing base_url / baseURL |
| Anthropic compatibility | Supported through the Anthropic endpoint |
| Pricing unit | Per 1M tokens |
| Context length | 1M tokens |
| Best use cases | Chatbots, coding tools, agents, RAG, long-context analysis, structured outputs |

DeepSeek API Quickstart
Follow this basic setup flow:
- Create a DeepSeek Platform account.
- Generate an API key.
- Store the key in an environment variable.
- Install the OpenAI SDK.
- Send your first `/chat/completions` request.
The official docs show the API key passed in an `Authorization: Bearer` header against the OpenAI-compatible base URL.
cURL Example
```bash
export DEEPSEEK_API_KEY="your_api_key_here"

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-v4-flash",
        "messages": [
          {"role": "system", "content": "You are a helpful technical assistant."},
          {"role": "user", "content": "Explain the DeepSeek API in one paragraph."}
        ],
        "stream": false
      }'
```
Python Example
```python
import os

from openai import OpenAI

api_key = os.environ.get("DEEPSEEK_API_KEY")
if not api_key:
    raise RuntimeError("Missing DEEPSEEK_API_KEY environment variable")

client = OpenAI(
    api_key=api_key,
    base_url="https://api.deepseek.com",
)

try:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "You are a concise API assistant."},
            {"role": "user", "content": "Give me a quick DeepSeek API setup checklist."},
        ],
        stream=False,
    )
    print(response.choices[0].message.content)
except Exception as exc:
    print(f"DeepSeek API request failed: {exc}")
```
Node.js Example
```javascript
import OpenAI from "openai";

const apiKey = process.env.DEEPSEEK_API_KEY;
if (!apiKey) {
  throw new Error("Missing DEEPSEEK_API_KEY environment variable");
}

const client = new OpenAI({
  apiKey,
  baseURL: "https://api.deepseek.com",
});

async function main() {
  try {
    const completion = await client.chat.completions.create({
      model: "deepseek-v4-flash",
      messages: [
        { role: "system", content: "You are a practical developer assistant." },
        { role: "user", content: "Show me how to start with the DeepSeek API." }
      ],
      stream: false
    });
    console.log(completion.choices[0].message.content);
  } catch (error) {
    console.error("DeepSeek API request failed:", error);
  }
}

main();
```
DeepSeek API Base URL and Authentication
For OpenAI-compatible requests, use:
https://api.deepseek.com
For Anthropic-compatible requests, use:
https://api.deepseek.com/anthropic
The official first-call documentation lists both base URLs and confirms that DeepSeek supports OpenAI and Anthropic API formats.
Authentication uses an API key. In OpenAI-compatible requests, pass it as:
Authorization: Bearer ${DEEPSEEK_API_KEY}
Common authentication mistakes include using the wrong base URL, passing the key without the Bearer prefix, hardcoding the key in source code, or exposing the key in frontend JavaScript. Always call the DeepSeek API from your server, serverless function, or backend proxy—not directly from the browser.
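As a minimal illustration of the header format, an authenticated request can be assembled with Python's standard library alone; this is only a sketch of where the key belongs, and `build_chat_request` is our own helper, not part of any SDK.

```python
import json
import os
import urllib.request

def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated DeepSeek chat-completions request.

    The key travels only in the Authorization header, never in the
    URL or the request body.
    """
    return urllib.request.Request(
        url="https://api.deepseek.com/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    os.environ.get("DEEPSEEK_API_KEY", "sk-placeholder"),
    {"model": "deepseek-v4-flash", "messages": [{"role": "user", "content": "Hi"}]},
)
print(req.get_header("Authorization").startswith("Bearer "))
```

Building the request on the server side like this also keeps the key out of any browser-visible code.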
DeepSeek Models Explained
DeepSeek currently lists two V4 model IDs in the official model endpoint example: deepseek-v4-flash and deepseek-v4-pro.
| Model ID | Best For | Thinking Support | Cost Profile | Context Length | Recommended Use |
|---|---|---|---|---|---|
| deepseek-v4-flash | Fast, economical apps | Thinking and non-thinking | Lower cost | 1M tokens | Chatbots, support agents, summarization, high-volume workloads |
| deepseek-v4-pro | Harder reasoning and agentic work | Thinking and non-thinking | Higher cost | 1M tokens | Complex coding, agent workflows, advanced reasoning, long-document analysis |
Both V4 models support JSON output, tool calls, chat prefix completion, and FIM completion in non-thinking mode, according to the Models & Pricing page.
Use deepseek-v4-flash when you need lower latency, lower cost, and broad production coverage. Use deepseek-v4-pro when task quality matters more than cost, especially for complex reasoning, coding, multi-step workflows, or high-value agent tasks.
Legacy Model Names and Migration
The legacy names deepseek-chat and deepseek-reasoner still appear for compatibility, but they are scheduled for retirement. DeepSeek says deepseek-chat maps to the non-thinking mode of deepseek-v4-flash, while deepseek-reasoner maps to the thinking mode of deepseek-v4-flash; the retirement date is July 24, 2026.
Update existing code like this:
```diff
- model="deepseek-chat"
+ model="deepseek-v4-flash"

- model="deepseek-reasoner"
+ model="deepseek-v4-flash"
+ extra_body={"thinking": {"type": "enabled"}}
```
For new projects, do not use the legacy names.
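One way to centralize this migration is a small shim that rewrites legacy model names before each call. The mapping below restates the compatibility rules above; the helper name and structure are our own, not part of any SDK.

```python
# Map legacy model IDs to the V4 replacement plus any extra request
# fields, per the documented mapping (legacy names retire July 24, 2026).
LEGACY_MODEL_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", {}),
    "deepseek-reasoner": ("deepseek-v4-flash", {"thinking": {"type": "enabled"}}),
}

def migrate_model(model: str) -> tuple:
    """Return (new_model, extra_body) for a possibly-legacy model ID."""
    return LEGACY_MODEL_MAP.get(model, (model, {}))

print(migrate_model("deepseek-reasoner"))
```

Routing every request through one function like this makes the July 2026 cutover a one-line change.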
DeepSeek API Pricing
DeepSeek pricing is listed per 1M tokens. Prices can change, and DeepSeek explicitly recommends checking the pricing page regularly before topping up or planning production usage.
| Model | Cache-Hit Input / 1M | Cache-Miss Input / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 | Economical V4 model |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 | 75% discount listed until May 31, 2026, 15:59 UTC |
The listed deepseek-v4-pro prices are discounted from the crossed-out launch prices, and the input cache-hit price reduction took effect on April 26, 2026, 12:15 UTC.
Cost formula:
```
cost = (cache_hit_input_tokens / 1,000,000 × cache_hit_price)
     + (cache_miss_input_tokens / 1,000,000 × cache_miss_price)
     + (output_tokens / 1,000,000 × output_price)
```
Example using deepseek-v4-flash:
- Cache-hit input: 200,000 tokens × $0.0028 / 1M = $0.00056
- Cache-miss input: 100,000 tokens × $0.14 / 1M = $0.01400
- Output: 50,000 tokens × $0.28 / 1M = $0.01400
- Estimated total = $0.02856
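The cost formula is easy to wrap in a small budgeting helper. The prices below are the May 2026 list prices from the table above; substitute the current ones before relying on the numbers.

```python
def estimate_cost(
    cache_hit_tokens: int,
    cache_miss_tokens: int,
    output_tokens: int,
    cache_hit_price: float,
    cache_miss_price: float,
    output_price: float,
) -> float:
    """Estimate request cost in USD; all prices are per 1M tokens."""
    per_million = 1_000_000
    return (
        cache_hit_tokens / per_million * cache_hit_price
        + cache_miss_tokens / per_million * cache_miss_price
        + output_tokens / per_million * output_price
    )

# deepseek-v4-flash list prices, matching the worked example above.
cost = estimate_cost(200_000, 100_000, 50_000, 0.0028, 0.14, 0.28)
print(f"${cost:.5f}")  # → $0.02856
```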
Context Caching
Context caching is enabled by default for DeepSeek API users. It stores reusable input prefixes on disk so later requests with matching prefixes can receive cache-hit pricing instead of cache-miss pricing.
This matters for repeated prompts, RAG systems, long documents, multi-turn conversations, and agent workflows. For example, if your app repeatedly sends the same system prompt, product documentation, or retrieved context, DeepSeek may count part of the input as a cache hit.
DeepSeek exposes cache usage through two `usage` fields:
- `prompt_cache_hit_tokens`
- `prompt_cache_miss_tokens`

The `prompt_tokens` value equals cache-hit tokens plus cache-miss tokens.
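Because `prompt_tokens` is just the sum of the two cache counters, a cache hit rate can be derived directly from the usage data. The field names are the documented ones; the helper itself is our own sketch.

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from the context cache."""
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    total = hit + miss
    return hit / total if total else 0.0

usage = {"prompt_cache_hit_tokens": 200_000, "prompt_cache_miss_tokens": 100_000}
print(cache_hit_rate(usage))
```

Logging this ratio per request is a cheap way to verify that prompt ordering changes actually improve cache behavior.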
Best practices:
- Put stable instructions and repeated documents early in the prompt.
- Keep shared RAG context consistent across requests.
- Reuse conversation prefixes when appropriate.
- Monitor hit and miss tokens in logs.
- Do not assume a 100% hit rate; DeepSeek describes caching as best effort.
Thinking Mode
DeepSeek V4 supports thinking mode, where the model can produce reasoning content before the final answer. The toggle defaults to enabled, and the OpenAI-compatible format uses:
```json
{
  "thinking": {
    "type": "enabled"
  }
}
```
DeepSeek supports reasoning_effort values of high and max; compatibility mappings convert low and medium to high, and xhigh to max.
Important: in thinking mode, temperature, top_p, presence_penalty, and frequency_penalty do not take effect.
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "user", "content": "Design a scalable RAG architecture for a legal search app."}
    ],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)
print(response.choices[0].message.content)
```
Use higher reasoning effort for complex coding, math, planning, and agentic decisions. Avoid it for simple classification, short copy, or high-volume requests where latency and cost matter more.
Streaming Responses
Streaming improves perceived speed by returning partial message deltas as they are generated. DeepSeek’s Chat Completion reference says streamed tokens are sent as server-sent events and the stream ends with data: [DONE].
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Write a short onboarding checklist for API developers."}
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```
For chat UIs, streaming is usually better than waiting for the full response. For batch jobs, non-streaming is simpler.
JSON Output
Use JSON output when your application needs structured, machine-readable responses. DeepSeek supports response_format: {"type": "json_object"} and says the prompt must also include the word “json” plus an example of the desired format.
```python
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "system",
            "content": 'Return valid json only, for example: {"summary": "...", "tags": ["..."]}'
        },
        {
            "role": "user",
            "content": "Summarize this: DeepSeek API supports V4 models, caching, JSON output, and tool calls."
        }
    ],
    response_format={"type": "json_object"},
    max_tokens=500,
)

data = json.loads(response.choices[0].message.content)
print(data)
```
Common mistakes include forgetting to ask for JSON in the prompt, setting max_tokens too low, or assuming JSON output means the schema is semantically correct. Always parse and validate the result.
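A minimal post-parse check might look like the following. The expected keys mirror the example schema in the system prompt above; adapt them to your own contract.

```python
import json

def parse_summary_response(raw: str) -> dict:
    """Parse and shape-check a JSON-mode response before using it."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data.get("summary"), str):
        raise ValueError("missing or non-string 'summary'")
    if not isinstance(data.get("tags"), list):
        raise ValueError("missing or non-list 'tags'")
    return data

ok = parse_summary_response('{"summary": "V4 overview", "tags": ["api"]}')
print(ok["summary"])
```

Shape checks like this catch the common failure mode where the model returns syntactically valid JSON that is missing a field your code assumes.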
Tool Calls / Function Calling
Tool calls let the model request an external function, such as checking weather, querying a database, searching a knowledge base, or calling an internal service. DeepSeek’s Chat Completion reference says tools currently support functions, with up to 128 functions.
```python
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Get the current status of a customer order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The order ID."
                    }
                },
                "required": ["order_id"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Check order A12345."}
    ],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        # Validate args before executing real business logic.
        print(call.function.name, args)
    else:
    print(message.content)
```
DeepSeek warns that generated function arguments may not always be valid JSON and may include hallucinated parameters, so validate arguments before executing functions.
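Following that warning, a defensive sketch for the `get_order_status` tool above: parse the argument string, reject unknown keys, and check required ones before touching business logic. The key sets restate the tool definition; nothing here is SDK-provided.

```python
import json

REQUIRED = {"order_id"}
ALLOWED = {"order_id"}

def validate_tool_args(raw_arguments: str) -> dict:
    """Validate model-generated function arguments before execution."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    unknown = set(args) - ALLOWED
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    missing = REQUIRED - set(args)
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    return args

print(validate_tool_args('{"order_id": "A12345"}'))
```

On a `ValueError`, return the error text to the model in a `tool` message rather than executing anything; models often self-correct on the next turn.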
Rate Limits, Errors, and Troubleshooting
DeepSeek dynamically limits user concurrency based on server load. When the concurrency limit is reached, the API returns HTTP 429.
| Code | Meaning | Common Cause | Fix |
|---|---|---|---|
| 400 | Invalid Format | Bad request body | Follow the API format and error hints |
| 401 | Authentication Fails | Wrong API key | Check or recreate the API key |
| 402 | Insufficient Balance | No balance | Top up the account |
| 422 | Invalid Parameters | Unsupported or malformed parameters | Fix request parameters |
| 429 | Rate Limit Reached | Too many requests / concurrency pressure | Pace requests and retry later |
| 500 | Server Error | DeepSeek server issue | Retry after a brief wait |
| 503 | Server Overloaded | High traffic | Retry after a brief wait |
These error definitions come from the official Error Codes page.
Production tips:
- Add exponential backoff for 429, 500, and 503.
- Set request timeouts.
- Monitor `usage` fields.
- Log model IDs and finish reasons.
- Check DeepSeek Service Status during incidents; the status page shows API and web chat availability.
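The backoff tip above can be sketched as a small wrapper. Here a hypothetical `send` callable and `APIStatusError` class stand in for the actual SDK call and its exceptions; the retryable codes are the three listed in the tips.

```python
import random
import time

RETRYABLE = {429, 500, 503}

class APIStatusError(Exception):
    """Stand-in for an SDK error carrying an HTTP status code."""
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status

def with_backoff(send, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry send() on 429/500/503 with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except APIStatusError as exc:
            if exc.status not in RETRYABLE or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Simulated call that fails twice with 429, then succeeds.
attempts = {"n": 0}
def fake_send():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise APIStatusError(429)
    return "ok"

print(with_backoff(fake_send, base_delay=0.01))  # → ok
```

Non-retryable codes (400, 401, 402, 422) are re-raised immediately, since retrying a bad request or a missing balance only wastes time.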
DeepSeek API vs OpenAI API Compatibility
“OpenAI-compatible” means you can often keep your existing OpenAI SDK and change three things:
- Base URL
- API key
- Model name
For example:
```python
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
```
Then use `model="deepseek-v4-flash"` or `model="deepseek-v4-pro"`.
However, compatibility does not mean identical behavior. Model outputs, latency, supported parameters, tool-call behavior, JSON reliability, context handling, and cost structure should all be tested before a full migration. Also note that in thinking mode some sampling parameters do not take effect.
Best Practices for Production Apps
Use this checklist before shipping:
- Never expose DeepSeek API keys in frontend code.
- Send API requests from a backend or serverless function.
- Use environment variables for secrets.
- Prefer `deepseek-v4-flash` for high-volume tasks.
- Escalate to `deepseek-v4-pro` for complex reasoning.
- Cap `max_tokens`.
- Track input, output, cache-hit, and cache-miss tokens.
- Use context caching intentionally.
- Validate JSON output.
- Validate tool-call arguments before execution.
- Add retries for 429, 500, and 503.
- Monitor DeepSeek pricing, model availability, and status updates.
FAQ
What is the DeepSeek API?
The DeepSeek API is a developer interface for accessing DeepSeek language models through OpenAI-compatible and Anthropic-compatible API formats. It supports chat completions, streaming, JSON output, tool calls, thinking mode, and context caching.
Where are the official DeepSeek API Docs?
The official DeepSeek API Docs are hosted by DeepSeek and include quickstart guides, model pricing, API reference pages, guide pages, news, FAQs, and a change log.
How do I get a DeepSeek API key?
Create a DeepSeek Platform account, then generate an API key from the platform. The official quickstart links API key creation from the first-call setup page.
Is DeepSeek API compatible with the OpenAI SDK?
Yes. DeepSeek says its API uses a format compatible with OpenAI and can be accessed with the OpenAI SDK by changing the base URL and model.
What is the DeepSeek API base URL?
The OpenAI-compatible base URL is https://api.deepseek.com. The Anthropic-compatible base URL is https://api.deepseek.com/anthropic.
Which DeepSeek model should I use?
Use deepseek-v4-flash for cost-efficient, high-volume work. Use deepseek-v4-pro for complex reasoning, coding, and agentic workflows where output quality is more important than cost.
How much does DeepSeek API cost?
As of May 3, 2026, deepseek-v4-flash is listed at $0.0028 per 1M cache-hit input tokens, $0.14 per 1M cache-miss input tokens, and $0.28 per 1M output tokens. deepseek-v4-pro is listed at discounted rates of $0.003625, $0.435, and $0.87 respectively. Always verify current pricing before production use.
What is the difference between deepseek-v4-flash and deepseek-v4-pro?
deepseek-v4-flash is the lower-cost, faster model for general and high-volume workloads. deepseek-v4-pro is the more capable model for difficult reasoning, coding, and agent workflows.
Are deepseek-chat and deepseek-reasoner still supported?
They are legacy names and are scheduled to be retired on July 24, 2026. For compatibility, they currently map to deepseek-v4-flash non-thinking and thinking modes respectively.
Can I use DeepSeek API with Node.js?
Yes. Install the OpenAI Node.js SDK, set baseURL to https://api.deepseek.com, and pass your DeepSeek API key.
Can I use DeepSeek API with Python?
Yes. Install the OpenAI Python SDK, initialize OpenAI(base_url="https://api.deepseek.com"), and call client.chat.completions.create() with a DeepSeek model.
How do I fix a DeepSeek API 429 error?
A 429 means rate limit reached. Pace requests, add retries with backoff, reduce concurrency, and consider fallback providers for critical workloads.
Does DeepSeek API support JSON output?
Yes. Set response_format to {"type": "json_object"} and explicitly ask for JSON in the prompt.
Does DeepSeek API support tool calling?
Yes. DeepSeek supports function tools in chat completions. Validate all generated function arguments before running external actions.
Is DeepSeek API safe for production?
It can be used in production if you secure API keys, handle errors, monitor token usage, validate outputs, use retries, and keep pricing/model information updated.
Conclusion
The fastest path to using the DeepSeek API is simple: create an API key, use the OpenAI SDK, set the DeepSeek base URL, choose either deepseek-v4-flash or deepseek-v4-pro, and monitor token usage carefully. For production applications, pay close attention to context caching, output validation, tool-call safety, and the upcoming retirement of legacy model names.