Last updated: May 2026
Verified against official DeepSeek API Docs on May 3, 2026
The DeepSeek API lets developers access DeepSeek’s language models through OpenAI-compatible and Anthropic-compatible API formats. That means you can use familiar SDKs, change the base URL, choose a DeepSeek model, and start sending chat-completion requests without rebuilding your entire integration stack. The current official V4 API model names are deepseek-v4-flash and deepseek-v4-pro; the older deepseek-chat and deepseek-reasoner names are scheduled for retirement on July 24, 2026.
Quick Answer: What Is the DeepSeek API?
The DeepSeek API is a developer API for sending prompts, chat messages, tool definitions, and structured-output requests to DeepSeek models. It is designed to work with OpenAI-style Chat Completions and also supports an Anthropic-compatible endpoint, which makes migration easier for teams that already use common LLM SDKs.
In practical terms, most developers start by creating an API key, setting https://api.deepseek.com as the OpenAI-compatible base URL, and calling /chat/completions with either deepseek-v4-flash or deepseek-v4-pro.
| Item | Current Details |
|---|---|
| API type | Chat Completions API |
| OpenAI base URL | https://api.deepseek.com |
| Anthropic base URL | https://api.deepseek.com/anthropic |
| Authentication | Bearer token API key |
| Current models | deepseek-v4-flash, deepseek-v4-pro |
| OpenAI SDK compatibility | Supported by changing base_url / baseURL |
| Anthropic compatibility | Supported through the Anthropic endpoint |
| Pricing unit | Per 1M tokens |
| Context length | 1M tokens |
| Best use cases | Chatbots, coding tools, agents, RAG, long-context analysis, structured outputs |

DeepSeek API Quickstart
Follow this basic setup flow:
- Create a DeepSeek Platform account.
- Generate an API key.
- Store the key in an environment variable.
- Install the OpenAI SDK.
- Send your first `/chat/completions` request.
The official docs show the API key passed in an `Authorization: Bearer` header against the OpenAI-compatible base URL.
cURL Example
```bash
export DEEPSEEK_API_KEY="your_api_key_here"

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-v4-flash",
        "messages": [
          {"role": "system", "content": "You are a helpful technical assistant."},
          {"role": "user", "content": "Explain the DeepSeek API in one paragraph."}
        ],
        "stream": false
      }'
```
Python Example
```python
import os

from openai import OpenAI

api_key = os.environ.get("DEEPSEEK_API_KEY")
if not api_key:
    raise RuntimeError("Missing DEEPSEEK_API_KEY environment variable")

client = OpenAI(
    api_key=api_key,
    base_url="https://api.deepseek.com",
)

try:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "You are a concise API assistant."},
            {"role": "user", "content": "Give me a quick DeepSeek API setup checklist."},
        ],
        stream=False,
    )
    print(response.choices[0].message.content)
except Exception as exc:
    print(f"DeepSeek API request failed: {exc}")
```
Node.js Example
```javascript
import OpenAI from "openai";

const apiKey = process.env.DEEPSEEK_API_KEY;
if (!apiKey) {
  throw new Error("Missing DEEPSEEK_API_KEY environment variable");
}

const client = new OpenAI({
  apiKey,
  baseURL: "https://api.deepseek.com",
});

async function main() {
  try {
    const completion = await client.chat.completions.create({
      model: "deepseek-v4-flash",
      messages: [
        { role: "system", content: "You are a practical developer assistant." },
        { role: "user", content: "Show me how to start with the DeepSeek API." }
      ],
      stream: false
    });
    console.log(completion.choices[0].message.content);
  } catch (error) {
    console.error("DeepSeek API request failed:", error);
  }
}

main();
```
DeepSeek API Base URL and Authentication
For OpenAI-compatible requests, use:
https://api.deepseek.com
For Anthropic-compatible requests, use:
https://api.deepseek.com/anthropic
The official first-call documentation lists both base URLs and confirms that DeepSeek supports OpenAI and Anthropic API formats.
Authentication uses an API key. In OpenAI-compatible requests, pass it as:
Authorization: Bearer ${DEEPSEEK_API_KEY}
Common authentication mistakes include using the wrong base URL, passing the key without the Bearer prefix, hardcoding the key in source code, or exposing the key in frontend JavaScript. Always call the DeepSeek API from your server, serverless function, or backend proxy—not directly from the browser.
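As a minimal illustration of the header format, an authenticated request can be assembled with Python's standard library alone; this is only a sketch of where the key belongs, and `build_chat_request` is our own helper, not part of any SDK.

```python
import json
import os
import urllib.request

def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated DeepSeek chat-completions request.

    The key travels only in the Authorization header, never in the
    URL or the request body.
    """
    return urllib.request.Request(
        url="https://api.deepseek.com/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    os.environ.get("DEEPSEEK_API_KEY", "sk-placeholder"),
    {"model": "deepseek-v4-flash", "messages": [{"role": "user", "content": "Hi"}]},
)
print(req.get_header("Authorization").startswith("Bearer "))
```

Building the request on the server side like this also keeps the key out of any browser-visible code.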
DeepSeek Models Explained
DeepSeek currently lists two V4 model IDs in the official model endpoint example: deepseek-v4-flash and deepseek-v4-pro.
| Model ID | Best For | Thinking Support | Cost Profile | Context Length | Recommended Use |
|---|---|---|---|---|---|
| deepseek-v4-flash | Fast, economical apps | Thinking and non-thinking | Lower cost | 1M tokens | Chatbots, support agents, summarization, high-volume workloads |
| deepseek-v4-pro | Harder reasoning and agentic work | Thinking and non-thinking | Higher cost | 1M tokens | Complex coding, agent workflows, advanced reasoning, long-document analysis |
Both V4 models support JSON output, tool calls, chat prefix completion, and FIM completion in non-thinking mode, according to the Models & Pricing page.
Use deepseek-v4-flash when you need lower latency, lower cost, and broad production coverage. Use deepseek-v4-pro when task quality matters more than cost, especially for complex reasoning, coding, multi-step workflows, or high-value agent tasks.
Legacy Model Names and Migration
The legacy names deepseek-chat and deepseek-reasoner still appear for compatibility, but they are scheduled for retirement. DeepSeek says deepseek-chat maps to the non-thinking mode of deepseek-v4-flash, while deepseek-reasoner maps to the thinking mode of deepseek-v4-flash; the retirement date is July 24, 2026.
Update existing code like this:
```diff
- model="deepseek-chat"
+ model="deepseek-v4-flash"

- model="deepseek-reasoner"
+ model="deepseek-v4-flash"
+ extra_body={"thinking": {"type": "enabled"}}
```
For new projects, do not use the legacy names.
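One way to centralize this migration is a small shim that rewrites legacy model names before each call. The mapping below restates the compatibility rules above; the helper name and structure are our own, not part of any SDK.

```python
# Map legacy model IDs to the V4 replacement plus any extra request
# fields, per the documented mapping (legacy names retire July 24, 2026).
LEGACY_MODEL_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", {}),
    "deepseek-reasoner": ("deepseek-v4-flash", {"thinking": {"type": "enabled"}}),
}

def migrate_model(model: str) -> tuple:
    """Return (new_model, extra_body) for a possibly-legacy model ID."""
    return LEGACY_MODEL_MAP.get(model, (model, {}))

print(migrate_model("deepseek-reasoner"))
```

Routing every request through one function like this makes the July 2026 cutover a one-line change.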
DeepSeek API Pricing
DeepSeek pricing is listed per 1M tokens. Prices can change, and DeepSeek explicitly recommends checking the pricing page regularly before topping up or planning production usage.
| Model | Cache-Hit Input / 1M | Cache-Miss Input / 1M | Output / 1M | Notes |
|---|---|---|---|---|
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 | Economical V4 model |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 | 75% discount listed until May 31, 2026, 15:59 UTC |
The listed deepseek-v4-pro prices are discounted from the crossed-out launch prices, and the input cache-hit price reduction took effect on April 26, 2026, 12:15 UTC.
Cost formula:
```
cost = (cache_hit_input_tokens / 1,000,000 × cache_hit_price)
     + (cache_miss_input_tokens / 1,000,000 × cache_miss_price)
     + (output_tokens / 1,000,000 × output_price)
```
Example using deepseek-v4-flash:
- Cache-hit input: 200,000 tokens × $0.0028 / 1M = $0.00056
- Cache-miss input: 100,000 tokens × $0.14 / 1M = $0.01400
- Output: 50,000 tokens × $0.28 / 1M = $0.01400
- Estimated total = $0.02856
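The cost formula is easy to wrap in a small budgeting helper. The prices below are the May 2026 list prices from the table above; substitute the current ones before relying on the numbers.

```python
def estimate_cost(
    cache_hit_tokens: int,
    cache_miss_tokens: int,
    output_tokens: int,
    cache_hit_price: float,
    cache_miss_price: float,
    output_price: float,
) -> float:
    """Estimate request cost in USD; all prices are per 1M tokens."""
    per_million = 1_000_000
    return (
        cache_hit_tokens / per_million * cache_hit_price
        + cache_miss_tokens / per_million * cache_miss_price
        + output_tokens / per_million * output_price
    )

# deepseek-v4-flash list prices, matching the worked example above.
cost = estimate_cost(200_000, 100_000, 50_000, 0.0028, 0.14, 0.28)
print(f"${cost:.5f}")  # → $0.02856
```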
Context Caching
Context caching is enabled by default for DeepSeek API users. It stores reusable input prefixes on disk so later requests with matching prefixes can receive cache-hit pricing instead of cache-miss pricing.
This matters for repeated prompts, RAG systems, long documents, multi-turn conversations, and agent workflows. For example, if your app repeatedly sends the same system prompt, product documentation, or retrieved context, DeepSeek may count part of the input as a cache hit.
DeepSeek exposes cache usage through two `usage` fields:
- `prompt_cache_hit_tokens`
- `prompt_cache_miss_tokens`

The `prompt_tokens` value equals cache-hit tokens plus cache-miss tokens.
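Because `prompt_tokens` is just the sum of the two cache counters, a cache hit rate can be derived directly from the usage data. The field names are the documented ones; the helper itself is our own sketch.

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from the context cache."""
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    total = hit + miss
    return hit / total if total else 0.0

usage = {"prompt_cache_hit_tokens": 200_000, "prompt_cache_miss_tokens": 100_000}
print(cache_hit_rate(usage))
```

Logging this ratio per request is a cheap way to verify that prompt ordering changes actually improve cache behavior.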
Best practices:
- Put stable instructions and repeated documents early in the prompt.
- Keep shared RAG context consistent across requests.
- Reuse conversation prefixes when appropriate.
- Monitor hit and miss tokens in logs.
- Do not assume a 100% hit rate; DeepSeek describes caching as best effort.
Thinking Mode
DeepSeek V4 supports thinking mode, where the model can produce reasoning content before the final answer. The toggle defaults to enabled, and the OpenAI-compatible format uses:
```json
{
  "thinking": {
    "type": "enabled"
  }
}
```
DeepSeek supports reasoning_effort values of high and max; compatibility mappings convert low and medium to high, and xhigh to max.
Important: in thinking mode, temperature, top_p, presence_penalty, and frequency_penalty do not take effect.
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "user", "content": "Design a scalable RAG architecture for a legal search app."}
    ],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)
print(response.choices[0].message.content)
```
Use higher reasoning effort for complex coding, math, planning, and agentic decisions. Avoid it for simple classification, short copy, or high-volume requests where latency and cost matter more.
Streaming Responses
Streaming improves perceived speed by returning partial message deltas as they are generated. DeepSeek’s Chat Completion reference says streamed tokens are sent as server-sent events and the stream ends with data: [DONE].
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Write a short onboarding checklist for API developers."}
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```
For chat UIs, streaming is usually better than waiting for the full response. For batch jobs, non-streaming is simpler.
JSON Output
Use JSON output when your application needs structured, machine-readable responses. DeepSeek supports response_format: {"type": "json_object"} and says the prompt must also include the word “json” plus an example of the desired format.
```python
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "system",
            "content": 'Return valid json only, for example: {"summary": "...", "tags": ["..."]}'
        },
        {
            "role": "user",
            "content": "Summarize this: DeepSeek API supports V4 models, caching, JSON output, and tool calls."
        }
    ],
    response_format={"type": "json_object"},
    max_tokens=500,
)

data = json.loads(response.choices[0].message.content)
print(data)
```
Common mistakes include forgetting to ask for JSON in the prompt, setting max_tokens too low, or assuming JSON output means the schema is semantically correct. Always parse and validate the result.
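A minimal post-parse check might look like the following. The expected keys mirror the example schema in the system prompt above; adapt them to your own contract.

```python
import json

def parse_summary_response(raw: str) -> dict:
    """Parse and shape-check a JSON-mode response before using it."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data.get("summary"), str):
        raise ValueError("missing or non-string 'summary'")
    if not isinstance(data.get("tags"), list):
        raise ValueError("missing or non-list 'tags'")
    return data

ok = parse_summary_response('{"summary": "V4 overview", "tags": ["api"]}')
print(ok["summary"])
```

Shape checks like this catch the common failure mode where the model returns syntactically valid JSON that is missing a field your code assumes.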
Tool Calls / Function Calling
Tool calls let the model request an external function, such as checking weather, querying a database, searching a knowledge base, or calling an internal service. DeepSeek’s Chat Completion reference says tools currently support functions, with up to 128 functions.
```python
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Get the current status of a customer order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The order ID."
                    }
                },
                "required": ["order_id"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Check order A12345."}
    ],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        # Validate args before executing real business logic.
        print(call.function.name, args)
    else:
    print(message.content)
```
DeepSeek warns that generated function arguments may not always be valid JSON and may include hallucinated parameters, so validate arguments before executing functions.
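Following that warning, a defensive sketch for the `get_order_status` tool above: parse the argument string, reject unknown keys, and check required ones before touching business logic. The key sets restate the tool definition; nothing here is SDK-provided.

```python
import json

REQUIRED = {"order_id"}
ALLOWED = {"order_id"}

def validate_tool_args(raw_arguments: str) -> dict:
    """Validate model-generated function arguments before execution."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    unknown = set(args) - ALLOWED
    if unknown:
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    missing = REQUIRED - set(args)
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    return args

print(validate_tool_args('{"order_id": "A12345"}'))
```

On a `ValueError`, return the error text to the model in a `tool` message rather than executing anything; models often self-correct on the next turn.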
Rate Limits, Errors, and Troubleshooting
DeepSeek dynamically limits user concurrency based on server load. When the concurrency limit is reached, the API returns HTTP 429.
| Code | Meaning | Common Cause | Fix |
|---|---|---|---|
| 400 | Invalid Format | Bad request body | Follow the API format and error hints |
| 401 | Authentication Fails | Wrong API key | Check or recreate the API key |
| 402 | Insufficient Balance | No balance | Top up the account |
| 422 | Invalid Parameters | Unsupported or malformed parameters | Fix request parameters |
| 429 | Rate Limit Reached | Too many requests / concurrency pressure | Pace requests and retry later |
| 500 | Server Error | DeepSeek server issue | Retry after a brief wait |
| 503 | Server Overloaded | High traffic | Retry after a brief wait |
These error definitions come from the official Error Codes page.
Production tips:
- Add exponential backoff for 429, 500, and 503.
- Set request timeouts.
- Monitor `usage` fields.
- Log model IDs and finish reasons.
- Check DeepSeek Service Status during incidents; the status page shows API and web chat availability.
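The backoff tip above can be sketched as a small wrapper. Here a hypothetical `send` callable and `APIStatusError` class stand in for the actual SDK call and its exceptions; the retryable codes are the three listed in the tips.

```python
import random
import time

RETRYABLE = {429, 500, 503}

class APIStatusError(Exception):
    """Stand-in for an SDK error carrying an HTTP status code."""
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status

def with_backoff(send, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry send() on 429/500/503 with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except APIStatusError as exc:
            if exc.status not in RETRYABLE or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Simulated call that fails twice with 429, then succeeds.
attempts = {"n": 0}
def fake_send():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise APIStatusError(429)
    return "ok"

print(with_backoff(fake_send, base_delay=0.01))  # → ok
```

Non-retryable codes (400, 401, 402, 422) are re-raised immediately, since retrying a bad request or a missing balance only wastes time.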
DeepSeek API vs OpenAI API Compatibility
“OpenAI-compatible” means you can often keep your existing OpenAI SDK and change three things:
- Base URL
- API key
- Model name
For example:
```python
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
```
Then use `model="deepseek-v4-flash"` or `model="deepseek-v4-pro"`.
However, compatibility does not mean identical behavior. Model outputs, latency, supported parameters, tool-call behavior, JSON reliability, context handling, and cost structure should all be tested before a full migration. Also note that in thinking mode some sampling parameters do not take effect.
Best Practices for Production Apps
Use this checklist before shipping:
- Never expose DeepSeek API keys in frontend code.
- Send API requests from a backend or serverless function.
- Use environment variables for secrets.
- Prefer `deepseek-v4-flash` for high-volume tasks.
- Escalate to `deepseek-v4-pro` for complex reasoning.
- Cap `max_tokens`.
- Track input, output, cache-hit, and cache-miss tokens.
- Use context caching intentionally.
- Validate JSON output.
- Validate tool-call arguments before execution.
- Add retries for 429, 500, and 503.
- Monitor DeepSeek pricing, model availability, and status updates.
FAQ
What is the DeepSeek API?
The DeepSeek API is a developer interface for accessing DeepSeek language models through OpenAI-compatible and Anthropic-compatible API formats. It supports chat completions, streaming, JSON output, tool calls, thinking mode, and context caching.
Where are the official DeepSeek API Docs?
The official DeepSeek API Docs are hosted by DeepSeek and include quickstart guides, model pricing, API reference pages, guide pages, news, FAQs, and a change log.
How do I get a DeepSeek API key?
Create a DeepSeek Platform account, then generate an API key from the platform. The official quickstart links API key creation from the first-call setup page.
Is DeepSeek API compatible with the OpenAI SDK?
Yes. DeepSeek says its API uses a format compatible with OpenAI and can be accessed with the OpenAI SDK by changing the base URL and model.
What is the DeepSeek API base URL?
The OpenAI-compatible base URL is https://api.deepseek.com. The Anthropic-compatible base URL is https://api.deepseek.com/anthropic.
Which DeepSeek model should I use?
Use deepseek-v4-flash for cost-efficient, high-volume work. Use deepseek-v4-pro for complex reasoning, coding, and agentic workflows where output quality is more important than cost.
How much does DeepSeek API cost?
As of May 3, 2026, deepseek-v4-flash is listed at $0.0028 per 1M cache-hit input tokens, $0.14 per 1M cache-miss input tokens, and $0.28 per 1M output tokens. deepseek-v4-pro is listed at discounted rates of $0.003625, $0.435, and $0.87 respectively. Always verify current pricing before production use.
What is the difference between deepseek-v4-flash and deepseek-v4-pro?
deepseek-v4-flash is the lower-cost, faster model for general and high-volume workloads. deepseek-v4-pro is the more capable model for difficult reasoning, coding, and agent workflows.
Are deepseek-chat and deepseek-reasoner still supported?
They are legacy names and are scheduled to be retired on July 24, 2026. For compatibility, they currently map to deepseek-v4-flash non-thinking and thinking modes respectively.
Can I use DeepSeek API with Node.js?
Yes. Install the OpenAI Node.js SDK, set baseURL to https://api.deepseek.com, and pass your DeepSeek API key.
Can I use DeepSeek API with Python?
Yes. Install the OpenAI Python SDK, initialize OpenAI(base_url="https://api.deepseek.com"), and call client.chat.completions.create() with a DeepSeek model.
How do I fix a DeepSeek API 429 error?
A 429 means rate limit reached. Pace requests, add retries with backoff, reduce concurrency, and consider fallback providers for critical workloads.
Does DeepSeek API support JSON output?
Yes. Set response_format to {"type": "json_object"} and explicitly ask for JSON in the prompt.
Does DeepSeek API support tool calling?
Yes. DeepSeek supports function tools in chat completions. Validate all generated function arguments before running external actions.
Is DeepSeek API safe for production?
It can be used in production if you secure API keys, handle errors, monitor token usage, validate outputs, use retries, and keep pricing/model information updated.
Conclusion
The fastest path to using the DeepSeek API is simple: create an API key, use the OpenAI SDK, set the DeepSeek base URL, choose either deepseek-v4-flash or deepseek-v4-pro, and monitor token usage carefully. For production applications, pay close attention to context caching, output validation, tool-call safety, and the upcoming retirement of legacy model names.