This DeepSeek API Guide explains the current official DeepSeek API surface for developers. As of April 26, 2026, DeepSeek’s official API uses the current V4 model IDs deepseek-v4-flash and deepseek-v4-pro. The API supports an OpenAI-compatible chat-completions format at https://api.deepseek.com and an Anthropic-compatible format at https://api.deepseek.com/anthropic.
Last verified against official DeepSeek public documentation: April 26, 2026.
V4 update: DeepSeek-V4 Preview is now live and available through DeepSeek Chat and the API. DeepSeek states that deepseek-chat and deepseek-reasoner are legacy compatibility names that currently route to deepseek-v4-flash non-thinking and thinking modes, and will be retired after July 24, 2026, 15:59 UTC. New integrations should use deepseek-v4-flash or deepseek-v4-pro.
Quickstart (5–10 Minutes)
The DeepSeek API uses an API format compatible with OpenAI and Anthropic SDK ecosystems. For OpenAI-style chat completions, set the base URL to https://api.deepseek.com. For Anthropic-style integrations, use https://api.deepseek.com/anthropic.
Important: Use the official DeepSeek API docs as the source of truth for current behavior. Beta features such as Chat Prefix Completion and strict tool schemas may require https://api.deepseek.com/beta instead of the default OpenAI-format base URL.
Current Official API Snapshot
- OpenAI-format base URL: https://api.deepseek.com
- Anthropic-format base URL: https://api.deepseek.com/anthropic
- Current API model IDs: deepseek-v4-flash and deepseek-v4-pro
- Legacy compatibility names: deepseek-chat and deepseek-reasoner, scheduled for retirement after July 24, 2026
- Current model family: DeepSeek-V4 Preview
- Context length: 1M tokens
- Maximum output: 384K tokens
- Thinking mode: Both V4 API models support thinking and non-thinking modes; thinking mode is enabled by default unless disabled.
- Core features: JSON Output, Tool Calls, Chat Prefix Completion (Beta), and FIM Completion (Beta). FIM is available in non-thinking mode only.
- Official pricing source: Use the official DeepSeek Models & Pricing page for the latest public API prices.
To begin, create an API key on the official DeepSeek Platform and store it securely in an environment variable. DeepSeek bills usage against account balance, and the API also offers GET /models and GET /user/balance to sanity-check your integration before shipping.
For current token rates, cache-hit/cache-miss billing, promotions, and deduction rules, use the official DeepSeek Models & Pricing page. DeepSeek states that product prices may vary, so the official pricing page should be treated as the source of truth.
Basic steps:
- Set the base URL: Use https://api.deepseek.com for normal OpenAI-format production requests.
- Authenticate: Send your API key as a Bearer token in the Authorization header.
- Call the main endpoint: For chat interactions, use POST /chat/completions.
- Provide the required body: At minimum, send a valid model and messages array.
- Choose the mode: Use thinking: {"type": "enabled"} for reasoning or thinking: {"type": "disabled"} for non-thinking responses.
- Add optional controls only when needed: Common additions include stream, reasoning_effort, response_format, tools, and tool_choice.
Minimal cURL Request
export DEEPSEEK_API_KEY="sk-YourDeepSeekAPIKey"
curl https://api.deepseek.com/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
-d '{
"model": "deepseek-v4-flash",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"thinking": {"type": "disabled"},
"stream": false
}'
This example uses deepseek-v4-flash in non-thinking mode for a fast everyday chat response. For more difficult reasoning, coding, or agentic workflows, switch to deepseek-v4-pro and enable thinking mode.
Minimal Python Request
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ.get("DEEPSEEK_API_KEY"),
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain the DeepSeek API in one paragraph."},
],
reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},
stream=False,
)
print(response.choices[0].message.content)
Tip: When using the OpenAI SDK, pass DeepSeek-specific body fields such as thinking through extra_body. The official quickstart shows deepseek-v4-pro with reasoning_effort="high" and thinking enabled for reasoning-oriented examples.
Optional Sanity Checks
# List currently available model IDs
curl https://api.deepseek.com/models \
-H "Authorization: Bearer ${DEEPSEEK_API_KEY}"
# Check whether your balance is available for API calls
curl https://api.deepseek.com/user/balance \
-H "Authorization: Bearer ${DEEPSEEK_API_KEY}"
The official /models response currently lists deepseek-v4-flash and deepseek-v4-pro. Before you scale usage, estimate token volume, monitor cache-hit and cache-miss usage, and verify current rates on the official DeepSeek Models & Pricing page.
Authentication
Every DeepSeek API request must include an Authorization: Bearer YOUR_API_KEY header. Replace YOUR_API_KEY with the secret key created in your DeepSeek Platform dashboard.
Authorization: Bearer YOUR_API_KEY
DeepSeek’s API reference describes the authentication scheme as HTTP Bearer Auth. The quick-start error list separates authentication problems from billing problems: a wrong or missing key leads to 401, while exhausted balance leads to 402.
Common Authentication and Billing Issues
- 401 Authentication Fails: The key is missing, malformed, incorrect, or no longer valid.
- 402 Insufficient Balance: Your account balance is not sufficient for the request.
- Rotated or replaced key: After generating a new key, make sure your app actually uses the updated secret.
- Client-side exposure: Never expose a live API key in browser JavaScript, mobile binaries, public Git repositories, or screenshots.
If requests suddenly stop working, verify the exact Bearer header first, then confirm your platform balance. Those two checks solve many first-run integration issues.
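As a sketch, the GET /user/balance endpoint mentioned in the quickstart can separate the two failure modes, since a bad key fails before any billing check (the exact payload fields depend on the official schema):
import os
import requests

headers = {"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"}
resp = requests.get("https://api.deepseek.com/user/balance", headers=headers, timeout=10)

if resp.status_code == 401:
    print("401: the key or Bearer header is wrong -- fix authentication first.")
else:
    resp.raise_for_status()
    print("Key accepted; inspect the balance payload:", resp.json())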
Chat Completions
The primary DeepSeek endpoint for conversational generation is POST https://api.deepseek.com/chat/completions. This endpoint accepts a list of messages and returns the model’s next assistant response.
Endpoint and Required Fields
- URL: POST https://api.deepseek.com/chat/completions
- Required field 1 — model: Use deepseek-v4-flash or deepseek-v4-pro.
- Required field 2 — messages: A non-empty array describing the conversation so far.
Supported message roles:
- system: Global instructions for behavior, style, policy, or output format.
- user: The user request.
- assistant: Prior model replies that you want to keep in context.
- tool: Tool results returned by your application after the model requests a tool call.
The assistant message schema also supports advanced fields such as prefix and reasoning_content for Beta prefix-completion and thinking-mode workflows. Those fields are not required for normal chat.
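For example, a minimal multi-turn history that replays an earlier reply looks like this (the content strings are illustrative):
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is context caching?"},
    # Prior model reply kept in context; old reasoning_content is not carried forward here
    {"role": "assistant", "content": "Context caching reuses repeated request prefixes."},
    {"role": "user", "content": "How do I check whether it helped?"},
]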
Core Parameters
- model: Must be deepseek-v4-flash or deepseek-v4-pro for new integrations.
- messages: The conversation array.
- thinking: Controls thinking mode with {"type": "enabled"} or {"type": "disabled"}. The default is enabled.
- reasoning_effort: In thinking mode, use high or max. DeepSeek maps low and medium to high, and xhigh to max for compatibility.
- max_tokens: Maximum number of output tokens. Input plus output remains bounded by the model context length.
- temperature: Default 1.0. DeepSeek’s guidance suggests 0.0 for coding/math, 1.0 for data cleaning/analysis, 1.3 for general conversation and translation, and 1.5 for creative writing.
- top_p: Nucleus sampling alternative to temperature. Use one or the other in most cases, not both.
- presence_penalty / frequency_penalty: Repetition-control parameters in the range -2.0 to 2.0.
- stop: Up to 16 stop sequences.
- stream: Enables Server-Sent Events streaming.
- stream_options: Supports include_usage when stream=true.
- response_format: Set {"type": "json_object"} to enable JSON Output.
- tools / tool_choice: Enables Tool Calls. DeepSeek currently supports function tools and up to 128 functions.
- logprobs / top_logprobs: Optional token-probability outputs for supported non-thinking requests.
Thinking mode caveat: In thinking mode, temperature, top_p, presence_penalty, and frequency_penalty are accepted for compatibility but have no effect. Treat sampling controls as non-thinking-mode controls.
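A minimal sketch applying that caveat, reusing the client from the quickstart: sampling controls are paired explicitly with non-thinking mode, with temperature set to DeepSeek’s suggested 1.3 for conversation and translation:
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Translate to French: Hello, world."}],
    temperature=1.3,  # only takes effect because thinking is disabled
    extra_body={"thinking": {"type": "disabled"}},
)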
Model Selection — V4-Flash vs V4-Pro
DeepSeek currently offers two V4 API model IDs. Both support a 1M context window, thinking and non-thinking modes, JSON Output, Tool Calls, and Chat Prefix Completion (Beta). The practical difference is faster, more economical everyday usage versus stronger capability for harder work. For current API prices, always use the official pricing source linked in the table.
| Attribute | deepseek-v4-flash | deepseek-v4-pro |
|---|---|---|
| Best for | Everyday chat, low-latency tasks, cost-sensitive apps, routine coding, extraction, summarization | Hard reasoning, agentic coding, complex analysis, long-context workflows, high-stakes production tasks |
| Model version | DeepSeek-V4-Flash | DeepSeek-V4-Pro |
| Parameter note from release | 284B total / 13B active parameters | 1.6T total / 49B active parameters |
| Context length | 1M | 1M |
| Maximum output | 384K | 384K |
| Thinking mode | Supported; enabled by default unless disabled | Supported; enabled by default unless disabled |
| JSON Output | Yes | Yes |
| Tool Calls | Yes | Yes |
| Chat Prefix Completion (Beta) | Yes | Yes |
| FIM Completion (Beta) | Non-thinking mode only | Non-thinking mode only |
| Pricing source | Official DeepSeek Models & Pricing | Official DeepSeek Models & Pricing |
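One way to encode that split in application code is a small router. The task buckets below are illustrative choices made for this guide, not official API semantics:
def pick_model(task: str) -> str:
    """Route hard reasoning and agentic work to pro, everything else to flash."""
    hard_tasks = {"reasoning", "agentic-coding", "complex-analysis", "long-context"}
    return "deepseek-v4-pro" if task in hard_tasks else "deepseek-v4-flash"

print(pick_model("summarization"))   # deepseek-v4-flash
print(pick_model("agentic-coding"))  # deepseek-v4-pro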
Legacy Names and Migration
DeepSeek keeps deepseek-chat and deepseek-reasoner as legacy compatibility aliases during the V4 transition. These aliases currently route to DeepSeek V4 models but are not the primary model IDs for current API integrations.
| Legacy name | Current behavior | Current equivalent |
|---|---|---|
| deepseek-chat | Routes to deepseek-v4-flash in non-thinking mode | deepseek-v4-flash with thinking: {"type": "disabled"} |
| deepseek-reasoner | Routes to deepseek-v4-flash in thinking mode | deepseek-v4-flash or deepseek-v4-pro with thinking: {"type": "enabled"} |
To use the current DeepSeek API models, update the model value to deepseek-v4-flash or deepseek-v4-pro while keeping the same API base URL and endpoint. Use the thinking parameter when you need to explicitly control reasoning behavior.
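As a sketch, the migration is a one-line model change plus an explicit thinking setting, following the equivalence table above:
# Before (legacy alias, retired after July 24, 2026):
# response = client.chat.completions.create(model="deepseek-chat", messages=messages)

# After (current model ID with the same routing made explicit):
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    extra_body={"thinking": {"type": "disabled"}},
)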
Thinking vs Non-Thinking Examples
# Non-thinking mode for fast everyday responses
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[{"role": "user", "content": "Summarize this in three bullets."}],
extra_body={"thinking": {"type": "disabled"}},
)
# Thinking mode for harder reasoning
response = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[{"role": "user", "content": "Analyze this architecture tradeoff."}],
reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},
)
Using the Response Correctly
In standard requests, your main answer is usually response.choices[0].message.content. With thinking mode enabled, the same message may also include reasoning_content before the final answer.
Important implementation rule: In ordinary multi-turn chat without tool calls, you can keep the assistant’s final content in history and do not need to carry old reasoning_content forward. In a thinking-mode tool-call loop for the same question, DeepSeek’s current docs require you to pass the assistant message, including reasoning_content, back to the API so the model can continue reasoning across sub-turns.
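A sketch of that rule in a thinking-mode tool loop, reusing the client from the quickstart and a tools array like the one in the Tool Calls section below; run_tool is a hypothetical helper that executes the requested function and returns a string:
first = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    tools=tools,
    extra_body={"thinking": {"type": "enabled"}},
)
assistant_msg = first.choices[0].message

if assistant_msg.tool_calls:
    # Pass the assistant message back whole, including reasoning_content,
    # so the model can keep reasoning across sub-turns of the same question
    messages.append(assistant_msg)
    for call in assistant_msg.tool_calls:
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_tool(call),  # hypothetical executor for the named function
        })
    final = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=messages,
        tools=tools,
        extra_body={"thinking": {"type": "enabled"}},
    )
    print(final.choices[0].message.content)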
Example Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1718345013,
"model": "deepseek-v4-pro",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?",
"reasoning_content": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 17,
"prompt_cache_hit_tokens": 0,
"prompt_cache_miss_tokens": 17,
"completion_tokens": 9,
"completion_tokens_details": {
"reasoning_tokens": 0
},
"total_tokens": 26
}
}
DeepSeek’s official schema includes prompt_cache_hit_tokens and prompt_cache_miss_tokens so you can track caching benefits, and completion_tokens_details.reasoning_tokens so thinking-heavy generations can be inspected more precisely.
Official finish_reason values currently include stop, length, content_filter, tool_calls, and insufficient_system_resource.
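A short sketch that checks those fields on the Python responses shown earlier (the retry suggestion for insufficient_system_resource is this guide’s reading of the error table below, not official wording):
finish = response.choices[0].finish_reason
if finish == "length":
    print("Output hit max_tokens -- consider raising it.")
elif finish == "insufficient_system_resource":
    print("Inference was interrupted; back off and retry.")

# Thinking-heavy generations can be sized via the reasoning token count
print("reasoning tokens:", response.usage.completion_tokens_details.reasoning_tokens)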
JSON Output
To request structured JSON, set response_format={"type": "json_object"}. DeepSeek’s official JSON Output guide adds practical rules: include the word “json” in the prompt, show the model the schema or example you want, and set max_tokens high enough to avoid truncation.
import json
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{
"role": "system",
"content": "Return the answer in json with keys: answer and confidence."
},
{"role": "user", "content": "What is the capital of Egypt?"}
],
response_format={"type": "json_object"},
extra_body={"thinking": {"type": "disabled"}},
)
print(json.loads(response.choices[0].message.content))
Without a clear JSON instruction in the prompt, the API can appear stuck because the model may continue emitting whitespace until it reaches the token limit. DeepSeek also notes that JSON Output may occasionally return empty content, so production code should validate and retry safely.
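A defensive wrapper along those lines, as a sketch; the retry budget and the choice to disable thinking are application decisions, not official guidance:
import json

def chat_json(messages, retries: int = 2):
    """Request JSON Output, validating and retrying on empty or malformed content."""
    for _ in range(retries + 1):
        response = client.chat.completions.create(
            model="deepseek-v4-flash",
            messages=messages,
            response_format={"type": "json_object"},
            max_tokens=1024,  # high enough to avoid truncated JSON
            extra_body={"thinking": {"type": "disabled"}},
        )
        content = response.choices[0].message.content
        if not content:
            continue  # the docs note JSON Output may occasionally be empty
        try:
            return json.loads(content)
        except json.JSONDecodeError:
            continue
    raise ValueError("No valid JSON after retries")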
Tool Calls
DeepSeek uses the term Tool Calls for structured function invocation. The model can decide whether to call a tool, return natural language, or continue a multi-step tool loop. The model proposes the tool call, but your application executes the function and sends the result back as a tool message.
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
}
},
"required": ["location"]
}
}
}
]
response = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[{"role": "user", "content": "How is the weather in Hangzhou?"}],
tools=tools,
reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},
)
If you need strict schema compliance, DeepSeek documents a strict mode (Beta). To use it, switch to base_url="https://api.deepseek.com/beta", set strict: true on each function, and follow DeepSeek’s supported JSON Schema subset. In strict mode, every object property must be listed in required, and additionalProperties should be false.
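Putting those rules together, a sketch of a strict-mode request; verify the exact supported JSON Schema subset against the official beta docs:
import os
from openai import OpenAI

beta_client = OpenAI(
    api_key=os.environ.get("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/beta",  # strict mode lives on the beta base URL
)

strict_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location.",
        "strict": True,  # opt in to strict schema compliance
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],       # every property listed in required
            "additionalProperties": False,  # required by strict mode
        },
    },
}]

response = beta_client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "How is the weather in Hangzhou?"}],
    tools=strict_tools,
)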
Context Caching
DeepSeek’s Context Caching on Disk technology is enabled by default for all users. If later requests share an overlapping prefix with earlier requests, the repeated prefix can count as a cache hit. Cache-hit and cache-miss input are billed differently, so verify the current rates on the official DeepSeek Models & Pricing page.
- What can hit the cache: repeated prefixes such as the same system prompt, the same long document prefix, or repeated few-shot examples.
- How to inspect it: check usage.prompt_cache_hit_tokens and usage.prompt_cache_miss_tokens in the response.
- Important limit: the cache works on a best-effort basis and does not guarantee a 100% hit rate.
- Practical pattern: put stable instructions and reusable context at the beginning of the message sequence so repeated prefixes are easier to reuse.
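A quick way to watch the cache in practice is to compute a hit rate from the usage block, using the field names shown in the example response above:
usage = response.usage
hits, misses = usage.prompt_cache_hit_tokens, usage.prompt_cache_miss_tokens
total_input = hits + misses
if total_input:
    print(f"cache hit rate: {hits / total_input:.0%} ({hits}/{total_input} input tokens)")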
Streaming
Set "stream": true to receive data-only Server-Sent Events (SSE) as the model generates output. A streaming response ends with data: [DONE].
If you set stream_options={"include_usage": true}, DeepSeek sends one extra chunk before [DONE] where choices is empty and usage contains totals for the full request.
Custom Parser Example
import json
import os
import requests

# Request setup matching the earlier examples; include_usage requests a final usage chunk
url = "https://api.deepseek.com/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}",
}
payload = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Explain SSE in two sentences."}],
    "stream": True,
    "stream_options": {"include_usage": True},
}

# Explicit timeouts: 10s to connect, 300s between streamed bytes
resp = requests.post(url, headers=headers, json=payload, stream=True, timeout=(10, 300))
resp.raise_for_status()

for raw_line in resp.iter_lines(decode_unicode=True):
    if not raw_line:
        continue
    if raw_line.startswith(":"):
        # Ignore SSE keep-alive comments
        continue
    if raw_line.startswith("data: "):
        data = raw_line[len("data: "):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if not chunk.get("choices"):
            # With include_usage, the final pre-[DONE] chunk has empty choices and carries usage
            print("\nusage:", chunk.get("usage"))
            continue
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("reasoning_content"):
            print(delta["reasoning_content"], end="", flush=True)
        elif delta.get("content"):
            print(delta["content"], end="", flush=True)
In thinking mode, streamed chunks may contain delta.reasoning_content before final delta.content. Parse them separately if you need to inspect reasoning output distinctly from the user-facing answer.
Keep-Alive Behavior and Timeouts
DeepSeek’s rate-limit documentation states that under scheduling pressure:
- Non-streaming requests may return empty lines while waiting.
- Streaming requests may return : keep-alive comments while waiting.
- If inference has not started after 10 minutes, the server closes the connection.
Use explicit connect/read timeouts in production, and make sure your reverse proxies, serverless runtime, or gateway layer do not kill long-running streamed responses too early.
Rate Limits & Retries
DeepSeek currently describes API rate limiting as a dynamic concurrency limit based on server load. When you reach the concurrency limit, the API immediately returns HTTP 429. The FAQ also says the exposed limit on each account is adjusted dynamically according to real-time traffic pressure and short-term historical usage.
In practice, moderate usage usually works without manual tuning, but aggressive bursts can still produce 429 responses and long waits during busy periods. DeepSeek also says it does not currently raise the dynamic limit for individual accounts and does not offer tiered plans that unlock a higher fixed cap.
Recommended Retry Pattern
- Retry: 429, 500, and 503
- Do not blindly retry unchanged: 400, 401, 402, and 422
- Use exponential backoff with jitter: for example 1s, 2s, 4s, 8s with a small random component
import random
import time

def call_with_retries():
    for attempt in range(1, 6):
        try:
            return call_deepseek()  # your request wrapper; raises RetryableError / FatalRequestError
        except RetryableError:
            # Backoff with jitter: 1s, 2s, 4s, 8s (capped at 16s) plus a random component
            time.sleep(min(2 ** (attempt - 1), 16) + random.uniform(0, 1))
        except FatalRequestError:
            raise
    raise RuntimeError("Retries exhausted")
If failures look widespread rather than request-specific, check the official DeepSeek Service Status page before you keep retrying.
Error Codes & Troubleshooting
The current official DeepSeek quick-start error list includes the following API-facing codes:
| Code | Official meaning | What to do |
|---|---|---|
| 400 | Invalid Format | Fix the request body according to the error message and official schema. |
| 401 | Authentication Fails | Check the API key and Bearer header. |
| 402 | Insufficient Balance | Top up the account or verify available balance. |
| 422 | Invalid Parameters | Correct unsupported or malformed parameter values. |
| 429 | Rate Limit Reached | Slow down, back off, and retry later. |
| 500 | Server Error | Retry after a brief wait. |
| 503 | Server Overloaded | Retry after a brief wait and check status if it persists. |
This guide deliberately uses the official current error list above. For day-to-day development you may still see generic HTTP behaviors caused by proxy, DNS, or URL path mistakes, but those are not part of DeepSeek’s current documented quick-start error table.
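With the OpenAI SDK, these codes surface as APIStatusError exceptions. A sketch that maps them onto the retry guidance above; schedule_retry is a hypothetical hook for the backoff pattern shown earlier:
from openai import APIStatusError

RETRYABLE = {429, 500, 503}  # per the table above

try:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except APIStatusError as err:
    if err.status_code in RETRYABLE:
        schedule_retry()  # hypothetical backoff hook
    else:
        raise  # 400/401/402/422 need a fixed request, key, or balance, not a retry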
Anthropic API Format
DeepSeek also supports the Anthropic API ecosystem through https://api.deepseek.com/anthropic. This is useful for tools and coding agents that expect Anthropic-style messages and environment variables.
export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_API_KEY=${DEEPSEEK_API_KEY}
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="deepseek-v4-pro",
max_tokens=1000,
system="You are a helpful assistant.",
messages=[
{
"role": "user",
"content": [{"type": "text", "text": "Hi, how are you?"}],
}
],
)
print(message.content)
DeepSeek’s Anthropic API compatibility page notes that unsupported model names in the Anthropic API backend are automatically mapped to deepseek-v4-flash. For predictable production behavior, set deepseek-v4-flash or deepseek-v4-pro explicitly.
Security & Production Notes
- Keep API keys server-side: Do not expose a live DeepSeek key in browser JavaScript or untrusted mobile code.
- Minimize sensitive data: Send only the user content you actually need for the task, and redact personal or regulated data where possible.
- Validate tool-call arguments: The model may output malformed or unsafe arguments. Validate before executing any function.
- Use explicit timeouts and retries: DeepSeek requests can remain open while the platform waits for inference scheduling.
- Watch balance and usage: The platform supports billing checks and usage exports by API key according to the current FAQ.
- Separate hosted API from self-hosting: DeepSeek V4 is open-sourced, but self-hosting is a different deployment path from the official hosted API documented here.
- Avoid legacy model IDs: Update examples, dashboards, calculators, and SDK wrappers from deepseek-chat / deepseek-reasoner to deepseek-v4-flash / deepseek-v4-pro.
Note: This guide is provided by Chat-Deep.ai as an independent reference. It summarizes the official DeepSeek API documentation, but it is not the official DeepSeek documentation itself. For production decisions, verify model names, pricing, limits, and endpoint behavior against official DeepSeek sources.