Quick answer: To use DeepSeek with Python, install the OpenAI Python SDK, create an OpenAI client with a DeepSeek API key, set base_url="https://api.deepseek.com", and call one of the current DeepSeek V4 model IDs: deepseek-v4-flash or deepseek-v4-pro.
Use deepseek-v4-flash for fast, economical chat, summaries, extraction, JSON Output, routine coding help, classification, and high-volume workloads. Use deepseek-v4-pro for harder reasoning, complex coding, long-context analysis, agentic workflows, and higher-value production tasks.
Independent note: Chat-Deep.ai is an independent DeepSeek guide and browser access site. It is not affiliated with DeepSeek, DeepSeek.com, the official DeepSeek app, the official DeepSeek developer platform, OpenAI, or the OpenAI Python SDK.
Last verified: April 24, 2026.
Current DeepSeek API snapshot
- Current API model IDs: deepseek-v4-flash and deepseek-v4-pro
- Current API generation: DeepSeek-V4 Preview
- Base URL for OpenAI-compatible requests: https://api.deepseek.com
- Context length: 1M tokens
- Maximum output: 384K tokens
- Thinking mode: supported on both current V4 API models
- Non-thinking mode: supported on both current V4 API models
- JSON Output: supported on both current V4 API models
- Tool Calls: supported on both current V4 API models
- FIM Completion: supported in non-thinking mode only
- Legacy aliases: deepseek-chat and deepseek-reasoner currently route to deepseek-v4-flash non-thinking and thinking modes
- Legacy alias retirement: DeepSeek says deepseek-chat and deepseek-reasoner will be retired after July 24, 2026, 15:59 UTC
This page is updated to match the current Chat-Deep.ai homepage, DeepSeek API guide, DeepSeek API pricing guide, OpenAI SDK with DeepSeek guide, Node.js TypeScript guide, Thinking Mode guide, Tool Calls guide, and Token Usage guide.
Table of contents
- Quick answer: how to use DeepSeek with Python
- Who this guide is for
- What is the DeepSeek Python SDK?
- Install the Python package
- Set your DeepSeek API key safely
- DeepSeek base_url explained
- Choosing a current V4 model
- Minimal DeepSeek Python example
- Non-thinking mode example
- Thinking Mode in Python
- Streaming responses in Python
- Async DeepSeek Python example
- Multi-turn chat in Python
- DeepSeek JSON Output in Python
- DeepSeek Tool Calls in Python
- Strict mode for Tool Calls
- Chat Completions vs OpenAI Responses API
- Error handling, retries, and timeouts
- Token usage, context caching, and cost
- List models and check balance
- Legacy aliases: deepseek-chat and deepseek-reasoner
- Common mistakes
- Production checklist
- When this guide is not the right page
- FAQ
- Official sources
Quick answer: how to use DeepSeek with Python
The current official DeepSeek quick start uses an OpenAI-compatible API format. In Python, the documented path is to use the OpenAI Python SDK with DeepSeek-specific configuration.
- Install the OpenAI Python package: pip install openai.
- Store a DeepSeek API key in a secure environment variable such as DEEPSEEK_API_KEY.
- Create an OpenAI client with base_url="https://api.deepseek.com".
- Call client.chat.completions.create().
- Use model="deepseek-v4-flash" for most fast and economical workflows.
- Use model="deepseek-v4-pro" for harder reasoning, coding, long-context, or agentic workflows.
- Set extra_body={"thinking": {"type": "disabled"}} for non-thinking routes and extra_body={"thinking": {"type": "enabled"}} for thinking routes.
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a concise Python assistant."},
{"role": "user", "content": "Explain DeepSeek Python setup in one paragraph."},
],
stream=False,
extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Important: DeepSeek’s official Python examples use Chat Completions through client.chat.completions.create(). Do not assume OpenAI’s client.responses.create() examples work with DeepSeek unless DeepSeek officially documents that endpoint for your use case.
Who this guide is for
- Use this guide if you want to call DeepSeek from Python using the OpenAI Python SDK.
- Use the OpenAI SDK with DeepSeek guide if you need both Python and Node.js migration details.
- Use the DeepSeek Node.js TypeScript guide if your project is in JavaScript or TypeScript.
- Use the DeepSeek API guide if you need a broader API overview, model selection, pricing links, and endpoint notes.
- Use the Chat-Deep.ai browser chat if you only want to test prompts quickly without writing code.
What is the DeepSeek Python SDK?
Developers often search for “DeepSeek Python SDK,” but the officially documented Python path is to use the openai Python package configured with DeepSeek’s base URL and a DeepSeek API key.
In this article, DeepSeek Python SDK means: the OpenAI Python client, a DeepSeek API key, base_url="https://api.deepseek.com", and a current DeepSeek V4 model ID. Do not install or recommend an unofficial package such as pip install deepseek unless official DeepSeek documentation explicitly documents it for your use case.
This page intentionally focuses on Python. It does not replace dedicated pages for DeepSeek JSON Output, DeepSeek Tool Calls, DeepSeek Thinking Mode, DeepSeek Token Usage, or DeepSeek API Pricing.
Install the Python package
Install the OpenAI Python package from PyPI:
pip install openai
If you want to load variables from a local .env file during development, you can also install python-dotenv:
pip install python-dotenv
The OpenAI Python package provides synchronous and asynchronous clients. In this article, normal examples use OpenAI, and the async example uses AsyncOpenAI.
Set your DeepSeek API key safely
Keep your DeepSeek API key server-side. Never put a live API key in browser JavaScript, public mobile code, public GitHub repositories, logs, screenshots, support tickets, or client-side bundles.
macOS or Linux
export DEEPSEEK_API_KEY="<your_deepseek_api_key>"
Windows PowerShell
[Environment]::SetEnvironmentVariable("DEEPSEEK_API_KEY", "<your_deepseek_api_key>", "User")
Optional local .env file
DEEPSEEK_API_KEY=<your_deepseek_api_key>
If you use a .env file, add it to .gitignore and never commit it:
from dotenv import load_dotenv
load_dotenv()
Then your application can read the key with os.environ["DEEPSEEK_API_KEY"].
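If you want to fail fast when the key is missing, a small startup check is useful so the error surfaces before the first request. The `require_api_key` helper below is a hypothetical sketch, not part of the OpenAI SDK or the DeepSeek docs:

```python
import os


def require_api_key(var_name: str = "DEEPSEEK_API_KEY") -> str:
    """Read an API key from the environment and fail fast if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or load it "
            "from a .env file before creating the client."
        )
    return key
```

You would then pass the result to the client constructor, for example `OpenAI(api_key=require_api_key(), base_url="https://api.deepseek.com")`.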
DeepSeek base_url explained
The base_url tells the OpenAI Python client to send requests to DeepSeek instead of the default OpenAI endpoint. In Python, use base_url, not baseURL. The baseURL spelling is used in JavaScript and TypeScript examples.
| base_url | When to use it | Important note |
|---|---|---|
| https://api.deepseek.com | Recommended default for normal DeepSeek API requests | Use this for Chat Completions, streaming, JSON Output, Tool Calls, thinking mode, token usage, and most production apps. |
| https://api.deepseek.com/v1 | OpenAI compatibility path | The /v1 path is for compatibility and is not a DeepSeek model version. |
| https://api.deepseek.com/beta | Only for documented beta features | Use only where official docs require it, such as strict tool schemas, Chat Prefix Completion, or FIM Completion. |
For normal Python usage, start with https://api.deepseek.com. Switch to /beta only when a documented beta feature explicitly requires it.
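If several routes in your codebase need different base URLs, the table above can be encoded in one small helper so the choice lives in a single place. `choose_base_url` and the `BETA_FEATURES` set are hypothetical names for illustration:

```python
from typing import Optional

BASE_URL = "https://api.deepseek.com"
BETA_BASE_URL = "https://api.deepseek.com/beta"

# Per the table above, these features are documented as beta-only.
BETA_FEATURES = {"strict_tool_calls", "chat_prefix_completion", "fim_completion"}


def choose_base_url(feature: Optional[str] = None) -> str:
    """Return the beta base URL only when a documented beta feature needs it."""
    return BETA_BASE_URL if feature in BETA_FEATURES else BASE_URL
```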
Choosing a current V4 model
For new Python code, use deepseek-v4-flash or deepseek-v4-pro. Do not present deepseek-chat or deepseek-reasoner as the primary current model IDs.
| Model | Use it for | Recommended mode | Notes |
|---|---|---|---|
| deepseek-v4-flash | Fast chat, summaries, extraction, JSON Output, routine coding help, classification, support bots, and cost-sensitive Python applications | Usually start with non-thinking mode | Best first model for most Python integrations and migration tests. |
| deepseek-v4-pro | Hard reasoning, complex coding, long-context analysis, tool planning, agentic workflows, and high-value production tasks | Usually use thinking mode for difficult tasks | Use when quality and reasoning matter more than lowest token price. |
| deepseek-chat | Legacy compatibility only | Routes to V4-Flash non-thinking mode | Scheduled for retirement after July 24, 2026, 15:59 UTC. |
| deepseek-reasoner | Legacy compatibility only | Routes to V4-Flash thinking mode | Scheduled for retirement after July 24, 2026, 15:59 UTC. |
Minimal DeepSeek Python example
This is the simplest copy-paste DeepSeek API Python example using the OpenAI-compatible client and the current lower-cost V4 model.
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a helpful Python assistant."},
{"role": "user", "content": "Explain DeepSeek Python setup in one paragraph."},
],
stream=False,
extra_body={"thinking": {"type": "disabled"}},
)
message = response.choices[0].message
print(message.content)
The required request fields are model and messages. The messages array contains the conversation so far, and the assistant’s generated answer is returned in response.choices[0].message.content for normal chat completions.
For endpoint-level details, read our Create a DeepSeek Chat Completion guide or the official DeepSeek Chat Completion API reference.
Non-thinking mode example with deepseek-v4-flash
Non-thinking mode is usually the better default for fast, simple, structured, or cost-sensitive routes.
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a concise support assistant."},
{"role": "user", "content": "Summarize this ticket in three bullet points."},
],
max_tokens=600,
extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Use this pattern for ordinary chat, extraction, short summaries, classification, JSON-only tasks, and routine coding help where you do not need explicit reasoning behavior.
Thinking Mode in Python
DeepSeek V4 supports both thinking and non-thinking modes. Thinking mode can improve hard reasoning, complex coding, tool planning, long-context analysis, and agentic workflows. In Python with the OpenAI SDK, pass the thinking object through extra_body.
Thinking mode with deepseek-v4-pro
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[
{"role": "user", "content": "Compare two Python API retry strategies and explain the tradeoffs."}
],
reasoning_effort="high",
max_tokens=2000,
extra_body={"thinking": {"type": "enabled"}},
)
message = response.choices[0].message
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
# Keep reasoning separate from end-user output.
# Store, inspect, or discard it according to your product policy.
pass
print("Final answer:")
print(message.content)
Thinking mode with deepseek-v4-flash
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "user", "content": "Plan a robust Python error-handling strategy for an API client."}
],
reasoning_effort="high",
max_tokens=1500,
extra_body={"thinking": {"type": "enabled"}},
)
print(response.choices[0].message.content)
In thinking mode, reasoning_content is separate from the final content. The final user-facing answer is in content. For normal multi-turn chat without tool calls, you can carry the final answer forward. During thinking-mode tool-call loops, preserve the full assistant message because it may include reasoning_content and tool_calls.
For deeper thinking-mode behavior, read our DeepSeek Thinking Mode guide and the official DeepSeek Thinking Mode documentation.
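One way to apply the history rule above is a tiny helper that reduces an assistant message to a history-safe entry for ordinary multi-turn chat. `to_history_entry` is a hypothetical sketch operating on plain dicts; during an active thinking + tool-call loop you would append the full assistant message instead:

```python
def to_history_entry(assistant_message: dict) -> dict:
    """Keep only the role and final content for ordinary multi-turn history.

    reasoning_content and tool_calls are intentionally dropped; preserve the
    full message only inside an active thinking + tool-call loop.
    """
    return {
        "role": "assistant",
        "content": assistant_message.get("content") or "",
    }
```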
Streaming responses in Python
Set stream=True to receive partial response chunks as they are generated. If you need final token accounting during streaming, set stream_options={"include_usage": True} and capture the final usage-bearing chunk.
Non-thinking streaming example
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
stream = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a concise Python assistant."},
{"role": "user", "content": "Give me three Python tips for reliable API integrations."},
],
stream=True,
stream_options={"include_usage": True},
extra_body={"thinking": {"type": "disabled"}},
)
final_usage = None
for chunk in stream:
if getattr(chunk, "usage", None) is not None:
final_usage = chunk.usage
continue
if not chunk.choices:
continue
delta = chunk.choices[0].delta
if getattr(delta, "content", None):
print(delta.content, end="", flush=True)
if final_usage is not None:
print()
print("Usage:", final_usage)
Thinking-mode streaming example
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
stream = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[
{"role": "user", "content": "Explain why robust retry logic matters in Python API clients."}
],
stream=True,
stream_options={"include_usage": True},
reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},
)
reasoning_buffer = []
answer_buffer = []
final_usage = None
for chunk in stream:
if getattr(chunk, "usage", None) is not None:
final_usage = chunk.usage
continue
if not chunk.choices:
continue
delta = chunk.choices[0].delta
reasoning = getattr(delta, "reasoning_content", None)
if reasoning:
reasoning_buffer.append(reasoning)
content = getattr(delta, "content", None)
if content:
answer_buffer.append(content)
print(content, end="", flush=True)
final_answer = "".join(answer_buffer)
if final_usage is not None:
print()
print("Usage:", final_usage)
Most user-facing apps should display only final answer content unless they have a clear product policy for showing reasoning output. Keep reasoning and final answer buffers separate.
Async DeepSeek Python example
For high-concurrency Python applications, use AsyncOpenAI. The request shape is almost the same, but you call it with await.
import asyncio
import os
from openai import AsyncOpenAI
client = AsyncOpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
async def main() -> None:
response = await client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a helpful Python assistant."},
{"role": "user", "content": "Explain async API clients in two sentences."},
],
stream=False,
extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
asyncio.run(main())
Use async when your server needs to handle many simultaneous API calls without blocking worker threads. Keep the same security, timeout, validation, and retry rules you use in synchronous code.
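A common pattern for "many simultaneous API calls" is to bound concurrency with a semaphore so a traffic spike does not turn into a rate-limit storm. This is a hedged sketch: `run_bounded` and `ask` are hypothetical names, and `ask` stands in for an awaitable wrapper around your real `client.chat.completions.create` call:

```python
import asyncio
from typing import Awaitable, Callable


async def run_bounded(
    prompts: list[str],
    ask: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> list[str]:
    """Run many requests concurrently without exceeding max_concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(prompt: str) -> str:
        # Each call waits for a free slot before issuing its request.
        async with semaphore:
            return await ask(prompt)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(p) for p in prompts))
```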
Multi-turn chat in Python
DeepSeek’s Chat Completions API is stateless. The server does not remember earlier turns for you. If you want multi-turn chat, your application must keep the conversation history and send it again with each request.
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
messages = [
{"role": "system", "content": "You are a helpful Python assistant."},
{"role": "user", "content": "What is context caching in one sentence?"},
]
first_response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=messages,
extra_body={"thinking": {"type": "disabled"}},
)
first_message = first_response.choices[0].message
print(first_message.content)
messages.append({
"role": "assistant",
"content": first_message.content,
})
messages.append({
"role": "user",
"content": "Now explain why repeated prefixes can matter for cost.",
})
second_response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=messages,
extra_body={"thinking": {"type": "disabled"}},
)
print(second_response.choices[0].message.content)
For ordinary multi-turn chat, store and resend the assistant’s final content. For thinking-mode tool-call loops, preserve the full assistant message during the active loop because it may include reasoning_content and tool_calls.
For a deeper explanation, see the official DeepSeek Multi-round Conversation guide.
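Because you resend the full history on every request, long conversations eventually need trimming to stay within your token budget. A minimal sketch, assuming simple turn-count trimming (`trim_history` is a hypothetical helper; production code might trim by counted tokens instead):

```python
def trim_history(messages: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep system messages plus the most recent conversation messages.

    A "turn" here is one user or assistant message; tune max_turns to fit
    your token budget.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]
```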
DeepSeek JSON Output in Python
DeepSeek JSON Output uses response_format={"type": "json_object"}. The prompt should explicitly include the word “json,” provide an example shape, and set max_tokens high enough to reduce truncation risk. Production code should parse and validate the result before using it.
import json
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{
"role": "system",
"content": (
"Return only valid json with keys: "
"title, summary, category, confidence. "
"Do not include Markdown or explanations."
),
},
{
"role": "user",
"content": "Classify this text: DeepSeek Python SDK setup is simple.",
},
],
response_format={"type": "json_object"},
max_tokens=500,
extra_body={"thinking": {"type": "disabled"}},
)
choice = response.choices[0]
content = choice.message.content or ""
if choice.finish_reason == "length":
raise RuntimeError("The JSON may be truncated. Increase max_tokens.")
if not content.strip():
raise RuntimeError("The response content was empty. Retry with a clearer json prompt.")
try:
data = json.loads(content)
except json.JSONDecodeError as exc:
raise RuntimeError("The response was not valid JSON.") from exc
required = {"title", "summary", "category", "confidence"}
missing = required - set(data)
if missing:
raise RuntimeError(f"Missing required keys: {missing}")
print(data)
DeepSeek’s official JSON Output documentation notes that empty content can occasionally occur. In production, add retry logic, validate parsed JSON, and show a safe fallback instead of assuming every response will be parseable.
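That retry advice can be wrapped in a small helper. This is a hypothetical sketch: `call_model` is any zero-argument function that returns the raw response content string, for example a closure around your Chat Completion call:

```python
import json
from typing import Callable, Optional


def json_with_retry(call_model: Callable[[], str], attempts: int = 3) -> dict:
    """Call the model, parse JSON, and retry on empty or invalid output."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        content = (call_model() or "").strip()
        if not content:
            last_error = RuntimeError("Empty response content.")
            continue
        try:
            return json.loads(content)
        except json.JSONDecodeError as exc:
            last_error = exc
    raise RuntimeError("No valid JSON after retries.") from last_error
```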
For a full structured-output implementation guide, read our DeepSeek JSON Output guide and the official DeepSeek JSON Output documentation.
DeepSeek Tool Calls in Python
DeepSeek Tool Calls let the model request structured function calls. The model does not execute your function automatically. Your application validates the arguments, executes the function, appends a tool message with the matching tool_call_id, and sends another request.
import json
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
def get_weather(location: str) -> dict:
# Replace this with your real weather service.
return {
"location": location,
"temperature_c": 26,
"condition": "clear",
}
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city or region.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country, for example Cairo, Egypt",
}
},
"required": ["location"],
},
},
}
]
messages = [
{"role": "user", "content": "What is the weather in Cairo, Egypt?"}
]
first = client.chat.completions.create(
model="deepseek-v4-flash",
messages=messages,
tools=tools,
tool_choice="auto",
extra_body={"thinking": {"type": "disabled"}},
)
message = first.choices[0].message
if message.tool_calls:
messages.append(message.model_dump(exclude_none=True))
for tool_call in message.tool_calls:
if tool_call.function.name != "get_weather":
raise RuntimeError(f"Unexpected tool call: {tool_call.function.name}")
try:
arguments = json.loads(tool_call.function.arguments)
except json.JSONDecodeError as exc:
raise RuntimeError("Tool arguments were not valid JSON.") from exc
location = arguments.get("location")
if not isinstance(location, str) or not location.strip():
raise RuntimeError("Tool argument 'location' must be a non-empty string.")
result = get_weather(location)
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result, ensure_ascii=False),
})
second = client.chat.completions.create(
model="deepseek-v4-flash",
messages=messages,
tools=tools,
extra_body={"thinking": {"type": "disabled"}},
)
print(second.choices[0].message.content)
else:
print(message.content)
Validate tool arguments before execution, especially if a tool touches databases, payments, accounts, files, shell commands, repositories, or external APIs. For a dedicated implementation guide, read our DeepSeek Tool Calls article and the official DeepSeek Tool Calls documentation.
Thinking-mode Tool Calls in Python
For thinking-mode tool workflows, use deepseek-v4-pro when the task needs stronger planning or reasoning. During a thinking + tool-call loop, preserve the full assistant message so the model can continue the same reasoning process after tool results.
import json
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
def lookup_order_status(order_id: str) -> str:
return json.dumps({
"order_id": order_id,
"status": "shipped",
"eta": "2026-04-27",
})
tools = [
{
"type": "function",
"function": {
"name": "lookup_order_status",
"description": "Look up shipping status for an order ID.",
"parameters": {
"type": "object",
"properties": {
"order_id": {
"type": "string",
"description": "The user's order ID.",
}
},
"required": ["order_id"],
},
},
}
]
messages = [
{"role": "user", "content": "Check order A123 and explain whether it will arrive this week."}
]
while True:
response = client.chat.completions.create(
model="deepseek-v4-pro",
messages=messages,
tools=tools,
tool_choice="auto",
reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},
)
assistant_message = response.choices[0].message
# Important: keep the full assistant message in a thinking + tool-call loop.
messages.append(assistant_message.model_dump(exclude_none=True))
if not assistant_message.tool_calls:
print(assistant_message.content)
break
for tool_call in assistant_message.tool_calls:
if tool_call.function.name != "lookup_order_status":
raise RuntimeError(f"Unexpected tool call: {tool_call.function.name}")
args = json.loads(tool_call.function.arguments or "{}")
order_id = args.get("order_id")
if not isinstance(order_id, str):
raise RuntimeError("Invalid order_id.")
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": lookup_order_status(order_id),
})
Strict mode for Tool Calls
Strict mode is a beta Tool Calls feature. DeepSeek’s official docs say strict mode requires base_url="https://api.deepseek.com/beta" and strict: true inside function definitions. Use strict mode when you need tighter schema adherence, but still validate all arguments before executing tools.
import os
from openai import OpenAI
strict_client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com/beta",
)
strict_tools = [
{
"type": "function",
"function": {
"name": "create_invoice_draft",
"strict": True,
"description": "Create a draft invoice. This does not send the invoice.",
"parameters": {
"type": "object",
"properties": {
"customer_id": {
"type": "string",
"description": "Internal CRM customer ID",
},
"amount_usd": {
"type": "number",
"description": "Invoice amount in USD",
"minimum": 0,
},
"memo": {
"type": "string",
"description": "Short invoice memo",
},
},
"required": ["customer_id", "amount_usd", "memo"],
"additionalProperties": False,
},
},
}
]
Even with strict mode, application-level validation should check types, permissions, ranges, account ownership, business rules, safety constraints, and authorization.
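As a sketch of that application-level layer for the invoice schema above (a hypothetical validator; adapt the rules, limits, and permission checks to your own business logic):

```python
def validate_invoice_args(args: dict) -> None:
    """Application-level checks that schema enforcement alone cannot cover."""
    customer_id = args.get("customer_id")
    if not isinstance(customer_id, str) or not customer_id.strip():
        raise ValueError("customer_id must be a non-empty string.")

    amount = args.get("amount_usd")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(amount, (int, float)) or isinstance(amount, bool) or amount <= 0:
        raise ValueError("amount_usd must be a positive number.")

    memo = args.get("memo")
    if not isinstance(memo, str) or len(memo) > 200:
        raise ValueError("memo must be a string of at most 200 characters.")
```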
Chat Completions vs OpenAI Responses API
The OpenAI Python library includes methods for OpenAI’s own endpoints, but DeepSeek’s official examples document /chat/completions and use client.chat.completions.create(). DeepSeek compatibility does not automatically mean every OpenAI endpoint is supported by DeepSeek.
# Recommended for DeepSeek because it is documented by DeepSeek:
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[{"role": "user", "content": "Hello"}],
extra_body={"thinking": {"type": "disabled"}},
)
# Do not assume this is supported by DeepSeek unless DeepSeek documents it:
# response = client.responses.create(...)
For DeepSeek Python code, follow the endpoints and parameters that DeepSeek documents.
Error handling, retries, and timeouts
The OpenAI Python SDK has its own error classes, retry behavior, request IDs, and timeout configuration. DeepSeek API behavior should still be verified against DeepSeek’s official docs, especially for status-code meanings.
import os
import openai
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
timeout=30.0,
max_retries=2,
)
try:
response = client.with_options(timeout=60.0).chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "user", "content": "Give me one Python API reliability tip."}
],
stream=False,
extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
print("request_id:", getattr(response, "_request_id", None))
except openai.APIConnectionError as exc:
print("Connection problem or timeout while reaching the API.")
print(str(exc.__cause__) if exc.__cause__ else "No low-level cause available.")
except openai.RateLimitError as exc:
print("429 rate limit or traffic-related throttling. Back off and retry later.")
print("request_id:", getattr(exc, "request_id", None))
except openai.APIStatusError as exc:
print(f"API returned status code: {exc.status_code}")
print("request_id:", getattr(exc, "request_id", None))
print("Response body:")
print(exc.response)
Common DeepSeek API status-code meanings:
| Status | Official meaning | What to check |
|---|---|---|
| 400 | Invalid request body format | Fix the request body format, message order, tool messages, or thinking-mode history. |
| 401 | Authentication fails | Check that you are using a valid DeepSeek API key with the DeepSeek base URL. |
| 402 | Insufficient balance | Check account balance and billing setup on the official DeepSeek platform. |
| 422 | Invalid parameters | Check unsupported or malformed request parameters. |
| 429 | Rate limit reached | Pace requests, reduce concurrency, queue work, and retry with backoff. |
| 500 | Server error | Retry after a brief wait. |
| 503 | Server overloaded | Retry later and consider graceful fallback. |
Implementation note: Production clients should implement retry budgets, timeout handling, exponential backoff, and graceful fallback. Do not create aggressive infinite retry loops.
For a deeper troubleshooting reference, see our DeepSeek API Error Codes guide, plus the official DeepSeek Error Codes page.
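A minimal sketch of a retry budget with exponential backoff and jitter, assuming you classify retryable failures yourself (both `RetryableError` and `with_backoff` are hypothetical names; `send` stands in for your real request function, and the injectable `sleep` keeps the helper testable):

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raise this from `send` for failures worth retrying, e.g. 429/500/503."""


def with_backoff(
    send: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `send` with exponential backoff and jitter, up to a fixed budget."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted; surface the error
            # Delay doubles each attempt, scaled by jitter in [0.5, 1.0).
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            sleep(delay)
    raise AssertionError("unreachable")
```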
Token usage, context caching, and cost
DeepSeek bills based on input and output tokens. The actual token count should be read from the API response usage object rather than estimated from character count alone.
usage = response.usage
if usage:
print("prompt_tokens:", usage.prompt_tokens)
print("completion_tokens:", usage.completion_tokens)
print("total_tokens:", usage.total_tokens)
cache_hit = getattr(usage, "prompt_cache_hit_tokens", 0) or 0
cache_miss = getattr(usage, "prompt_cache_miss_tokens", 0) or 0
reasoning_tokens = getattr(
getattr(usage, "completion_tokens_details", None),
"reasoning_tokens",
0,
) or 0
print("prompt_cache_hit_tokens:", cache_hit)
print("prompt_cache_miss_tokens:", cache_miss)
print("reasoning_tokens:", reasoning_tokens)
Current official V4 pricing is listed per 1M tokens and differs by model:
| Model | Input cache hit | Input cache miss | Output |
|---|---|---|---|
| deepseek-v4-flash | $0.028 / 1M tokens | $0.14 / 1M tokens | $0.28 / 1M tokens |
| deepseek-v4-pro | $0.145 / 1M tokens | $1.74 / 1M tokens | $3.48 / 1M tokens |
The request-level cost formula is:
request_cost = (prompt_cache_hit_tokens / 1_000_000 * cache_hit_rate) + (prompt_cache_miss_tokens / 1_000_000 * cache_miss_rate) + (completion_tokens / 1_000_000 * output_rate)
Simple cost helper for Python
from decimal import Decimal
PRICING_PER_1M = {
"deepseek-v4-flash": {
"cache_hit_input": Decimal("0.028"),
"cache_miss_input": Decimal("0.14"),
"output": Decimal("0.28"),
},
"deepseek-v4-pro": {
"cache_hit_input": Decimal("0.145"),
"cache_miss_input": Decimal("1.74"),
"output": Decimal("3.48"),
},
}
def estimate_deepseek_cost_usd(model: str, usage) -> Decimal:
rates = PRICING_PER_1M[model]
cache_hit = Decimal(getattr(usage, "prompt_cache_hit_tokens", 0) or 0)
cache_miss = Decimal(getattr(usage, "prompt_cache_miss_tokens", 0) or 0)
output = Decimal(getattr(usage, "completion_tokens", 0) or 0)
return (
cache_hit / Decimal("1000000") * rates["cache_hit_input"]
+ cache_miss / Decimal("1000000") * rates["cache_miss_input"]
+ output / Decimal("1000000") * rates["output"]
)
model = "deepseek-v4-flash"
response = client.chat.completions.create(
model=model,
messages=[
{"role": "user", "content": "Explain DeepSeek Python SDK setup in five bullets."}
],
extra_body={"thinking": {"type": "disabled"}},
)
cost = estimate_deepseek_cost_usd(model, response.usage)
print(f"estimated_request_cost: ${cost:.8f}")
Context caching is enabled by default in the DeepSeek API. Only repeated prefix portions can trigger cache hits, and the cache is best-effort rather than guaranteed. Track prompt_cache_hit_tokens and prompt_cache_miss_tokens instead of assuming every repeated request gets cache-hit pricing.
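To see how well caching is actually working for a given route, you can compute the observed hit rate from those usage fields. `cache_hit_rate` is a hypothetical helper for dashboards or logs:

```python
def cache_hit_rate(usage) -> float:
    """Fraction of prompt tokens billed at the cache-hit rate (0.0 if none)."""
    hits = getattr(usage, "prompt_cache_hit_tokens", 0) or 0
    misses = getattr(usage, "prompt_cache_miss_tokens", 0) or 0
    total = hits + misses
    return hits / total if total else 0.0
```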
For planning, use our DeepSeek API pricing page. For deeper usage tracking, read our DeepSeek Token Usage guide and the official DeepSeek Token & Token Usage page.
List models and check balance
You can inspect available API models through GET /models and check account balance through GET /user/balance. These are useful for internal diagnostics, dashboards, and pre-launch checks.
import os
import requests
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
models = client.models.list()
for model in models.data:
print(model.id)
balance_response = requests.get(
"https://api.deepseek.com/user/balance",
headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
timeout=30,
)
balance_response.raise_for_status()
print(balance_response.json())
See the official List Models and Get User Balance references for the latest response shape.
Legacy aliases: deepseek-chat and deepseek-reasoner
The older names deepseek-chat and deepseek-reasoner are now legacy compatibility aliases. They should not be the primary model names in new Python examples, SDK migration guides, pricing pages, or developer tutorials.
| Legacy alias | Current compatibility behavior | Recommended replacement |
|---|---|---|
| deepseek-chat | Routes to deepseek-v4-flash non-thinking mode | deepseek-v4-flash with thinking disabled |
| deepseek-reasoner | Routes to deepseek-v4-flash thinking mode | deepseek-v4-flash or deepseek-v4-pro with thinking enabled |
Migration rule: replace old examples that use model="deepseek-chat" with model="deepseek-v4-flash", and replace reasoning-heavy examples that use model="deepseek-reasoner" with model="deepseek-v4-pro" plus thinking enabled.
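That migration rule can be written down as a lookup so every old call site is updated consistently. `LEGACY_MIGRATION` and `migrate_model` are hypothetical names for illustration:

```python
# Maps a legacy alias to (current model ID, thinking type), per the rule above.
LEGACY_MIGRATION = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-pro", "enabled"),
}


def migrate_model(model: str) -> tuple[str, dict]:
    """Return the current model ID plus the matching thinking extra_body.

    Non-legacy model IDs pass through unchanged with no thinking override.
    """
    if model in LEGACY_MIGRATION:
        current, thinking = LEGACY_MIGRATION[model]
        return current, {"thinking": {"type": thinking}}
    return model, {}
```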
Common mistakes
- Using old model names in new code: use `deepseek-v4-flash` and `deepseek-v4-pro` directly.
- Using an OpenAI API key with the DeepSeek `base_url`: use a DeepSeek API key for DeepSeek requests.
- Using a DeepSeek API key with the default OpenAI endpoint: set `base_url="https://api.deepseek.com"`.
- Using `baseURL` in Python: Python uses `base_url`. JavaScript and TypeScript use `baseURL`.
- Treating `/v1` as a DeepSeek model version: it is an OpenAI compatibility path, not a model version.
- Copying OpenAI Responses API examples into DeepSeek: DeepSeek's official examples use Chat Completions. Do not assume `client.responses.create()` works unless DeepSeek documents it.
- Putting secret keys in browser code: keep DeepSeek API keys on the server side.
- Assuming app/web behavior equals API behavior: use official DeepSeek API docs for developer integrations.
- Ignoring `finish_reason`: check for `length`, `tool_calls`, and other stop reasons.
- Passing reasoning text into normal history incorrectly: use final `content` for normal conversation history unless the active thinking + tool-call loop requires the full assistant message.
- Forgetting to validate JSON or tool arguments: parse and validate before using structured output or executing tools.
- Treating pricing as permanent: re-check official pricing before launch.
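The `finish_reason` point above is worth making concrete. A minimal sketch, assuming the standard Chat Completions `finish_reason` strings (`stop`, `length`, `tool_calls`); the helper name and action strings are our own:

```python
# Illustrative guard: decide what to do based on finish_reason
# instead of blindly using message.content.
def classify_finish(finish_reason: str) -> str:
    """Return an action hint for common finish_reason values."""
    if finish_reason == "length":
        return "truncated: raise max_tokens or shorten the prompt"
    if finish_reason == "tool_calls":
        return "run the requested tools, then send a follow-up request"
    if finish_reason == "stop":
        return "complete: safe to use message.content"
    return f"unexpected finish_reason: {finish_reason!r}"

print(classify_finish("length"))
print(classify_finish("tool_calls"))
```

In a real handler you would branch on the same values: retry or raise on `length`, enter your tool loop on `tool_calls`, and only then consume the content.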
Production checklist
- Use environment variables or a secrets manager for `DEEPSEEK_API_KEY`.
- Rotate keys when needed and remove old keys from deployment environments.
- Use `deepseek-v4-flash` or `deepseek-v4-pro` in new code.
- Use `base_url="https://api.deepseek.com"` for normal production requests.
- Use `https://api.deepseek.com/beta` only for documented beta features.
- Set explicit timeouts.
- Configure retries with backoff and a retry budget.
- Log request IDs and status codes where available.
- Handle 400, 401, 402, 422, 429, 500, and 503 with clear behavior.
- Monitor token usage and response `usage` fields.
- Track `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens`.
- Check account balance before production workloads.
- Avoid browser-side API keys.
- Validate JSON Output before using it in automation.
- Validate tool-call arguments before executing functions.
- Use async clients for high-concurrency Python apps.
- Test prompts with representative inputs, not only toy examples.
- Verify pricing, model names, context length, output limits, and feature support before launch.
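For the timeout and retry items in the checklist, the OpenAI Python client accepts `timeout` and `max_retries` constructor options, and you can compute your own backoff schedule when you retry above the SDK layer. The helper below is an illustrative sketch (the function name and values are our own, not part of any SDK):

```python
# Illustrative retry-budget helper: capped exponential backoff delays.
# Combine with the client options, e.g.:
#   OpenAI(api_key=..., base_url="https://api.deepseek.com",
#          timeout=30.0, max_retries=3)
def backoff_delays(retries: int = 3, base: float = 0.5, cap: float = 8.0) -> list:
    """Return capped exponential backoff delays in seconds, one per retry."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]

print(backoff_delays(3))  # [0.5, 1.0, 2.0]
print(backoff_delays(6))  # capped at 8.0 for later attempts
```

Add jitter (e.g. multiply each delay by a random factor) before sleeping so concurrent workers do not retry in lockstep, and stop retrying once the budget is spent.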
When this guide is not the right page
- If you only want to try prompts, use the Chat-Deep.ai browser chat.
- If you need Node.js or TypeScript, use the DeepSeek Node.js TypeScript guide.
- If you need cross-language SDK migration, use the OpenAI SDK with DeepSeek guide.
- If you need LangChain orchestration, use the DeepSeek LangChain integration guide.
- If you need local/offline models, use the DeepSeek Local vs API guide.
- If you need official billing, account management, or API keys, use the official DeepSeek platform.
FAQ
What is the DeepSeek Python SDK?
The phrase “DeepSeek Python SDK” usually refers to using DeepSeek from Python through the OpenAI Python client configured with a DeepSeek API key, DeepSeek base_url, and a current DeepSeek V4 model ID.
Does DeepSeek have an official Python SDK?
DeepSeek’s official quick start documents Python usage through the OpenAI Python client configured with DeepSeek’s base URL. This guide uses that documented approach instead of recommending an unofficial DeepSeek-specific Python package.
How do I install the DeepSeek Python SDK?
Install the OpenAI Python package with pip install openai, then create an OpenAI client with your DeepSeek API key and base_url="https://api.deepseek.com".
What base_url should I use for DeepSeek in Python?
Use https://api.deepseek.com for normal API requests. https://api.deepseek.com/v1 is also supported for OpenAI compatibility, but /v1 is not a model version.
Should I use deepseek-v4-flash or deepseek-v4-pro?
Use deepseek-v4-flash for fast, lower-cost workflows and deepseek-v4-pro for harder reasoning, complex coding, long-context analysis, and agentic workflows.
Should I still use deepseek-chat or deepseek-reasoner?
Not for new code. They are legacy compatibility aliases. deepseek-chat currently routes to deepseek-v4-flash non-thinking mode, and deepseek-reasoner currently routes to deepseek-v4-flash thinking mode.
Can I use the OpenAI Python client with DeepSeek?
Yes. DeepSeek’s official quick start says the API uses an OpenAI-compatible format, so Python users can use the OpenAI Python client by changing the API key, base URL, and model ID.
Can I use the OpenAI Responses API with DeepSeek?
DeepSeek’s official examples use Chat Completions. Do not assume OpenAI Responses API examples work with DeepSeek unless DeepSeek documents support for your use case.
How do I stream DeepSeek responses in Python?
Set stream=True in client.chat.completions.create() and iterate over the returned chunks. Read delta.content for visible text and handle empty chunks safely. If you need usage, request stream_options={"include_usage": True}.
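The chunk-handling part of that answer can be shown without calling the API. A minimal sketch, assuming the streamed chunk shape from `client.chat.completions.create(stream=True)` (chunks with `choices[0].delta.content`, and a usage-only tail chunk with empty `choices` when `stream_options={"include_usage": True}` is set); the stand-in objects below only imitate that shape:

```python
from types import SimpleNamespace

def collect_stream_text(chunks) -> str:
    """Accumulate delta.content from streamed chat chunks, skipping empty ones."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. the final usage-only chunk
            continue
        delta = chunk.choices[0].delta
        if delta and getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)

# Stand-in chunks imitating the streamed response shape:
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hel"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="lo"))]),
    SimpleNamespace(choices=[]),  # usage-only tail chunk
]
print(collect_stream_text(fake_chunks))  # Hello
```

In production you would print each piece as it arrives instead of collecting; the empty-chunk guards are the part people forget.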
How do I use JSON Output with DeepSeek in Python?
Set response_format={"type": "json_object"}, explicitly ask for json in the prompt, provide an example JSON shape, set enough max_tokens, and validate the parsed JSON in your code.
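The "validate the parsed JSON" step can be a small guard you run on every JSON Output response before automation touches it. An illustrative sketch (helper name and error messages are our own):

```python
import json

def parse_json_output(text: str, required_keys=()) -> dict:
    """Parse model output as JSON and check required keys before use."""
    data = json.loads(text)  # raises ValueError/JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

# message.content from a json_object response would be passed here:
print(parse_json_output('{"name": "Ada", "age": 36}', required_keys=("name",)))
```

On a validation failure you can retry the request with a stronger prompt or a larger `max_tokens` rather than letting malformed output into downstream code.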
How do I use Tool Calls with DeepSeek in Python?
Define a tools array, let the model return tool_calls, validate the arguments, execute the function in your application, append a tool message with the matching tool_call_id, and send another request.
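The "validate the arguments" step deserves its own guard, because tool-call arguments arrive as a model-generated JSON string. A minimal sketch, assuming the Chat Completions shape where each tool call carries a `function.arguments` JSON string; the helper name and the `city`/`unit` parameters are illustrative:

```python
import json

def parse_tool_arguments(arguments_json: str, allowed_params: set) -> dict:
    """Parse a tool call's arguments string and reject unexpected parameters."""
    args = json.loads(arguments_json)
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    extra = set(args) - allowed_params
    if extra:
        raise ValueError(f"unexpected parameters: {sorted(extra)}")
    return args

# tool_call.function.arguments from a response would be passed here:
print(parse_tool_arguments('{"city": "Paris"}', allowed_params={"city", "unit"}))
```

Only after this check should you execute the function, then append the `tool` message with the matching `tool_call_id` and send the follow-up request.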
How do I use Thinking Mode in Python?
Use deepseek-v4-flash or deepseek-v4-pro and pass extra_body={"thinking": {"type": "enabled"}}. For harder reasoning, use deepseek-v4-pro with reasoning_effort="high" or reasoning_effort="max".
Why am I getting a 401 error?
A 401 error means authentication failed. Check that you are using a valid DeepSeek API key, that it is loaded into DEEPSEEK_API_KEY, and that your client is pointed at the DeepSeek base URL.
Why am I getting a 402 error?
A 402 error means insufficient balance. Check your DeepSeek account balance and billing setup before running production workloads.
Can I call DeepSeek directly from browser JavaScript?
No. Do not put a secret DeepSeek API key in public browser code. Use a server-side route, API proxy, worker, or backend service that keeps the key private.
Is Chat-Deep.ai the official DeepSeek website?
No. Chat-Deep.ai is an independent DeepSeek guide and browser access site. It is not affiliated with DeepSeek, DeepSeek.com, the official DeepSeek app, or the official DeepSeek developer platform.
Conclusion
The practical DeepSeek Python setup is straightforward: install the OpenAI Python client, set base_url="https://api.deepseek.com", use a DeepSeek API key, and call client.chat.completions.create() with deepseek-v4-flash or deepseek-v4-pro.
For production, go beyond the minimal example. Add secure key storage, timeouts, retries, usage monitoring, JSON validation, tool argument validation, request-id logging, and clear handling for status codes such as 400, 401, 402, 422, 429, 500, and 503.
Continue with the DeepSeek API guide for broader setup, the OpenAI SDK with DeepSeek guide for cross-language migration, and the official DeepSeek documentation for final production decisions.
Official sources and last verified
Last verified: April 24, 2026. DeepSeek model names, pricing, feature support, output limits, endpoint behavior, and legacy alias behavior can change. Use the official sources below before shipping production code.
- DeepSeek API Docs: Your First API Call
- DeepSeek-V4 Preview Release
- DeepSeek Models & Pricing
- DeepSeek Create Chat Completion API
- DeepSeek Thinking Mode
- DeepSeek JSON Output
- DeepSeek Tool Calls
- DeepSeek Context Caching
- DeepSeek Token & Token Usage
- DeepSeek Error Codes
- DeepSeek List Models
- DeepSeek Get User Balance
- OpenAI official SDK libraries documentation