Quick answer:
DeepSeek Python SDK usually means using the official OpenAI Python SDK with DeepSeek-specific settings. Install the openai Python package, create an OpenAI client with your DeepSeek API key, set base_url="https://api.deepseek.com", and call client.chat.completions.create().
For new Python projects, use the current DeepSeek V4 API model IDs: deepseek-v4-flash for fast everyday workloads and deepseek-v4-pro for harder reasoning, coding, long-context analysis, and higher-value production tasks.
Independent disclosure: Chat-Deep.ai is an independent DeepSeek-focused guide and browser access site. Chat-Deep.ai is not affiliated with DeepSeek, DeepSeek.com, Hangzhou DeepSeek Artificial Intelligence Co., Ltd., the official DeepSeek app, the official DeepSeek API platform, OpenAI, or the OpenAI Python SDK.
This guide is written to help developers understand how to use DeepSeek with Python. For production decisions, always confirm model names, API behavior, policy details, and billing information against the official DeepSeek documentation.
Current DeepSeek API snapshot
- Current API model IDs: deepseek-v4-flash and deepseek-v4-pro
- Base URL: https://api.deepseek.com
- API format: OpenAI-compatible Chat Completions
- Context length: 1M tokens
- Max output: 384K tokens
- Thinking mode: supported
- Non-thinking mode: supported
- JSON Output: supported
- Tool Calls: supported
- Legacy aliases: deepseek-chat and deepseek-reasoner are legacy compatibility aliases scheduled for retirement after July 24, 2026, 15:59 UTC
Table of contents
- Who this guide is for
- What “DeepSeek Python SDK” actually means
- Install the Python package
- Set your DeepSeek API key safely
- DeepSeek base_url explained
- Choose between deepseek-v4-flash and deepseek-v4-pro
- Minimal Python example
- Non-thinking mode example
- Thinking Mode example
- Streaming responses in Python
- Async Python example
- Multi-turn chat in Python
- JSON Output in Python
- Tool Calls in Python
- Strict Tool Calls mode note
- Token usage, context caching, and cost control without prices
- Error handling, rate limits, retries, and timeouts
- Common mistakes
- Production checklist
- When this guide is not the right page
- FAQ
Who this guide is for
This guide is for Python developers who want to call the DeepSeek API from backend services, scripts, internal tools, data pipelines, chat applications, coding assistants, extraction workflows, or agent prototypes.
Use this page if you want a practical DeepSeek API Python walkthrough with copy-paste-ready examples for Chat Completions, streaming, async calls, multi-turn chat, JSON Output, Tool Calls, Thinking Mode, token usage, retries, and safer API key handling.
If you are not writing Python code, use the Related DeepSeek developer guides at the end of this article to jump to the broader DeepSeek API guide, the OpenAI SDK migration guide, or the Node.js TypeScript guide.
What “DeepSeek Python SDK” actually means
Many developers search for “DeepSeek Python SDK,” but the documented Python path is to use the OpenAI Python SDK with a DeepSeek API key and DeepSeek base URL. In practical terms, this means your Python code imports OpenAI from the openai package, points the client at https://api.deepseek.com, and uses DeepSeek model IDs instead of OpenAI model IDs.
Do not install or recommend an unofficial package named deepseek unless DeepSeek officially documents it for your specific use case. For this guide, “DeepSeek Python SDK” means:
- The openai Python package
- A DeepSeek API key stored securely
- base_url="https://api.deepseek.com"
- client.chat.completions.create()
- Current V4 model IDs: deepseek-v4-flash and deepseek-v4-pro
This page focuses on Chat Completions. Do not assume OpenAI’s newer Responses API examples work with DeepSeek unless DeepSeek officially documents that endpoint for your use case.
Install the Python package
Install the OpenAI Python package in your Python environment:
pip install openai
If you want to load environment variables from a local .env file during development, install python-dotenv too:
pip install python-dotenv
For production services, prefer secure environment variables, secrets managers, or your cloud platform’s secret storage. Do not hard-code API keys in Python files.
Set your DeepSeek API key safely
Keep your DeepSeek API key server-side. Never expose a live key in browser JavaScript, public mobile code, public repositories, logs, screenshots, support tickets, or client-side bundles.
macOS or Linux
export DEEPSEEK_API_KEY="your_api_key_here"
Windows PowerShell
[Environment]::SetEnvironmentVariable("DEEPSEEK_API_KEY", "your_api_key_here", "User")
Optional local .env file for development
If you use a local .env file, add it to .gitignore before you write any secrets into it.
DEEPSEEK_API_KEY="your_api_key_here"
Then load it in Python:
from dotenv import load_dotenv
load_dotenv()
In the examples below, the key is read with os.environ["DEEPSEEK_API_KEY"]. That intentionally fails fast if the variable is missing.
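If a bare KeyError is too terse for your team, the fail-fast lookup can be wrapped in a small helper that raises a clearer message and gives you a safe way to mention the key in logs. The sketch below is illustrative only: require_api_key and mask_secret are hypothetical helper names, not part of the OpenAI SDK or the DeepSeek documentation.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "DEEPSEEK_API_KEY",
) -> str:
    """Return the key from the environment, or raise a clear error if it is missing."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set. Export it or load it from your secret store.")
    return value


def mask_secret(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value can appear in a log line."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Passing a mapping as env makes the helper easy to test without touching the real process environment. In application code you would call require_api_key() with no arguments and hand the result to the client.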
DeepSeek base_url explained
The base_url tells the OpenAI Python SDK to send requests to DeepSeek instead of the default OpenAI endpoint. For normal Python Chat Completions usage, set:
base_url="https://api.deepseek.com"
Use base_url in Python. The baseURL spelling is common in JavaScript and TypeScript examples, but it is not the Python parameter name.
- https://api.deepseek.com is the recommended default for normal DeepSeek OpenAI-format API requests.
- https://api.deepseek.com/anthropic is for Anthropic-format usage, not the focus of this Python Chat Completions guide.
- https://api.deepseek.com/beta should be used only when an official beta feature requires it, such as strict Tool Calls mode.
Choose between deepseek-v4-flash and deepseek-v4-pro
For new Python code, use deepseek-v4-flash or deepseek-v4-pro. Use the official DeepSeek pricing page for current API rates.
Use deepseek-v4-flash when you need speed and simplicity
deepseek-v4-flash is a good first choice for chat, summaries, classification, routine coding help, JSON extraction, support workflows, and high-volume applications where fast responses matter.
Use deepseek-v4-pro when the task is harder
deepseek-v4-pro is a better fit for harder reasoning, complex coding, long-context document analysis, tool planning, agentic workflows, and production tasks where answer quality is more important than response speed.
Legacy aliases
deepseek-chat currently maps to DeepSeek V4 Flash non-thinking mode for compatibility. deepseek-reasoner currently maps to DeepSeek V4 Flash thinking mode for compatibility. Both are legacy aliases scheduled for retirement after July 24, 2026, 15:59 UTC.
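When migrating older code, it may help to translate the legacy aliases into explicit V4 settings in one place. The following is a minimal sketch based only on the mapping and retirement time stated in this guide; resolve_model and aliases_retired are hypothetical helper names, and the mapping should be confirmed against the official DeepSeek documentation before you rely on it.

```python
from datetime import datetime, timezone
from typing import Optional

# Alias -> (current model ID, thinking mode), as described in this guide.
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-flash", "enabled"),
}

# Retirement time stated in this guide: July 24, 2026, 15:59 UTC.
ALIAS_RETIREMENT = datetime(2026, 7, 24, 15, 59, tzinfo=timezone.utc)


def resolve_model(model: str) -> dict:
    """Return request kwargs that spell out the model and thinking mode explicitly."""
    if model in LEGACY_ALIASES:
        target, thinking = LEGACY_ALIASES[model]
        return {"model": target, "extra_body": {"thinking": {"type": thinking}}}
    return {"model": model}


def aliases_retired(now: Optional[datetime] = None) -> bool:
    """True once the stated retirement time has passed."""
    current = now or datetime.now(timezone.utc)
    return current > ALIAS_RETIREMENT
```

The returned dictionary can be unpacked into client.chat.completions.create(**resolve_model("deepseek-chat"), messages=...), which keeps old configuration values working while the request itself already uses a current model ID.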
Minimal Python example
This is the smallest practical DeepSeek Python example using the OpenAI SDK and the current V4 Flash model in non-thinking mode:
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise Python assistant."},
        {"role": "user", "content": "Explain DeepSeek Python setup in one paragraph."},
    ],
    extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
The required ideas are simple: create the client, send a messages list, choose a current DeepSeek model, and read the assistant’s final answer from response.choices[0].message.content.
Non-thinking mode example
Non-thinking mode is usually the better default for simple chat, structured extraction, classification, short summaries, routine coding help, and other tasks where you do not need explicit reasoning behavior.
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "Summarize this support ticket in three bullet points."},
    ],
    max_tokens=600,
    extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Use this pattern when you want a direct final answer and do not need to manage reasoning_content.
Thinking Mode example
DeepSeek V4 supports thinking and non-thinking modes. Thinking mode is enabled through a DeepSeek-specific thinking object. When using the OpenAI Python SDK, pass that object through extra_body.
In thinking mode, reasoning_effort supports high and max. The final user-facing answer is returned in content. Reasoning content is returned separately as reasoning_content.
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {
            "role": "user",
            "content": "Compare two Python API retry strategies and explain the tradeoffs.",
        }
    ],
    reasoning_effort="high",
    max_tokens=2000,
    extra_body={"thinking": {"type": "enabled"}},
)
message = response.choices[0].message
# reasoning_content is a DeepSeek-specific field, so read it defensively.
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    # Keep reasoning separate from normal end-user output.
    # Store, inspect, or discard it according to your product policy.
    pass
print(message.content)
For normal user-facing applications, display the final content and keep reasoning_content separate unless your product has a clear policy for showing it. In thinking mode, parameters such as temperature, top_p, presence_penalty, and frequency_penalty do not affect output even if passed.
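Because thinking and non-thinking requests accept slightly different parameters, some teams assemble the request arguments in one function. The sketch below is one possible approach, not an official pattern: build_chat_kwargs is a hypothetical helper, and the decision to drop ignored sampling parameters and to omit reasoning_effort in non-thinking mode is an assumption made for tidiness.

```python
from typing import Any, Dict, List

# Sampling parameters this guide says have no effect in thinking mode.
IGNORED_IN_THINKING = ("temperature", "top_p", "presence_penalty", "frequency_penalty")
# reasoning_effort values this guide lists for thinking mode.
ALLOWED_EFFORTS = ("high", "max")


def build_chat_kwargs(
    model: str,
    messages: List[Dict[str, Any]],
    thinking: bool,
    reasoning_effort: str = "high",
    **params: Any,
) -> Dict[str, Any]:
    """Assemble keyword arguments for client.chat.completions.create()."""
    mode = "enabled" if thinking else "disabled"
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "extra_body": {"thinking": {"type": mode}},
    }
    if thinking:
        if reasoning_effort not in ALLOWED_EFFORTS:
            raise ValueError(f"reasoning_effort must be one of {ALLOWED_EFFORTS}")
        kwargs["reasoning_effort"] = reasoning_effort
        # Drop sampling parameters that would be ignored anyway.
        params = {k: v for k, v in params.items() if k not in IGNORED_IN_THINKING}
    kwargs.update(params)
    return kwargs
```

A call would then look like client.chat.completions.create(**build_chat_kwargs("deepseek-v4-pro", messages, thinking=True, max_tokens=2000)).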
Streaming responses in Python
Set stream=True when you want partial chunks as the answer is generated. Streaming is useful for chat UIs, internal assistants, and long answers where users should see progress quickly.
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise Python assistant."},
        {"role": "user", "content": "Give me three tips for reliable Python API clients."},
    ],
    stream=True,
    extra_body={"thinking": {"type": "disabled"}},
)
for chunk in stream:
    # Skip control chunks that carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    content = getattr(delta, "content", None)
    if content:
        print(content, end="", flush=True)
Always handle empty chunks or missing deltas. Streaming responses can include control chunks that do not contain user-visible text.
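If you also need the full text after streaming finishes (for logging or storage), the same defensive checks can live in a small accumulator. This is a minimal sketch: collect_stream_text is a hypothetical helper, and fake_chunk only imitates the shape of SDK chunk objects so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream_text(chunks: Iterable) -> str:
    """Join the text deltas from a stream, skipping chunks with no visible content."""
    parts = []
    for chunk in chunks:
        choices = getattr(chunk, "choices", None)
        if not choices:
            continue  # control chunk without choices
        delta = getattr(choices[0], "delta", None)
        content = getattr(delta, "content", None)
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Stand-in for an SDK chunk, used only for this demo."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Hel"), SimpleNamespace(choices=[]), fake_chunk(None), fake_chunk("lo")]
print(collect_stream_text(demo))  # Hello
```

With a real stream you would pass the object returned by client.chat.completions.create(..., stream=True) instead of the demo list.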
Async Python example
Use AsyncOpenAI for asynchronous Python services, especially when your app needs to handle many concurrent API calls without blocking worker threads.
import asyncio
import os
from openai import AsyncOpenAI
client = AsyncOpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

async def main() -> None:
    response = await client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "You are a helpful Python assistant."},
            {"role": "user", "content": "Explain async API clients in two sentences."},
        ],
        extra_body={"thinking": {"type": "disabled"}},
    )
    print(response.choices[0].message.content)

asyncio.run(main())
The async request shape is almost the same as the synchronous example. The main difference is that you create an AsyncOpenAI client and use await.
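When an async service fans out many requests at once, it is usually worth capping concurrency so you do not trigger rate limiting. The sketch below shows one common way to do that with asyncio.Semaphore. run_bounded is a hypothetical helper, and fake_worker stands in for a coroutine that would call client.chat.completions.create() in real code.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_bounded(
    prompts: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    limit: int = 4,
) -> List[str]:
    """Run worker(prompt) for every prompt with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> str:
        async with semaphore:
            return await worker(prompt)

    # gather preserves the order of the input prompts.
    return list(await asyncio.gather(*(guarded(p) for p in prompts)))


async def fake_worker(prompt: str) -> str:
    """Demo stand-in for an API call; it just upper-cases the prompt."""
    await asyncio.sleep(0)
    return prompt.upper()


print(asyncio.run(run_bounded(["a", "b", "c"], fake_worker, limit=2)))  # ['A', 'B', 'C']
```

The right limit depends on your account and workload, so treat the default of 4 as a placeholder rather than a recommendation.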
Multi-turn chat in Python
Chat Completions are stateless from the application point of view. The server does not remember earlier turns for your app. If you want a multi-turn conversation, your Python application must keep the conversation history and send the relevant messages again on each request.
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
messages = [
    {"role": "system", "content": "You are a helpful Python assistant."},
    {"role": "user", "content": "What is context caching in one sentence?"},
]
first_response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    extra_body={"thinking": {"type": "disabled"}},
)
first_message = first_response.choices[0].message
print(first_message.content)
# Append the assistant reply and the next user turn before sending again.
messages.append({"role": "assistant", "content": first_message.content or ""})
messages.append({"role": "user", "content": "How should I structure repeated prompts to benefit from it?"})
second_response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    extra_body={"thinking": {"type": "disabled"}},
)
print(second_response.choices[0].message.content)
For long conversations, trim or summarize older turns so you do not send unnecessary context. For thinking-mode tool-call loops, preserve the full assistant message because it may include both reasoning_content and tool_calls.
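Trimming can be as simple as keeping the system message plus the most recent turns. The following is a rough sketch of that idea, assuming plain dictionary messages. trim_history is a hypothetical helper, and counting messages rather than tokens is a simplification. It drops whole messages and never edits one, so any assistant message that survives keeps all of its fields.

```python
from typing import Dict, List


def trim_history(messages: List[Dict], max_recent: int = 6) -> List[Dict]:
    """Keep system messages plus the last `max_recent` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_recent:] if max_recent > 0 else []
    # Avoid starting the window with an orphaned assistant or tool message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent


history = [
    {"role": "system", "content": "You are a helpful Python assistant."},
    {"role": "user", "content": "Question 1"},
    {"role": "assistant", "content": "Answer 1"},
    {"role": "user", "content": "Question 2"},
    {"role": "assistant", "content": "Answer 2"},
    {"role": "user", "content": "Question 3"},
]
print([m["content"] for m in trim_history(history, max_recent=3)])
```

With max_recent=3 the window keeps the system message, "Question 2", "Answer 2", and "Question 3". A production version would more likely budget by token count and summarize what it drops.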
JSON Output in Python
Use JSON Output when your application needs a structured response that Python can parse. Set response_format={"type": "json_object"}, include the word “json” in the prompt, provide an example shape, and set max_tokens reasonably to reduce truncation risk.
import json
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
system_prompt = """
You extract customer feedback into json.
Return only a valid json object with this shape:
{
  "sentiment": "positive | neutral | negative",
  "summary": "short summary",
  "action_required": true
}
"""
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": "The API is fast, but the dashboard export failed twice today.",
        },
    ],
    response_format={"type": "json_object"},
    max_tokens=500,
    extra_body={"thinking": {"type": "disabled"}},
)
choice = response.choices[0]
message_content = choice.message.content or ""
# A finish_reason of "length" means the output hit max_tokens and may be cut off.
if choice.finish_reason == "length":
    raise RuntimeError("The JSON response may have been truncated. Increase max_tokens or shorten the prompt.")
if not message_content.strip():
    raise RuntimeError("The model returned an empty response. Try making the JSON instruction more explicit.")
data = json.loads(message_content)
required_keys = {"sentiment", "summary", "action_required"}
missing_keys = required_keys - set(data)
if missing_keys:
    raise ValueError(f"Missing required keys: {missing_keys}")
print(data)
JSON Output makes parsing easier, but your application should still validate required keys, value types, and allowed enum values before trusting the result.
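The type and enum checks mentioned above can be sketched with the standard library alone. parse_feedback below is a hypothetical helper that matches the example shape used in this section. A schema library would be a reasonable alternative, but this version has no extra dependencies.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def parse_feedback(raw: str) -> dict:
    """Parse model output and check keys, types, and allowed values."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object.")
    sentiment = data.get("sentiment")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment must be positive, neutral, or negative.")
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string.")
    if not isinstance(data.get("action_required"), bool):
        raise ValueError("action_required must be true or false.")
    return data


sample = '{"sentiment": "negative", "summary": "Export failed twice.", "action_required": true}'
print(parse_feedback(sample)["sentiment"])  # negative
```

Raising ValueError for every failure gives the caller a single exception type to catch when deciding whether to re-prompt or fall back.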
Tool Calls in Python
Tool Calls let the model request a function call, but the model does not execute the function automatically. Your application must validate the arguments, run the function, append a tool message with the correct tool_call_id, and send the next request.
import json
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

def get_order_status(order_id: str) -> dict:
    # Replace this demo logic with your database or internal API call.
    if not order_id.startswith("ORD-"):
        raise ValueError("Invalid order_id format.")
    return {
        "order_id": order_id,
        "status": "processing",
        "estimated_ship_date": "tomorrow",
    }

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Get the current status of a customer order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The order ID, for example ORD-12345.",
                    }
                },
                "required": ["order_id"],
            },
        },
    }
]
messages = [
    {
        "role": "user",
        "content": "Can you check the status of order ORD-12345?",
    }
]
first_response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    extra_body={"thinking": {"type": "disabled"}},
)
assistant_message = first_response.choices[0].message
# Keep the full assistant message (including tool_calls) in the history.
messages.append(assistant_message.model_dump(exclude_none=True))
if assistant_message.tool_calls:
    for tool_call in assistant_message.tool_calls:
        if tool_call.function.name != "get_order_status":
            raise ValueError(f"Unsupported tool: {tool_call.function.name}")
        arguments = json.loads(tool_call.function.arguments)
        order_id = arguments.get("order_id")
        if not isinstance(order_id, str):
            raise ValueError("order_id must be a string.")
        result = get_order_status(order_id)
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            }
        )
    # Send the tool results back so the model can write the final answer.
    second_response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        extra_body={"thinking": {"type": "disabled"}},
    )
    print(second_response.choices[0].message.content)
else:
    print(assistant_message.content)
Always validate tool arguments before running business logic. Treat model-provided function arguments as untrusted input.
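As the number of tools grows, it may be cleaner to route calls through a registry and return validation failures to the model as tool output instead of raising. The sketch below illustrates that design choice under stated assumptions: TOOL_REGISTRY and run_tool_call are hypothetical names, and the argument check is hard-coded for the single demo tool.

```python
import json
from typing import Callable, Dict


def get_order_status(order_id: str) -> dict:
    # Demo stand-in for a database or internal API lookup.
    if not order_id.startswith("ORD-"):
        raise ValueError("Invalid order_id format.")
    return {"order_id": order_id, "status": "processing"}


# Hypothetical registry: tool name -> Python callable.
TOOL_REGISTRY: Dict[str, Callable[..., dict]] = {"get_order_status": get_order_status}


def run_tool_call(name: str, raw_arguments: str) -> str:
    """Validate and run one tool call, returning a JSON string for the tool message."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"Unsupported tool: {name}"})
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return json.dumps({"error": "Arguments were not valid JSON."})
    # Accept exactly one argument named order_id and nothing else.
    if not isinstance(arguments, dict) or set(arguments) != {"order_id"}:
        return json.dumps({"error": "Expected exactly one argument: order_id."})
    if not isinstance(arguments["order_id"], str):
        return json.dumps({"error": "order_id must be a string."})
    try:
        return json.dumps(handler(**arguments))
    except ValueError as exc:
        return json.dumps({"error": str(exc)})


print(run_tool_call("get_order_status", '{"order_id": "ORD-12345"}'))
```

The returned string would become the content of the tool message, paired with the matching tool_call_id. Returning errors as data gives the model a chance to correct itself, though whether that is preferable to failing loudly depends on your application.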
Strict Tool Calls mode note
Strict Tool Calls mode is a beta feature. Use it only when you specifically need stricter schema adherence and you are ready to handle beta behavior.
For strict mode, create the client with the beta base URL:
from openai import OpenAI
import os
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/beta",
)
In strict mode, each function definition should set "strict": true, and the JSON Schema must follow the supported schema rules. If your schema is invalid or unsupported, the API can reject the request.
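A tool definition can be converted for strict mode with a small transformation. The sketch below is hedged heavily: to_strict_tool is a hypothetical helper, and the two schema adjustments it makes (listing every property as required and setting additionalProperties to false) are assumptions about what the supported schema rules expect. It only touches the top-level object, so check the official strict-mode documentation before relying on it.

```python
import copy


def to_strict_tool(tool: dict) -> dict:
    """Return a copy of a function tool definition with strict mode switched on."""
    strict_tool = copy.deepcopy(tool)  # leave the caller's definition untouched
    function = strict_tool["function"]
    function["strict"] = True
    parameters = function.setdefault("parameters", {"type": "object", "properties": {}})
    # Assumption: strict mode wants every property required and no extra keys.
    parameters["required"] = list(parameters.get("properties", {}))
    parameters["additionalProperties"] = False
    return strict_tool


base_tool = {
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Get the current status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
        },
    },
}
print(to_strict_tool(base_tool)["function"]["strict"])  # True
```

The converted definition would be passed in the tools list of a request made through the client that uses the beta base URL.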
Token usage, context caching, and cost control without prices
DeepSeek API responses can include token usage information. Use this data to monitor prompt size, output size, and application behavior over time.
usage = response.usage
if usage:
    print("Prompt tokens:", getattr(usage, "prompt_tokens", None))
    print("Completion tokens:", getattr(usage, "completion_tokens", None))
    print("Total tokens:", getattr(usage, "total_tokens", None))
    # The cache fields are DeepSeek-specific, so read them defensively.
    print("Prompt cache hit tokens:", getattr(usage, "prompt_cache_hit_tokens", None))
    print("Prompt cache miss tokens:", getattr(usage, "prompt_cache_miss_tokens", None))
Context Caching is enabled by default and does not require a code change. It can help repeated-prefix workloads such as multi-turn conversations, document analysis, and workflows that reuse the same system message or long input prefix.
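To see whether a route actually benefits from caching, you can turn the two cache fields into a hit ratio and log it. This is a minimal sketch: cache_hit_ratio is a hypothetical helper, and demo_usage is a stand-in object with made-up numbers rather than real API output.

```python
from types import SimpleNamespace
from typing import Optional


def cache_hit_ratio(usage: object) -> Optional[float]:
    """Return cached prompt tokens as a fraction of all prompt tokens, if reported."""
    hit = getattr(usage, "prompt_cache_hit_tokens", None)
    miss = getattr(usage, "prompt_cache_miss_tokens", None)
    if hit is None or miss is None or hit + miss == 0:
        return None  # fields missing or nothing to measure
    return hit / (hit + miss)


# Made-up numbers for illustration only.
demo_usage = SimpleNamespace(prompt_cache_hit_tokens=750, prompt_cache_miss_tokens=250)
print(cache_hit_ratio(demo_usage))  # 0.75
```

A ratio that stays near zero on a route with a long, stable system prompt may be a hint that something early in the prompt changes on every request.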
Because DeepSeek API pricing can change, this guide does not copy token prices. Check the official DeepSeek pricing page and Chat-Deep.ai’s pricing guide before making billing decisions.
Practical cost-control habits without copying prices
- Set reasonable max_tokens values for each route.
- Use non-thinking mode for simple tasks.
- Use thinking mode only where it improves quality enough to justify longer outputs.
- Reuse stable system prompts and repeated document prefixes where practical.
- Log token usage by route, model, and feature flag.
- Keep long conversation history trimmed or summarized.
- Do not send large files or documents when only a small excerpt is needed.
Error handling, rate limits, retries, and timeouts
DeepSeek rate limits are dynamic. If your application sends too many concurrent requests, you may receive HTTP 429. Server-side errors such as 500 or 503 may be worth retrying after a delay. Do not blindly retry 400, 401, 402, or 422 without fixing the underlying issue.
import os
import time
from openai import APIStatusError, APITimeoutError, OpenAI
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
    timeout=60,
    max_retries=0,  # disable the SDK's built-in retries; the loop below handles them
)
RETRY_STATUS_CODES = {429, 500, 503}

def create_completion_with_retry(prompt: str, max_attempts: int = 3) -> str:
    delay_seconds = 2
    for attempt in range(1, max_attempts + 1):
        try:
            response = client.chat.completions.create(
                model="deepseek-v4-flash",
                messages=[
                    {"role": "system", "content": "You are a concise Python assistant."},
                    {"role": "user", "content": prompt},
                ],
                extra_body={"thinking": {"type": "disabled"}},
            )
            return response.choices[0].message.content or ""
        except APITimeoutError:
            if attempt == max_attempts:
                raise
        except APIStatusError as exc:
            status_code = exc.status_code
            if status_code not in RETRY_STATUS_CODES:
                raise
            if attempt == max_attempts:
                raise
        # Exponential backoff before the next attempt.
        time.sleep(delay_seconds)
        delay_seconds *= 2
    raise RuntimeError("Request failed after retries.")

answer = create_completion_with_retry("Explain HTTP 429 in one sentence.")
print(answer)
Use retries carefully. A 401 usually means the API key is wrong. A 402 means the account cannot complete the request until the billing issue is resolved. A 422 usually means invalid parameters. Retrying those errors without changing anything usually wastes time.
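A fixed doubling delay can cause many clients to retry at the same moment. One common refinement is exponential backoff with random jitter, sketched below. should_retry and backoff_delay are hypothetical helpers, the base and cap values are placeholders, and the retryable status set simply mirrors the codes discussed in this section.

```python
import random
from typing import Optional

RETRYABLE_STATUS_CODES = {429, 500, 503}


def should_retry(status_code: int) -> bool:
    """Retry only rate-limit and server-side errors."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delay(
    attempt: int,
    base: float = 2.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Full-jitter backoff: a random delay between 0 and min(cap, base * 2**(attempt - 1))."""
    generator = rng or random.Random()
    ceiling = min(cap, base * (2 ** (attempt - 1)))
    return generator.uniform(0, ceiling)


for attempt in range(1, 4):
    print(f"attempt {attempt}: sleep up to {min(30.0, 2.0 * 2 ** (attempt - 1))}s")
```

In the retry loop, time.sleep(backoff_delay(attempt)) would replace the fixed delay. Passing a seeded random.Random as rng keeps tests deterministic.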
Common mistakes
- Installing the wrong package: use pip install openai, not an unofficial package unless DeepSeek explicitly documents it.
- Using old OpenAI syntax: do not use openai.ChatCompletion.create(). Use client.chat.completions.create().
- Forgetting base_url: without base_url="https://api.deepseek.com", the client will not target DeepSeek.
- Using legacy aliases as primary models: use deepseek-v4-flash or deepseek-v4-pro for new code.
- Assuming Chat Completions and Responses API are the same: this guide uses Chat Completions because that is the DeepSeek-compatible Python path documented for this workflow.
- Printing secrets: never print API keys, environment variables containing secrets, or authorization headers.
- Displaying reasoning by default: keep reasoning_content separate from normal user-facing output unless your product policy says otherwise.
- Trusting tool arguments blindly: validate tool-call arguments before running functions, database queries, or external requests.
- Retrying every error: retry rate-limit and server-overload cases, but fix invalid requests, bad keys, and invalid parameters.
- Copying prices into evergreen docs: link to the official pricing page instead of hard-coding rates that can change.
Production checklist
- Use deepseek-v4-flash or deepseek-v4-pro for new integrations.
- Store DEEPSEEK_API_KEY securely outside source code.
- Set base_url="https://api.deepseek.com" in the OpenAI Python client.
- Choose thinking mode intentionally instead of relying on defaults for every route.
- Keep reasoning_content separate from final user-facing output.
- Use JSON Output only with explicit json instructions and validation.
- Validate all Tool Calls arguments before executing functions.
- Use https://api.deepseek.com/beta only when a documented beta feature requires it.
- Log token usage and monitor unusually long prompts or outputs.
- Handle 429, 500, and 503 with controlled retries and backoff.
- Do not blindly retry 400, 401, 402, or 422.
- Keep pricing links dynamic instead of copying prices into the article.
When this guide is not the right page
This guide is for Python developers using DeepSeek through the OpenAI-compatible Chat Completions workflow. It is not the best page for every use case.
- If you want a broad API overview, read the DeepSeek API guide.
- If you want model selection details, read the DeepSeek V4 guide.
- If you are migrating OpenAI-style code, read the OpenAI SDK to DeepSeek guide.
- If you write JavaScript or TypeScript, read the DeepSeek Node.js TypeScript guide.
- If you need structured outputs, read the DeepSeek JSON Output guide.
- If you need deeper function-calling examples, read the DeepSeek Tool Calls guide.
- If you need reasoning behavior details, read the DeepSeek Thinking Mode guide.
- If you only want to try prompts without code, start from the Chat-Deep.ai homepage.
FAQ
Is there an official DeepSeek Python SDK?
The officially documented Python path is to use the OpenAI Python SDK with DeepSeek’s API key, DeepSeek’s base URL, and DeepSeek model IDs. This is why developers often describe the setup as a DeepSeek Python SDK workflow.
Which Python package should I install for DeepSeek?
Install the openai package with pip install openai. Do not use an unofficial DeepSeek package unless the official DeepSeek documentation explicitly recommends it for your use case.
What base_url should I use for DeepSeek in Python?
Use base_url="https://api.deepseek.com" for normal OpenAI-compatible Chat Completions requests.
Should I use deepseek-v4-flash or deepseek-v4-pro?
Use deepseek-v4-flash for fast everyday workloads, summaries, extraction, routine coding help, and high-volume applications. Use deepseek-v4-pro for harder reasoning, complex coding, long-context analysis, and agentic workflows.
Can I still use deepseek-chat or deepseek-reasoner?
They are legacy compatibility aliases. For new code, use deepseek-v4-flash or deepseek-v4-pro. The legacy aliases are scheduled for retirement after July 24, 2026, 15:59 UTC.
Does DeepSeek support JSON Output in Python?
Yes. Use response_format={"type": "json_object"}, include the word “json” in the prompt, provide an example shape, and validate the parsed result in Python.
Does DeepSeek support Tool Calls in Python?
Yes. The model can request a tool call, but your application executes the function. Validate arguments, run the function, append a tool message with the matching tool_call_id, and send the next request.
Where can I check DeepSeek API pricing?
Because DeepSeek API pricing can change, this article does not copy token prices. Check the official DeepSeek pricing page and Chat-Deep.ai’s pricing guide before making billing decisions.
