DeepSeek Python SDK Guide: Use DeepSeek with Python

Quick answer:

“DeepSeek Python SDK” usually means using the official OpenAI Python SDK with DeepSeek-specific settings. Install the openai Python package, create an OpenAI client with your DeepSeek API key, set base_url="https://api.deepseek.com", and call client.chat.completions.create().
For new Python projects, use the current DeepSeek V4 API model IDs: deepseek-v4-flash for fast everyday workloads and deepseek-v4-pro for harder reasoning, coding, long-context analysis, and higher-value production tasks.

Independent disclosure: Chat-Deep.ai is an independent DeepSeek-focused guide and browser access site. Chat-Deep.ai is not affiliated with DeepSeek, DeepSeek.com, Hangzhou DeepSeek Artificial Intelligence Co., Ltd., the official DeepSeek app, the official DeepSeek API platform, OpenAI, or the OpenAI Python SDK.

This guide helps developers use DeepSeek with Python. For production decisions, always confirm model names, API behavior, policy details, and billing information against the official DeepSeek documentation.

Current DeepSeek API snapshot

  • Current API model IDs: deepseek-v4-flash and deepseek-v4-pro
  • Base URL: https://api.deepseek.com
  • API format: OpenAI-compatible Chat Completions
  • Context length: 1M tokens
  • Max output: 384K tokens
  • Thinking mode: supported
  • Non-thinking mode: supported
  • JSON Output: supported
  • Tool Calls: supported
  • Legacy aliases: deepseek-chat and deepseek-reasoner are legacy compatibility aliases scheduled for retirement after July 24, 2026, 15:59 UTC

Table of contents

  1. Who this guide is for
  2. What “DeepSeek Python SDK” actually means
  3. Install the Python package
  4. Set your DeepSeek API key safely
  5. DeepSeek base_url explained
  6. Choose between deepseek-v4-flash and deepseek-v4-pro
  7. Minimal Python example
  8. Non-thinking mode example
  9. Thinking Mode example
  10. Streaming responses in Python
  11. Async Python example
  12. Multi-turn chat in Python
  13. JSON Output in Python
  14. Tool Calls in Python
  15. Strict Tool Calls mode note
  16. Token usage, context caching, and cost control without prices
  17. Error handling, rate limits, retries, and timeouts
  18. Common mistakes
  19. Production checklist
  20. When this guide is not the right page
  21. FAQ

Who this guide is for

This guide is for Python developers who want to call the DeepSeek API from backend services, scripts, internal tools, data pipelines, chat applications, coding assistants, extraction workflows, or agent prototypes.

Use this page if you want a practical DeepSeek API Python walkthrough with copy-paste-ready examples for Chat Completions, streaming, async calls, multi-turn chat, JSON Output, Tool Calls, Thinking Mode, token usage, retries, and safer API key handling.

If you are not writing Python code, use the Related DeepSeek developer guides at the end of this article to jump to the broader DeepSeek API guide, the OpenAI SDK migration guide, or the Node.js TypeScript guide.

What “DeepSeek Python SDK” actually means

Many developers search for “DeepSeek Python SDK,” but the documented Python path is to use the OpenAI Python SDK with a DeepSeek API key and DeepSeek base URL. In practical terms, this means your Python code imports OpenAI from the openai package, points the client at https://api.deepseek.com, and uses DeepSeek model IDs instead of OpenAI model IDs.

Do not install or recommend an unofficial package named deepseek unless DeepSeek officially documents it for your specific use case. For this guide, “DeepSeek Python SDK” means:

  • The openai Python package
  • A DeepSeek API key stored securely
  • base_url="https://api.deepseek.com"
  • client.chat.completions.create()
  • Current V4 model IDs: deepseek-v4-flash and deepseek-v4-pro

This page focuses on Chat Completions. Do not assume OpenAI’s newer Responses API examples work with DeepSeek unless DeepSeek officially documents that endpoint for your use case.

Install the Python package

Install the OpenAI Python package in your Python environment:

pip install openai

If you want to load environment variables from a local .env file during development, install python-dotenv too:

pip install python-dotenv

For production services, prefer secure environment variables, secrets managers, or your cloud platform’s secret storage. Do not hard-code API keys in Python files.

Set your DeepSeek API key safely

Keep your DeepSeek API key server-side. Never expose a live key in browser JavaScript, public mobile code, public repositories, logs, screenshots, support tickets, or client-side bundles.

macOS or Linux

export DEEPSEEK_API_KEY="your_api_key_here"

Windows PowerShell

# Persists for your user account; open a new terminal before running Python.
[Environment]::SetEnvironmentVariable("DEEPSEEK_API_KEY", "your_api_key_here", "User")

Optional local .env file for development

If you use a local .env file, add it to .gitignore before you write any secrets into it.

DEEPSEEK_API_KEY="your_api_key_here"

Then load it in Python:

from dotenv import load_dotenv

load_dotenv()

In the examples below, the key is read with os.environ["DEEPSEEK_API_KEY"]. That lookup raises KeyError when the variable is missing, so a misconfigured environment fails fast instead of sending unauthenticated requests.
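If you prefer a readable error message over a bare KeyError, the lookup can be wrapped in a small helper. This is a minimal sketch: load_api_key and mask_key are hypothetical helper names for illustration, not part of the OpenAI SDK or the DeepSeek API.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the DeepSeek API key or raise a readable error."""
    key = env.get("DEEPSEEK_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DEEPSEEK_API_KEY is not set. Export it or add it to your secret store."
        )
    return key


def mask_key(key: str) -> str:
    """Return a log-safe form that shows only the last four characters."""
    if len(key) <= 4:
        return "*" * len(key)
    return "*" * (len(key) - 4) + key[-4:]
```

Passing the environment as a parameter keeps the helper easy to test, and mask_key gives you something safe to log when you need to confirm which key a service loaded.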

DeepSeek base_url explained

The base_url tells the OpenAI Python SDK to send requests to DeepSeek instead of the default OpenAI endpoint. For normal Python Chat Completions usage, set:

base_url="https://api.deepseek.com"

Use base_url in Python. The baseURL spelling is common in JavaScript and TypeScript examples, but it is not the Python parameter name.

  • https://api.deepseek.com is the recommended default for normal DeepSeek OpenAI-format API requests.
  • https://api.deepseek.com/anthropic is for Anthropic-format usage, not the focus of this Python Chat Completions guide.
  • https://api.deepseek.com/beta should be used only when an official beta feature requires it, such as strict Tool Calls mode.
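One way to keep the beta endpoint from leaking into ordinary routes is to centralize the choice in one function. The sketch below assumes a hypothetical pick_base_url helper and constant names; only the two URLs come from the list above.

```python
DEFAULT_BASE_URL = "https://api.deepseek.com"
BETA_BASE_URL = "https://api.deepseek.com/beta"


def pick_base_url(needs_beta_feature: bool = False) -> str:
    """Return the beta URL only when a documented beta feature needs it."""
    return BETA_BASE_URL if needs_beta_feature else DEFAULT_BASE_URL
```

The result can be passed straight to the client constructor as base_url=pick_base_url().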

Choose between deepseek-v4-flash and deepseek-v4-pro

For new Python code, use deepseek-v4-flash or deepseek-v4-pro. Use the official DeepSeek pricing page for current API rates.

Use deepseek-v4-flash when you need speed and simplicity

deepseek-v4-flash is a good first choice for chat, summaries, classification, routine coding help, JSON extraction, support workflows, and high-volume applications where fast responses matter.

Use deepseek-v4-pro when the task is harder

deepseek-v4-pro is a better fit for harder reasoning, complex coding, long-context document analysis, tool planning, agentic workflows, and production tasks where answer quality is more important than response speed.
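The split described above can be encoded as a simple routing function. This is only a sketch: the task labels and the choose_model name are hypothetical, and your own application will likely need different categories.

```python
# Hypothetical task labels, grouped by the model this guide suggests for them.
FLASH_TASKS = {"chat", "summary", "classification", "json_extraction", "support"}
PRO_TASKS = {"complex_coding", "long_context_analysis", "tool_planning", "agentic"}


def choose_model(task_kind: str) -> str:
    """Map a task label to a current DeepSeek model ID."""
    if task_kind in PRO_TASKS:
        return "deepseek-v4-pro"
    if task_kind in FLASH_TASKS:
        return "deepseek-v4-flash"
    # Unknown labels fail loudly so new routes get an explicit decision.
    raise ValueError(f"No model mapping for task kind: {task_kind!r}")
```

Raising on unknown labels is a design choice; defaulting to deepseek-v4-flash would also be reasonable for low-risk routes.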

Legacy aliases

deepseek-chat currently maps to DeepSeek V4 Flash non-thinking mode for compatibility. deepseek-reasoner currently maps to DeepSeek V4 Flash thinking mode for compatibility. Both are legacy aliases scheduled for retirement after July 24, 2026, 15:59 UTC.
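When migrating older code, the alias behavior described above can be made explicit instead of implicit. The sketch below assumes the mapping stated in this section; migrate_model_settings is a hypothetical helper, so confirm the mapping against the official documentation before relying on it.

```python
from typing import Any, Dict

# Mapping taken from the alias description in this section.
LEGACY_ALIASES: Dict[str, Dict[str, str]] = {
    "deepseek-chat": {"model": "deepseek-v4-flash", "thinking": "disabled"},
    "deepseek-reasoner": {"model": "deepseek-v4-flash", "thinking": "enabled"},
}


def migrate_model_settings(model: str) -> Dict[str, Any]:
    """Translate a legacy alias into explicit keyword arguments for create()."""
    target = LEGACY_ALIASES.get(model)
    if target is None:
        # Not a legacy alias: pass the model ID through unchanged.
        return {"model": model}
    return {
        "model": target["model"],
        "extra_body": {"thinking": {"type": target["thinking"]}},
    }
```

The returned dictionary can be unpacked into client.chat.completions.create(**settings, messages=messages).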

Minimal Python example

This is the smallest practical DeepSeek Python example using the OpenAI SDK and the current V4 Flash model in non-thinking mode:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise Python assistant."},
        {"role": "user", "content": "Explain DeepSeek Python setup in one paragraph."},
    ],
    extra_body={"thinking": {"type": "disabled"}},
)

print(response.choices[0].message.content)

The required steps are simple: create the client, send a messages list, choose a current DeepSeek model, and read the assistant’s final answer from response.choices[0].message.content.

Non-thinking mode example

Non-thinking mode is usually the better default for simple chat, structured extraction, classification, short summaries, routine coding help, and other tasks where you do not need explicit reasoning behavior.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "Summarize this support ticket in three bullet points."},
    ],
    max_tokens=600,
    extra_body={"thinking": {"type": "disabled"}},
)

print(response.choices[0].message.content)

Use this pattern when you want a direct final answer and do not need to manage reasoning_content.

Thinking Mode example

DeepSeek V4 supports thinking and non-thinking modes. Thinking mode is enabled through a DeepSeek-specific thinking object. When using the OpenAI Python SDK, pass that object through extra_body.

In thinking mode, reasoning_effort supports high and max. The final user-facing answer is returned in content. Reasoning content is returned separately as reasoning_content.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {
            "role": "user",
            "content": "Compare two Python API retry strategies and explain the tradeoffs.",
        }
    ],
    reasoning_effort="high",
    max_tokens=2000,
    extra_body={"thinking": {"type": "enabled"}},
)

message = response.choices[0].message

reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    # Keep reasoning separate from normal end-user output.
    # Store, inspect, or discard it according to your product policy.
    pass

print(message.content)

For normal user-facing applications, display the final content and keep reasoning_content separate unless your product has a clear policy for showing it. In thinking mode, parameters such as temperature, top_p, presence_penalty, and frequency_penalty do not affect output even if passed.
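Because the two modes accept different parameters, it can help to build the request arguments in one place. The following is a minimal sketch under the rules stated in this section; build_chat_kwargs is a hypothetical helper, and rejecting reasoning_effort in non-thinking mode is a design choice of this sketch rather than documented API behavior.

```python
from typing import Any, Dict, List, Optional

ALLOWED_EFFORTS = {"high", "max"}
# Sampling parameters that this section says have no effect in thinking mode.
IGNORED_IN_THINKING = ("temperature", "top_p", "presence_penalty", "frequency_penalty")


def build_chat_kwargs(
    model: str,
    messages: List[Dict[str, Any]],
    thinking: bool,
    reasoning_effort: Optional[str] = None,
    **sampling: Any,
) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create()."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "extra_body": {"thinking": {"type": "enabled" if thinking else "disabled"}},
    }
    if thinking:
        if reasoning_effort is not None:
            if reasoning_effort not in ALLOWED_EFFORTS:
                raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORTS)}")
            kwargs["reasoning_effort"] = reasoning_effort
        # Drop parameters that would have no effect, so call sites stay honest.
        sampling = {k: v for k, v in sampling.items() if k not in IGNORED_IN_THINKING}
    elif reasoning_effort is not None:
        raise ValueError("reasoning_effort only applies when thinking is enabled")
    kwargs.update(sampling)
    return kwargs
```

A call then looks like client.chat.completions.create(**build_chat_kwargs("deepseek-v4-pro", messages, thinking=True, reasoning_effort="high", max_tokens=2000)).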

Streaming responses in Python

Set stream=True when you want partial chunks as the answer is generated. Streaming is useful for chat UIs, internal assistants, and long answers where users should see progress quickly.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise Python assistant."},
        {"role": "user", "content": "Give me three tips for reliable Python API clients."},
    ],
    stream=True,
    extra_body={"thinking": {"type": "disabled"}},
)

for chunk in stream:
    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta
    content = getattr(delta, "content", None)

    if content:
        print(content, end="", flush=True)

Always handle empty chunks or missing deltas. Streaming responses can include control chunks that do not contain user-visible text.
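If you also need the complete answer after streaming finishes (for logging or for appending to conversation history), the same defensive checks can be wrapped in a function. This sketch uses stand-in objects that mimic the chunk attributes used in the loop above; collect_stream_text is a hypothetical helper name.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_stream_text(chunks: Iterable[Any]) -> str:
    """Join the text deltas of a Chat Completions stream into one string."""
    parts = []
    for chunk in chunks:
        choices = getattr(chunk, "choices", None)
        if not choices:
            continue  # control chunk without choices
        delta = getattr(choices[0], "delta", None)
        content = getattr(delta, "content", None)
        if content:
            parts.append(content)
    return "".join(parts)


# Stand-in chunks for demonstration; a real stream comes from create(stream=True).
fake_stream = [
    SimpleNamespace(choices=[]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hel"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="lo"))]),
]
print(collect_stream_text(fake_stream))  # prints "Hello"
```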

Async Python example

Use AsyncOpenAI for asynchronous Python services, especially when your app needs to handle many concurrent API calls without blocking the event loop or tying up a thread per request.

import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

async def main() -> None:
    response = await client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {"role": "system", "content": "You are a helpful Python assistant."},
            {"role": "user", "content": "Explain async API clients in two sentences."},
        ],
        extra_body={"thinking": {"type": "disabled"}},
    )

    print(response.choices[0].message.content)

asyncio.run(main())

The async request shape is almost the same as the synchronous example. The main difference is that you create an AsyncOpenAI client and use await.
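When an async service fans out many requests at once, capping concurrency helps you stay under dynamic rate limits. The sketch below shows one common pattern using asyncio.Semaphore; run_bounded and fake_call are hypothetical names, and fake_call stands in for a coroutine that awaits client.chat.completions.create().

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    prompts: List[str],
    call: Callable[[str], Awaitable[str]],
    limit: int = 5,
) -> List[str]:
    """Run call(prompt) for every prompt with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> str:
        async with semaphore:
            return await call(prompt)

    # gather() returns results in the same order as the input prompts.
    return list(await asyncio.gather(*(guarded(p) for p in prompts)))


async def fake_call(prompt: str) -> str:
    """Stand-in for a real API call; replace with your own coroutine."""
    await asyncio.sleep(0)
    return prompt.upper()


if __name__ == "__main__":
    print(asyncio.run(run_bounded(["a", "b", "c"], fake_call, limit=2)))
```

The right limit depends on your account and workload, so treat the default of 5 as a placeholder.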

Multi-turn chat in Python

Chat Completions are stateless from the application’s point of view. The server does not remember earlier turns for your app. If you want a multi-turn conversation, your Python application must keep the conversation history and send the relevant messages again on each request.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

messages = [
    {"role": "system", "content": "You are a helpful Python assistant."},
    {"role": "user", "content": "What is context caching in one sentence?"},
]

first_response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    extra_body={"thinking": {"type": "disabled"}},
)

first_message = first_response.choices[0].message
print(first_message.content)

messages.append({"role": "assistant", "content": first_message.content or ""})
messages.append({"role": "user", "content": "How should I structure repeated prompts to benefit from it?"})

second_response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    extra_body={"thinking": {"type": "disabled"}},
)

print(second_response.choices[0].message.content)

For long conversations, trim or summarize older turns so you do not send unnecessary context. For thinking-mode tool-call loops, preserve the full assistant message because it may include both reasoning_content and tool_calls.
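Trimming can be as simple as keeping the system prompt plus a window of recent turns. The following sketch targets plain text chat only; trim_history is a hypothetical helper, and the window size is an arbitrary placeholder.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_recent: int = 6) -> List[Message]:
    """Keep system messages plus the most recent non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_recent:] if max_recent > 0 else []
    # Start the kept window on a user turn so no reply appears without its question.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent
```

Call it right before each request, for example messages=trim_history(messages). Summarizing older turns instead of dropping them is an alternative when early context still matters.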

JSON Output in Python

Use JSON Output when your application needs a structured response that Python can parse. Set response_format={"type": "json_object"}, include the word “json” in the prompt, provide an example shape, and set max_tokens reasonably to reduce truncation risk.

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

system_prompt = """
You extract customer feedback into json.

Return only a valid json object with this shape:
{
  "sentiment": "positive | neutral | negative",
  "summary": "short summary",
  "action_required": true
}
"""

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": "The API is fast, but the dashboard export failed twice today.",
        },
    ],
    response_format={"type": "json_object"},
    max_tokens=500,
    extra_body={"thinking": {"type": "disabled"}},
)

choice = response.choices[0]
message_content = choice.message.content or ""

if choice.finish_reason == "length":
    raise RuntimeError("The JSON response may have been truncated. Increase max_tokens or shorten the prompt.")

if not message_content.strip():
    raise RuntimeError("The model returned an empty response. Try making the JSON instruction more explicit.")

data = json.loads(message_content)

required_keys = {"sentiment", "summary", "action_required"}
missing_keys = required_keys - set(data)

if missing_keys:
    raise ValueError(f"Missing required keys: {missing_keys}")

print(data)

JSON Output makes parsing easier, but your application should still validate required keys, value types, and allowed enum values before trusting the result.
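The type and enum checks mentioned above can be sketched for the feedback shape used in this section. validate_feedback is a hypothetical helper; a schema library would work equally well.

```python
from typing import Any, Dict

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def validate_feedback(data: Any) -> Dict[str, Any]:
    """Check the parsed object against the shape requested in the prompt."""
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    sentiment = data.get("sentiment")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"Unexpected sentiment: {sentiment!r}")
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string.")
    # bool is checked explicitly because a string such as "true" is not a boolean.
    if not isinstance(data.get("action_required"), bool):
        raise ValueError("action_required must be true or false.")
    return data
```

In the example above, this would run directly after json.loads(message_content).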

Tool Calls in Python

Tool Calls let the model request a function call, but the model does not execute the function automatically. Your application must validate the arguments, run the function, append a tool message with the correct tool_call_id, and send the next request.

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

def get_order_status(order_id: str) -> dict:
    # Replace this demo logic with your database or internal API call.
    if not order_id.startswith("ORD-"):
        raise ValueError("Invalid order_id format.")

    return {
        "order_id": order_id,
        "status": "processing",
        "estimated_ship_date": "tomorrow",
    }

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Get the current status of a customer order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The order ID, for example ORD-12345.",
                    }
                },
                "required": ["order_id"],
            },
        },
    }
]

messages = [
    {
        "role": "user",
        "content": "Can you check the status of order ORD-12345?",
    }
]

first_response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    extra_body={"thinking": {"type": "disabled"}},
)

assistant_message = first_response.choices[0].message
messages.append(assistant_message.model_dump(exclude_none=True))

if assistant_message.tool_calls:
    for tool_call in assistant_message.tool_calls:
        if tool_call.function.name != "get_order_status":
            raise ValueError(f"Unsupported tool: {tool_call.function.name}")

        arguments = json.loads(tool_call.function.arguments)
        order_id = arguments.get("order_id")

        if not isinstance(order_id, str):
            raise ValueError("order_id must be a string.")

        result = get_order_status(order_id)

        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            }
        )

    second_response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        extra_body={"thinking": {"type": "disabled"}},
    )

    print(second_response.choices[0].message.content)
else:
    print(assistant_message.content)

Always validate tool arguments before running business logic. Treat model-provided function arguments as untrusted input.
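As the number of tools grows, an allow-list registry keeps validation in one place. This is a minimal sketch: TOOL_REGISTRY, run_tool_call, and lookup_order are hypothetical names, and returning an error payload to the model instead of raising is a design choice of this sketch.

```python
import json
from typing import Any, Callable, Dict, Tuple


def lookup_order(order_id: str) -> Dict[str, Any]:
    """Hypothetical tool implementation used only for this sketch."""
    return {"order_id": order_id, "status": "processing"}


# Allow-list: tool name -> (callable, names of required string arguments).
TOOL_REGISTRY: Dict[str, Tuple[Callable[..., Dict[str, Any]], Tuple[str, ...]]] = {
    "get_order_status": (lookup_order, ("order_id",)),
}


def run_tool_call(name: str, raw_arguments: str) -> str:
    """Validate a model-requested call and return JSON for the tool message."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unsupported tool: {name}"})
    func, required = TOOL_REGISTRY[name]
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return json.dumps({"error": "arguments were not valid JSON"})
    if not isinstance(arguments, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    for key in required:
        if not isinstance(arguments.get(key), str):
            return json.dumps({"error": f"{key} must be a string"})
    # Pass only the declared arguments so unexpected keys never reach the function.
    result = func(**{key: arguments[key] for key in required})
    return json.dumps(result)
```

The returned string can be used as the content of the tool message, with tool_call_id set to the ID of the call that produced it.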

Strict Tool Calls mode note

Strict Tool Calls mode is a beta feature. Use it only when you specifically need stricter schema adherence and you are ready to handle beta behavior.

For strict mode, create the client with the beta base URL:

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/beta",
)

In strict mode, each function definition should set "strict": true, and the JSON Schema must follow the supported schema rules. If your schema is invalid or unsupported, the API can reject the request.
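A tool definition can be converted for strict mode with a small helper. The sketch below sets "strict" inside the function definition as described above. Marking every property as required and setting additionalProperties to false are assumptions about the supported schema rules, so check them against the official strict-mode documentation; to_strict_tool is a hypothetical helper name.

```python
import copy
from typing import Any, Dict


def to_strict_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a function tool definition with strict mode switched on."""
    strict_tool = copy.deepcopy(tool)
    function = strict_tool["function"]
    function["strict"] = True
    params = function.setdefault("parameters", {"type": "object", "properties": {}})
    # Assumption: strict schemas require every property and forbid extra keys.
    params["required"] = sorted(params.get("properties", {}))
    params["additionalProperties"] = False
    return strict_tool
```

The helper only rewrites the top-level object; nested objects in your schema would need the same treatment.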

Token usage, context caching, and cost control without prices

DeepSeek API responses can include token usage information. Use this data to monitor prompt size, output size, and application behavior over time.

usage = response.usage

if usage:
    print("Prompt tokens:", getattr(usage, "prompt_tokens", None))
    print("Completion tokens:", getattr(usage, "completion_tokens", None))
    print("Total tokens:", getattr(usage, "total_tokens", None))

    print("Prompt cache hit tokens:", getattr(usage, "prompt_cache_hit_tokens", None))
    print("Prompt cache miss tokens:", getattr(usage, "prompt_cache_miss_tokens", None))

Context Caching is enabled by default and does not require a code change. It can help repeated-prefix workloads such as multi-turn conversations, document analysis, and workflows that reuse the same system message or long input prefix.
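To see whether repeated prefixes are actually being served from cache, you can turn the two cache fields into a ratio. This sketch assumes the usage object exposes the field names printed above; cache_hit_ratio is a hypothetical helper, and the sample values are made up for illustration.

```python
from types import SimpleNamespace
from typing import Any, Optional


def cache_hit_ratio(usage: Any) -> Optional[float]:
    """Return the share of prompt tokens served from cache, or None if unknown."""
    hit = getattr(usage, "prompt_cache_hit_tokens", None)
    miss = getattr(usage, "prompt_cache_miss_tokens", None)
    if hit is None or miss is None or hit + miss == 0:
        return None
    return hit / (hit + miss)


# Made-up numbers; in real code pass response.usage.
sample_usage = SimpleNamespace(prompt_cache_hit_tokens=300, prompt_cache_miss_tokens=100)
print(cache_hit_ratio(sample_usage))  # prints 0.75
```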

Because DeepSeek API pricing can change, this guide does not copy token prices. Check the official DeepSeek pricing page and Chat-Deep.ai’s pricing guide before making billing decisions.

Practical cost-control habits without copying prices

  • Set reasonable max_tokens values for each route.
  • Use non-thinking mode for simple tasks.
  • Use thinking mode only where it improves quality enough to justify longer outputs.
  • Reuse stable system prompts and repeated document prefixes where practical.
  • Log token usage by route, model, and feature flag.
  • Keep long conversation history trimmed or summarized.
  • Do not send large files or documents when only a small excerpt is needed.
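Logging usage by route and model, as suggested in the list above, can start with a small in-memory aggregator before you wire it into real metrics. UsageTracker is a hypothetical class, and the route name and token counts below are made up.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Tuple


class UsageTracker:
    """Accumulate token counts per (route, model) pair."""

    def __init__(self) -> None:
        self.totals: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
            lambda: {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0}
        )

    def record(self, route: str, model: str, usage: Any) -> None:
        bucket = self.totals[(route, model)]
        bucket["requests"] += 1
        # Missing or None fields count as zero instead of raising.
        bucket["prompt_tokens"] += getattr(usage, "prompt_tokens", 0) or 0
        bucket["completion_tokens"] += getattr(usage, "completion_tokens", 0) or 0


tracker = UsageTracker()
tracker.record("/summarize", "deepseek-v4-flash", SimpleNamespace(prompt_tokens=120, completion_tokens=40))
tracker.record("/summarize", "deepseek-v4-flash", SimpleNamespace(prompt_tokens=80, completion_tokens=None))
print(dict(tracker.totals))
```

In a real service you would call tracker.record(route, model, response.usage) after each request and export the totals to your monitoring system.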

Error handling, rate limits, retries, and timeouts

DeepSeek rate limits are dynamic. If your application sends too many concurrent requests, you may receive HTTP 429. Server-side errors such as 500 or 503 may be worth retrying after a delay. Do not blindly retry 400, 401, 402, or 422 without fixing the underlying issue.

import os
import time
from openai import APIStatusError, APITimeoutError, OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
    timeout=60,
    max_retries=0,
)

RETRY_STATUS_CODES = {429, 500, 503}

def create_completion_with_retry(prompt: str, max_attempts: int = 3) -> str:
    delay_seconds = 2

    for attempt in range(1, max_attempts + 1):
        try:
            response = client.chat.completions.create(
                model="deepseek-v4-flash",
                messages=[
                    {"role": "system", "content": "You are a concise Python assistant."},
                    {"role": "user", "content": prompt},
                ],
                extra_body={"thinking": {"type": "disabled"}},
            )
            return response.choices[0].message.content or ""

        except APITimeoutError:
            if attempt == max_attempts:
                raise

        except APIStatusError as exc:
            status_code = exc.status_code

            if status_code not in RETRY_STATUS_CODES:
                raise

            if attempt == max_attempts:
                raise

        time.sleep(delay_seconds)
        delay_seconds *= 2

    raise RuntimeError("Request failed after retries.")

answer = create_completion_with_retry("Explain HTTP 429 in one sentence.")
print(answer)

Use retries carefully. A 401 usually means the API key is wrong. A 402 means the account has a billing problem, such as an insufficient balance, that must be resolved before the request can succeed. A 422 usually means invalid parameters. Retrying those errors without changing anything usually wastes time.
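The retry decision and the wait time can be separated into two small functions. The sketch below swaps the fixed doubling delay for exponential backoff with full jitter, a common variation that spreads out retries from many clients; should_retry and backoff_delay are hypothetical helpers, and the base and cap values are placeholders.

```python
import random
from typing import Optional

# Status codes this guide treats as worth retrying.
RETRYABLE_STATUS_CODES = {429, 500, 503}


def should_retry(status_code: int) -> bool:
    """Return True only for rate-limit and server-side errors."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delay(
    attempt: int,
    base: float = 2.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Pick a random delay between 0 and min(cap, base * 2**(attempt - 1))."""
    generator = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** (attempt - 1)))
    return generator.uniform(0, ceiling)
```

Inside a retry loop, time.sleep(backoff_delay(attempt)) would replace a fixed delay, with attempt starting at 1.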

Common mistakes

  • Installing the wrong package: use pip install openai, not an unofficial package unless DeepSeek explicitly documents it.
  • Using old OpenAI syntax: do not use openai.ChatCompletion.create(). Use client.chat.completions.create().
  • Forgetting base_url: without base_url="https://api.deepseek.com", the client will not target DeepSeek.
  • Using legacy aliases as primary models: use deepseek-v4-flash or deepseek-v4-pro for new code.
  • Assuming Chat Completions and Responses API are the same: this guide uses Chat Completions because that is the DeepSeek-compatible Python path documented for this workflow.
  • Printing secrets: never print API keys, environment variables containing secrets, or authorization headers.
  • Displaying reasoning by default: keep reasoning_content separate from normal user-facing output unless your product policy says otherwise.
  • Trusting tool arguments blindly: validate tool-call arguments before running functions, database queries, or external requests.
  • Retrying every error: retry rate-limit and server-overload responses, but fix bad keys, billing problems, and invalid requests or parameters before sending the request again.
  • Copying prices into evergreen docs: link to the official pricing page instead of hard-coding rates that can change.

Production checklist

  • Use deepseek-v4-flash or deepseek-v4-pro for new integrations.
  • Store DEEPSEEK_API_KEY securely outside source code.
  • Set base_url="https://api.deepseek.com" in the OpenAI Python client.
  • Choose thinking mode intentionally instead of relying on defaults for every route.
  • Keep reasoning_content separate from final user-facing output.
  • Use JSON Output only with explicit json instructions and validation.
  • Validate all Tool Calls arguments before executing functions.
  • Use https://api.deepseek.com/beta only when a documented beta feature requires it.
  • Log token usage and monitor unusually long prompts or outputs.
  • Handle 429, 500, and 503 with controlled retries and backoff.
  • Do not blindly retry 400, 401, 402, or 422.
  • Keep pricing links dynamic instead of copying prices into the article.

When this guide is not the right page

This guide is for Python developers using DeepSeek through the OpenAI-compatible Chat Completions workflow. It is not the best page for every use case.

FAQ

Is there an official DeepSeek Python SDK?

The officially documented Python path is to use the OpenAI Python SDK with DeepSeek’s API key, DeepSeek’s base URL, and DeepSeek model IDs. This is why developers often describe the setup as a DeepSeek Python SDK workflow.

Which Python package should I install for DeepSeek?

Install the openai package with pip install openai. Do not use an unofficial DeepSeek package unless the official DeepSeek documentation explicitly recommends it for your use case.

What base_url should I use for DeepSeek in Python?

Use base_url="https://api.deepseek.com" for normal OpenAI-compatible Chat Completions requests.

Should I use deepseek-v4-flash or deepseek-v4-pro?

Use deepseek-v4-flash for fast everyday workloads, summaries, extraction, routine coding help, and high-volume applications. Use deepseek-v4-pro for harder reasoning, complex coding, long-context analysis, and agentic workflows.

Can I still use deepseek-chat or deepseek-reasoner?

They are legacy compatibility aliases. For new code, use deepseek-v4-flash or deepseek-v4-pro. The legacy aliases are scheduled for retirement after July 24, 2026, 15:59 UTC.

Does DeepSeek support JSON Output in Python?

Yes. Use response_format={"type": "json_object"}, include the word “json” in the prompt, provide an example shape, and validate the parsed result in Python.

Does DeepSeek support Tool Calls in Python?

Yes. The model can request a tool call, but your application executes the function. Validate arguments, run the function, append a tool message with the matching tool_call_id, and send the next request.

Where can I check DeepSeek API pricing?

Because DeepSeek API pricing can change, this article does not copy token prices. Check the official DeepSeek pricing page and Chat-Deep.ai’s pricing guide before making billing decisions.