Quick answer: To use DeepSeek with Python, install the OpenAI Python SDK, create an OpenAI client with a DeepSeek API key, set base_url="https://api.deepseek.com", and call one of the current DeepSeek V4 model IDs: deepseek-v4-flash or deepseek-v4-pro.
Use deepseek-v4-flash for fast, economical chat, summaries, extraction, JSON Output, routine coding help, classification, and high-volume workloads. Use deepseek-v4-pro for harder reasoning, complex coding, long-context analysis, agentic workflows, and higher-value production tasks.
Independent note: Chat-Deep.ai is an independent DeepSeek guide and browser access site. It is not affiliated with DeepSeek, DeepSeek.com, the official DeepSeek app, the official DeepSeek developer platform, OpenAI, or the OpenAI Python SDK.
Last verified: April 24, 2026.
Current DeepSeek API snapshot
- Current API model IDs: deepseek-v4-flash and deepseek-v4-pro
- Current API generation: DeepSeek-V4 Preview
- Base URL for OpenAI-compatible requests: https://api.deepseek.com
- Context length: 1M tokens
- Maximum output: 384K tokens
- Thinking mode: supported on both current V4 API models
- Non-thinking mode: supported on both current V4 API models
- JSON Output: supported on both current V4 API models
- Tool Calls: supported on both current V4 API models
- FIM Completion: supported in non-thinking mode only
- Legacy aliases: deepseek-chat and deepseek-reasoner currently route to deepseek-v4-flash non-thinking and thinking modes
- Legacy alias retirement: DeepSeek says deepseek-chat and deepseek-reasoner will be retired after July 24, 2026, 15:59 UTC
This page is updated to match the current Chat-Deep.ai homepage, DeepSeek API guide, DeepSeek API pricing guide, OpenAI SDK with DeepSeek guide, Node.js TypeScript guide, Thinking Mode guide, Tool Calls guide, and Token Usage guide.
Table of contents
- Quick answer: how to use DeepSeek with Python
- Who this guide is for
- What is the DeepSeek Python SDK?
- Install the Python package
- Set your DeepSeek API key safely
- DeepSeek base_url explained
- Choosing a current V4 model
- Minimal DeepSeek Python example
- Non-thinking mode example
- Thinking Mode in Python
- Streaming responses in Python
- Async DeepSeek Python example
- Multi-turn chat in Python
- DeepSeek JSON Output in Python
- DeepSeek Tool Calls in Python
- Strict mode for Tool Calls
- Chat Completions vs OpenAI Responses API
- Error handling, retries, and timeouts
- Token usage, context caching, and cost
- List models and check balance
- Legacy aliases: deepseek-chat and deepseek-reasoner
- Common mistakes
- Production checklist
- When this guide is not the right page
- FAQ
- Official sources
Quick answer: how to use DeepSeek with Python
The current official DeepSeek quick start uses an OpenAI-compatible API format. In Python, the documented path is to use the OpenAI Python SDK with DeepSeek-specific configuration.
- Install the OpenAI Python package: pip install openai.
- Store a DeepSeek API key in a secure environment variable such as DEEPSEEK_API_KEY.
- Create an OpenAI client with base_url="https://api.deepseek.com".
- Call client.chat.completions.create().
- Use model="deepseek-v4-flash" for most fast and economical workflows.
- Use model="deepseek-v4-pro" for harder reasoning, coding, long-context, or agentic workflows.
- Set extra_body={"thinking": {"type": "disabled"}} for non-thinking routes and extra_body={"thinking": {"type": "enabled"}} for thinking routes.
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a concise Python assistant."},
{"role": "user", "content": "Explain DeepSeek Python setup in one paragraph."},
],
stream=False,
extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Important: DeepSeek’s official Python examples use Chat Completions through client.chat.completions.create(). Do not assume OpenAI’s client.responses.create() examples work with DeepSeek unless DeepSeek officially documents that endpoint for your use case.
Who this guide is for
- Use this guide if you want to call DeepSeek from Python using the OpenAI Python SDK.
- Use the OpenAI SDK with DeepSeek guide if you need both Python and Node.js migration details.
- Use the DeepSeek Node.js TypeScript guide if your project is in JavaScript or TypeScript.
- Use the DeepSeek API guide if you need a broader API overview, model selection, pricing links, and endpoint notes.
- Use the Chat-Deep.ai browser chat if you only want to test prompts quickly without writing code.
What is the DeepSeek Python SDK?
Developers often search for “DeepSeek Python SDK,” but the officially documented Python path is to use the openai Python package configured with DeepSeek’s base URL and a DeepSeek API key.
In this article, DeepSeek Python SDK means: the OpenAI Python client, a DeepSeek API key, base_url="https://api.deepseek.com", and a current DeepSeek V4 model ID. Do not install or recommend an unofficial package such as pip install deepseek unless official DeepSeek documentation explicitly documents it for your use case.
This page intentionally focuses on Python. It does not replace dedicated pages for DeepSeek JSON Output, DeepSeek Tool Calls, DeepSeek Thinking Mode, DeepSeek Token Usage, or DeepSeek API Pricing.
Install the Python package
Install the OpenAI Python package from PyPI:
pip install openai
If you want to load variables from a local .env file during development, you can also install python-dotenv:
pip install python-dotenv
The OpenAI Python package provides synchronous and asynchronous clients. In this article, normal examples use OpenAI, and the async example uses AsyncOpenAI.
Set your DeepSeek API key safely
Keep your DeepSeek API key server-side. Never put a live API key in browser JavaScript, public mobile code, public GitHub repositories, logs, screenshots, support tickets, or client-side bundles.
macOS or Linux
export DEEPSEEK_API_KEY="<your_deepseek_api_key>"
Windows PowerShell
[Environment]::SetEnvironmentVariable("DEEPSEEK_API_KEY", "<your_deepseek_api_key>", "User")
Optional local .env file
DEEPSEEK_API_KEY=<your_deepseek_api_key>
If you use a .env file, add it to .gitignore and never commit it:
from dotenv import load_dotenv
load_dotenv()
Then your application can read the key with os.environ["DEEPSEEK_API_KEY"].
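If you want to fail fast when the key is missing, a small startup check is useful so the error surfaces before the first request. The `require_api_key` helper below is a hypothetical sketch, not part of the OpenAI SDK or the DeepSeek docs:

```python
import os


def require_api_key(var_name: str = "DEEPSEEK_API_KEY") -> str:
    """Read an API key from the environment and fail fast if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or load it "
            "from a .env file before creating the client."
        )
    return key
```

You would then pass the result to the client constructor, for example `OpenAI(api_key=require_api_key(), base_url="https://api.deepseek.com")`.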
DeepSeek base_url explained
The base_url tells the OpenAI Python client to send requests to DeepSeek instead of the default OpenAI endpoint. In Python, use base_url, not baseURL. The baseURL spelling is used in JavaScript and TypeScript examples.
| base_url | When to use it | Important note |
|---|---|---|
| https://api.deepseek.com | Recommended default for normal DeepSeek API requests | Use this for Chat Completions, streaming, JSON Output, Tool Calls, thinking mode, token usage, and most production apps. |
| https://api.deepseek.com/v1 | OpenAI compatibility path | The /v1 path is for compatibility and is not a DeepSeek model version. |
| https://api.deepseek.com/beta | Only for documented beta features | Use only where official docs require it, such as strict tool schemas, Chat Prefix Completion, or FIM Completion. |
For normal Python usage, start with https://api.deepseek.com. Switch to /beta only when a documented beta feature explicitly requires it.
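If several routes in your codebase need different base URLs, the table above can be encoded in one small helper so the choice lives in a single place. `choose_base_url` and the `BETA_FEATURES` set are hypothetical names for illustration:

```python
from typing import Optional

BASE_URL = "https://api.deepseek.com"
BETA_BASE_URL = "https://api.deepseek.com/beta"

# Per the table above, these features are documented as beta-only.
BETA_FEATURES = {"strict_tool_calls", "chat_prefix_completion", "fim_completion"}


def choose_base_url(feature: Optional[str] = None) -> str:
    """Return the beta base URL only when a documented beta feature needs it."""
    return BETA_BASE_URL if feature in BETA_FEATURES else BASE_URL
```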
Choosing a current V4 model
For new Python code, use deepseek-v4-flash or deepseek-v4-pro. Do not present deepseek-chat or deepseek-reasoner as the primary current model IDs.
| Model | Use it for | Recommended mode | Notes |
|---|---|---|---|
| deepseek-v4-flash | Fast chat, summaries, extraction, JSON Output, routine coding help, classification, support bots, and cost-sensitive Python applications | Usually start with non-thinking mode | Best first model for most Python integrations and migration tests. |
| deepseek-v4-pro | Hard reasoning, complex coding, long-context analysis, tool planning, agentic workflows, and high-value production tasks | Usually use thinking mode for difficult tasks | Use when quality and reasoning matter more than lowest token price. |
| deepseek-chat | Legacy compatibility only | Routes to V4-Flash non-thinking mode | Scheduled for retirement after July 24, 2026, 15:59 UTC. |
| deepseek-reasoner | Legacy compatibility only | Routes to V4-Flash thinking mode | Scheduled for retirement after July 24, 2026, 15:59 UTC. |
Minimal DeepSeek Python example
This is the simplest copy-paste DeepSeek API Python example using the OpenAI-compatible client and the current lower-cost V4 model.
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a helpful Python assistant."},
{"role": "user", "content": "Explain DeepSeek Python setup in one paragraph."},
],
stream=False,
extra_body={"thinking": {"type": "disabled"}},
)
message = response.choices[0].message
print(message.content)
The required request fields are model and messages. The messages array contains the conversation so far, and the assistant’s generated answer is returned in response.choices[0].message.content for normal chat completions.
For endpoint-level details, read our Create a DeepSeek Chat Completion guide or the official DeepSeek Chat Completion API reference.
Non-thinking mode example with deepseek-v4-flash
Non-thinking mode is usually the better default for fast, simple, structured, or cost-sensitive routes.
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a concise support assistant."},
{"role": "user", "content": "Summarize this ticket in three bullet points."},
],
max_tokens=600,
extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Use this pattern for ordinary chat, extraction, short summaries, classification, JSON-only tasks, and routine coding help where you do not need explicit reasoning behavior.
Thinking Mode in Python
DeepSeek V4 supports both thinking and non-thinking modes. Thinking mode can improve hard reasoning, complex coding, tool planning, long-context analysis, and agentic workflows. In Python with the OpenAI SDK, pass the thinking object through extra_body.
Thinking mode with deepseek-v4-pro
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[
{"role": "user", "content": "Compare two Python API retry strategies and explain the tradeoffs."}
],
reasoning_effort="high",
max_tokens=2000,
extra_body={"thinking": {"type": "enabled"}},
)
message = response.choices[0].message
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
# Keep reasoning separate from end-user output.
# Store, inspect, or discard it according to your product policy.
pass
print("Final answer:")
print(message.content)
Thinking mode with deepseek-v4-flash
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "user", "content": "Plan a robust Python error-handling strategy for an API client."}
],
reasoning_effort="high",
max_tokens=1500,
extra_body={"thinking": {"type": "enabled"}},
)
print(response.choices[0].message.content)
In thinking mode, reasoning_content is separate from the final content. The final user-facing answer is in content. For normal multi-turn chat without tool calls, you can carry the final answer forward. During thinking-mode tool-call loops, preserve the full assistant message because it may include reasoning_content and tool_calls.
For deeper thinking-mode behavior, read our DeepSeek Thinking Mode guide and the official DeepSeek Thinking Mode documentation.
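One way to apply the history rule above is a tiny helper that reduces an assistant message to a history-safe entry for ordinary multi-turn chat. `to_history_entry` is a hypothetical sketch operating on plain dicts; during an active thinking + tool-call loop you would append the full assistant message instead:

```python
def to_history_entry(assistant_message: dict) -> dict:
    """Keep only the role and final content for ordinary multi-turn history.

    reasoning_content and tool_calls are intentionally dropped; preserve the
    full message only inside an active thinking + tool-call loop.
    """
    return {
        "role": "assistant",
        "content": assistant_message.get("content") or "",
    }
```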
Streaming responses in Python
Set stream=True to receive partial response chunks as they are generated. If you need final token accounting during streaming, set stream_options={"include_usage": True} and capture the final usage-bearing chunk.
Non-thinking streaming example
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
stream = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a concise Python assistant."},
{"role": "user", "content": "Give me three Python tips for reliable API integrations."},
],
stream=True,
stream_options={"include_usage": True},
extra_body={"thinking": {"type": "disabled"}},
)
final_usage = None
for chunk in stream:
if getattr(chunk, "usage", None) is not None:
final_usage = chunk.usage
continue
if not chunk.choices:
continue
delta = chunk.choices[0].delta
if getattr(delta, "content", None):
print(delta.content, end="", flush=True)
if final_usage is not None:
print()
print("Usage:", final_usage)
Thinking-mode streaming example
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
stream = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[
{"role": "user", "content": "Explain why robust retry logic matters in Python API clients."}
],
stream=True,
stream_options={"include_usage": True},
reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},
)
reasoning_buffer = []
answer_buffer = []
final_usage = None
for chunk in stream:
if getattr(chunk, "usage", None) is not None:
final_usage = chunk.usage
continue
if not chunk.choices:
continue
delta = chunk.choices[0].delta
reasoning = getattr(delta, "reasoning_content", None)
if reasoning:
reasoning_buffer.append(reasoning)
content = getattr(delta, "content", None)
if content:
answer_buffer.append(content)
print(content, end="", flush=True)
final_answer = "".join(answer_buffer)
if final_usage is not None:
print()
print("Usage:", final_usage)
Most user-facing apps should display only final answer content unless they have a clear product policy for showing reasoning output. Keep reasoning and final answer buffers separate.
Async DeepSeek Python example
For high-concurrency Python applications, use AsyncOpenAI. The request shape is almost the same, but you call it with await.
import asyncio
import os
from openai import AsyncOpenAI
client = AsyncOpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
async def main() -> None:
response = await client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "system", "content": "You are a helpful Python assistant."},
{"role": "user", "content": "Explain async API clients in two sentences."},
],
stream=False,
extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
asyncio.run(main())
Use async when your server needs to handle many simultaneous API calls without blocking worker threads. Keep the same security, timeout, validation, and retry rules you use in synchronous code.
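A common pattern for "many simultaneous API calls" is to bound concurrency with a semaphore so a traffic spike does not turn into a rate-limit storm. This is a hedged sketch: `run_bounded` and `ask` are hypothetical names, and `ask` stands in for an awaitable wrapper around your real `client.chat.completions.create` call:

```python
import asyncio
from typing import Awaitable, Callable


async def run_bounded(
    prompts: list[str],
    ask: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> list[str]:
    """Run many requests concurrently without exceeding max_concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(prompt: str) -> str:
        # Each call waits for a free slot before issuing its request.
        async with semaphore:
            return await ask(prompt)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(p) for p in prompts))
```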
Multi-turn chat in Python
DeepSeek’s Chat Completions API is stateless. The server does not remember earlier turns for you. If you want multi-turn chat, your application must keep the conversation history and send it again with each request.
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
messages = [
{"role": "system", "content": "You are a helpful Python assistant."},
{"role": "user", "content": "What is context caching in one sentence?"},
]
first_response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=messages,
extra_body={"thinking": {"type": "disabled"}},
)
first_message = first_response.choices[0].message
print(first_message.content)
messages.append({
"role": "assistant",
"content": first_message.content,
})
messages.append({
"role": "user",
"content": "Now explain why repeated prefixes can matter for cost.",
})
second_response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=messages,
extra_body={"thinking": {"type": "disabled"}},
)
print(second_response.choices[0].message.content)
For ordinary multi-turn chat, store and resend the assistant’s final content. For thinking-mode tool-call loops, preserve the full assistant message during the active loop because it may include reasoning_content and tool_calls.
For a deeper explanation, see the official DeepSeek Multi-round Conversation guide.
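Because you resend the full history on every request, long conversations eventually need trimming to stay within your token budget. A minimal sketch, assuming simple turn-count trimming (`trim_history` is a hypothetical helper; production code might trim by counted tokens instead):

```python
def trim_history(messages: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep system messages plus the most recent conversation messages.

    A "turn" here is one user or assistant message; tune max_turns to fit
    your token budget.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]
```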
DeepSeek JSON Output in Python
DeepSeek JSON Output uses response_format={"type": "json_object"}. The prompt should explicitly include the word “json,” provide an example shape, and set max_tokens high enough to reduce truncation risk. Production code should parse and validate the result before using it.
import json
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[
{
"role": "system",
"content": (
"Return only valid json with keys: "
"title, summary, category, confidence. "
"Do not include Markdown or explanations."
),
},
{
"role": "user",
"content": "Classify this text: DeepSeek Python SDK setup is simple.",
},
],
response_format={"type": "json_object"},
max_tokens=500,
extra_body={"thinking": {"type": "disabled"}},
)
choice = response.choices[0]
content = choice.message.content or ""
if choice.finish_reason == "length":
raise RuntimeError("The JSON may be truncated. Increase max_tokens.")
if not content.strip():
raise RuntimeError("The response content was empty. Retry with a clearer json prompt.")
try:
data = json.loads(content)
except json.JSONDecodeError as exc:
raise RuntimeError("The response was not valid JSON.") from exc
required = {"title", "summary", "category", "confidence"}
missing = required - set(data)
if missing:
raise RuntimeError(f"Missing required keys: {missing}")
print(data)
DeepSeek’s official JSON Output documentation notes that empty content can occasionally occur. In production, add retry logic, validate parsed JSON, and show a safe fallback instead of assuming every response will be parseable.
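That retry advice can be wrapped in a small helper. This is a hypothetical sketch: `call_model` is any zero-argument function that returns the raw response content string, for example a closure around your Chat Completion call:

```python
import json
from typing import Callable, Optional


def json_with_retry(call_model: Callable[[], str], attempts: int = 3) -> dict:
    """Call the model, parse JSON, and retry on empty or invalid output."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        content = (call_model() or "").strip()
        if not content:
            last_error = RuntimeError("Empty response content.")
            continue
        try:
            return json.loads(content)
        except json.JSONDecodeError as exc:
            last_error = exc
    raise RuntimeError("No valid JSON after retries.") from last_error
```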
For a full structured-output implementation guide, read our DeepSeek JSON Output guide and the official DeepSeek JSON Output documentation.
DeepSeek Tool Calls in Python
DeepSeek Tool Calls let the model request structured function calls. The model does not execute your function automatically. Your application validates the arguments, executes the function, appends a tool message with the matching tool_call_id, and sends another request.
import json
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
def get_weather(location: str) -> dict:
# Replace this with your real weather service.
return {
"location": location,
"temperature_c": 26,
"condition": "clear",
}
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city or region.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country, for example Cairo, Egypt",
}
},
"required": ["location"],
},
},
}
]
messages = [
{"role": "user", "content": "What is the weather in Cairo, Egypt?"}
]
first = client.chat.completions.create(
model="deepseek-v4-flash",
messages=messages,
tools=tools,
tool_choice="auto",
extra_body={"thinking": {"type": "disabled"}},
)
message = first.choices[0].message
if message.tool_calls:
messages.append(message.model_dump(exclude_none=True))
for tool_call in message.tool_calls:
if tool_call.function.name != "get_weather":
raise RuntimeError(f"Unexpected tool call: {tool_call.function.name}")
try:
arguments = json.loads(tool_call.function.arguments)
except json.JSONDecodeError as exc:
raise RuntimeError("Tool arguments were not valid JSON.") from exc
location = arguments.get("location")
if not isinstance(location, str) or not location.strip():
raise RuntimeError("Tool argument 'location' must be a non-empty string.")
result = get_weather(location)
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result, ensure_ascii=False),
})
second = client.chat.completions.create(
model="deepseek-v4-flash",
messages=messages,
tools=tools,
extra_body={"thinking": {"type": "disabled"}},
)
print(second.choices[0].message.content)
else:
print(message.content)
Validate tool arguments before execution, especially if a tool touches databases, payments, accounts, files, shell commands, repositories, or external APIs. For a dedicated implementation guide, read our DeepSeek Tool Calls article and the official DeepSeek Tool Calls documentation.
Thinking-mode Tool Calls in Python
For thinking-mode tool workflows, use deepseek-v4-pro when the task needs stronger planning or reasoning. During a thinking + tool-call loop, preserve the full assistant message so the model can continue the same reasoning process after tool results.
import json
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
def lookup_order_status(order_id: str) -> str:
return json.dumps({
"order_id": order_id,
"status": "shipped",
"eta": "2026-04-27",
})
tools = [
{
"type": "function",
"function": {
"name": "lookup_order_status",
"description": "Look up shipping status for an order ID.",
"parameters": {
"type": "object",
"properties": {
"order_id": {
"type": "string",
"description": "The user's order ID.",
}
},
"required": ["order_id"],
},
},
}
]
messages = [
{"role": "user", "content": "Check order A123 and explain whether it will arrive this week."}
]
while True:
response = client.chat.completions.create(
model="deepseek-v4-pro",
messages=messages,
tools=tools,
tool_choice="auto",
reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},
)
assistant_message = response.choices[0].message
# Important: keep the full assistant message in a thinking + tool-call loop.
messages.append(assistant_message.model_dump(exclude_none=True))
if not assistant_message.tool_calls:
print(assistant_message.content)
break
for tool_call in assistant_message.tool_calls:
if tool_call.function.name != "lookup_order_status":
raise RuntimeError(f"Unexpected tool call: {tool_call.function.name}")
args = json.loads(tool_call.function.arguments or "{}")
order_id = args.get("order_id")
if not isinstance(order_id, str):
raise RuntimeError("Invalid order_id.")
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": lookup_order_status(order_id),
})
Strict mode for Tool Calls
Strict mode is a beta Tool Calls feature. DeepSeek’s official docs say strict mode requires base_url="https://api.deepseek.com/beta" and strict: true inside function definitions. Use strict mode when you need tighter schema adherence, but still validate all arguments before executing tools.
import os
from openai import OpenAI
strict_client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com/beta",
)
strict_tools = [
{
"type": "function",
"function": {
"name": "create_invoice_draft",
"strict": True,
"description": "Create a draft invoice. This does not send the invoice.",
"parameters": {
"type": "object",
"properties": {
"customer_id": {
"type": "string",
"description": "Internal CRM customer ID",
},
"amount_usd": {
"type": "number",
"description": "Invoice amount in USD",
"minimum": 0,
},
"memo": {
"type": "string",
"description": "Short invoice memo",
},
},
"required": ["customer_id", "amount_usd", "memo"],
"additionalProperties": False,
},
},
}
]
Even with strict mode, application-level validation should check types, permissions, ranges, account ownership, business rules, safety constraints, and authorization.
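As a sketch of that application-level layer for the invoice schema above (a hypothetical validator; adapt the rules, limits, and permission checks to your own business logic):

```python
def validate_invoice_args(args: dict) -> None:
    """Application-level checks that schema enforcement alone cannot cover."""
    customer_id = args.get("customer_id")
    if not isinstance(customer_id, str) or not customer_id.strip():
        raise ValueError("customer_id must be a non-empty string.")

    amount = args.get("amount_usd")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(amount, (int, float)) or isinstance(amount, bool) or amount <= 0:
        raise ValueError("amount_usd must be a positive number.")

    memo = args.get("memo")
    if not isinstance(memo, str) or len(memo) > 200:
        raise ValueError("memo must be a string of at most 200 characters.")
```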
Chat Completions vs OpenAI Responses API
The OpenAI Python library includes methods for OpenAI’s own endpoints, but DeepSeek’s official examples document /chat/completions and use client.chat.completions.create(). DeepSeek compatibility does not automatically mean every OpenAI endpoint is supported by DeepSeek.
# Recommended for DeepSeek because it is documented by DeepSeek:
response = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[{"role": "user", "content": "Hello"}],
extra_body={"thinking": {"type": "disabled"}},
)
# Do not assume this is supported by DeepSeek unless DeepSeek documents it:
# response = client.responses.create(...)
For DeepSeek Python code, follow the endpoints and parameters that DeepSeek documents.
Error handling, retries, and timeouts
The OpenAI Python SDK has its own error classes, retry behavior, request IDs, and timeout configuration. DeepSeek API behavior should still be verified against DeepSeek’s official docs, especially for status-code meanings.
import os
import openai
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
timeout=30.0,
max_retries=2,
)
try:
response = client.with_options(timeout=60.0).chat.completions.create(
model="deepseek-v4-flash",
messages=[
{"role": "user", "content": "Give me one Python API reliability tip."}
],
stream=False,
extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
print("request_id:", getattr(response, "_request_id", None))
except openai.APIConnectionError as exc:
print("Connection problem or timeout while reaching the API.")
print(str(exc.__cause__) if exc.__cause__ else "No low-level cause available.")
except openai.RateLimitError as exc:
print("429 rate limit or traffic-related throttling. Back off and retry later.")
print("request_id:", getattr(exc, "request_id", None))
except openai.APIStatusError as exc:
print(f"API returned status code: {exc.status_code}")
print("request_id:", getattr(exc, "request_id", None))
print("Response body:")
print(exc.response)
Common DeepSeek API status-code meanings:
| Status | Official meaning | What to check |
|---|---|---|
| 400 | Invalid request body format | Fix the request body format, message order, tool messages, or thinking-mode history. |
| 401 | Authentication fails | Check that you are using a valid DeepSeek API key with the DeepSeek base URL. |
| 402 | Insufficient balance | Check account balance and billing setup on the official DeepSeek platform. |
| 422 | Invalid parameters | Check unsupported or malformed request parameters. |
| 429 | Rate limit reached | Pace requests, reduce concurrency, queue work, and retry with backoff. |
| 500 | Server error | Retry after a brief wait. |
| 503 | Server overloaded | Retry later and consider graceful fallback. |
Implementation note: Production clients should implement retry budgets, timeout handling, exponential backoff, and graceful fallback. Do not create aggressive infinite retry loops.
For a deeper troubleshooting reference, see our DeepSeek API Error Codes guide, plus the official DeepSeek Error Codes page.
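A minimal sketch of a retry budget with exponential backoff and jitter, assuming you classify retryable failures yourself (both `RetryableError` and `with_backoff` are hypothetical names; `send` stands in for your real request function, and the injectable `sleep` keeps the helper testable):

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raise this from `send` for failures worth retrying, e.g. 429/500/503."""


def with_backoff(
    send: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `send` with exponential backoff and jitter, up to a fixed budget."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted; surface the error
            # Delay doubles each attempt, scaled by jitter in [0.5, 1.0).
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            sleep(delay)
    raise AssertionError("unreachable")
```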
Token usage, context caching, and cost
DeepSeek bills based on input and output tokens. The actual token count should be read from the API response usage object rather than estimated from character count alone.
usage = response.usage
if usage:
print("prompt_tokens:", usage.prompt_tokens)
print("completion_tokens:", usage.completion_tokens)
print("total_tokens:", usage.total_tokens)
cache_hit = getattr(usage, "prompt_cache_hit_tokens", 0) or 0
cache_miss = getattr(usage, "prompt_cache_miss_tokens", 0) or 0
reasoning_tokens = getattr(
getattr(usage, "completion_tokens_details", None),
"reasoning_tokens",
0,
) or 0
print("prompt_cache_hit_tokens:", cache_hit)
print("prompt_cache_miss_tokens:", cache_miss)
print("reasoning_tokens:", reasoning_tokens)
Current official V4 pricing is listed per 1M tokens and differs by model:
| Model | Input cache hit | Input cache miss | Output |
|---|---|---|---|
| deepseek-v4-flash | $0.028 / 1M tokens | $0.14 / 1M tokens | $0.28 / 1M tokens |
| deepseek-v4-pro | $0.145 / 1M tokens | $1.74 / 1M tokens | $3.48 / 1M tokens |
The request-level cost formula is:
request_cost = (prompt_cache_hit_tokens / 1_000_000 * cache_hit_rate) + (prompt_cache_miss_tokens / 1_000_000 * cache_miss_rate) + (completion_tokens / 1_000_000 * output_rate)
Simple cost helper for Python
from decimal import Decimal
PRICING_PER_1M = {
"deepseek-v4-flash": {
"cache_hit_input": Decimal("0.028"),
"cache_miss_input": Decimal("0.14"),
"output": Decimal("0.28"),
},
"deepseek-v4-pro": {
"cache_hit_input": Decimal("0.145"),
"cache_miss_input": Decimal("1.74"),
"output": Decimal("3.48"),
},
}
def estimate_deepseek_cost_usd(model: str, usage) -> Decimal:
rates = PRICING_PER_1M[model]
cache_hit = Decimal(getattr(usage, "prompt_cache_hit_tokens", 0) or 0)
cache_miss = Decimal(getattr(usage, "prompt_cache_miss_tokens", 0) or 0)
output = Decimal(getattr(usage, "completion_tokens", 0) or 0)
return (
cache_hit / Decimal("1000000") * rates["cache_hit_input"]
+ cache_miss / Decimal("1000000") * rates["cache_miss_input"]
+ output / Decimal("1000000") * rates["output"]
)
model = "deepseek-v4-flash"
response = client.chat.completions.create(
model=model,
messages=[
{"role": "user", "content": "Explain DeepSeek Python SDK setup in five bullets."}
],
extra_body={"thinking": {"type": "disabled"}},
)
cost = estimate_deepseek_cost_usd(model, response.usage)
print(f"estimated_request_cost: ${cost:.8f}")
Context caching is enabled by default in the DeepSeek API. Only repeated prefix portions can trigger cache hits, and the cache is best-effort rather than guaranteed. Track prompt_cache_hit_tokens and prompt_cache_miss_tokens instead of assuming every repeated request gets cache-hit pricing.
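To see how well caching is actually working for a given route, you can compute the observed hit rate from those usage fields. `cache_hit_rate` is a hypothetical helper for dashboards or logs:

```python
def cache_hit_rate(usage) -> float:
    """Fraction of prompt tokens billed at the cache-hit rate (0.0 if none)."""
    hits = getattr(usage, "prompt_cache_hit_tokens", 0) or 0
    misses = getattr(usage, "prompt_cache_miss_tokens", 0) or 0
    total = hits + misses
    return hits / total if total else 0.0
```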
For planning, use our DeepSeek API pricing page. For deeper usage tracking, read our DeepSeek Token Usage guide and the official DeepSeek Token & Token Usage page.
List models and check balance
You can inspect available API models through GET /models and check account balance through GET /user/balance. These are useful for internal diagnostics, dashboards, and pre-launch checks.
import os
import requests
from openai import OpenAI
client = OpenAI(
api_key=os.environ["DEEPSEEK_API_KEY"],
base_url="https://api.deepseek.com",
)
models = client.models.list()
for model in models.data:
print(model.id)
balance_response = requests.get(
"https://api.deepseek.com/user/balance",
headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
timeout=30,
)
balance_response.raise_for_status()
print(balance_response.json())
See the official List Models and Get User Balance references for the latest response shape.
Legacy aliases: deepseek-chat and deepseek-reasoner
The older names deepseek-chat and deepseek-reasoner are now legacy compatibility aliases. They should not be the primary model names in new Python examples, SDK migration guides, pricing pages, or developer tutorials.
| Legacy alias | Current compatibility behavior | Recommended replacement |
|---|---|---|
| deepseek-chat | Routes to deepseek-v4-flash non-thinking mode | deepseek-v4-flash with thinking disabled |
| deepseek-reasoner | Routes to deepseek-v4-flash thinking mode | deepseek-v4-flash or deepseek-v4-pro with thinking enabled |
Migration rule: replace old examples that use model="deepseek-chat" with model="deepseek-v4-flash", and replace reasoning-heavy examples that use model="deepseek-reasoner" with model="deepseek-v4-pro" plus thinking enabled.
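That migration rule can be written down as a lookup so every old call site is updated consistently. `LEGACY_MIGRATION` and `migrate_model` are hypothetical names for illustration:

```python
# Maps a legacy alias to (current model ID, thinking type), per the rule above.
LEGACY_MIGRATION = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-pro", "enabled"),
}


def migrate_model(model: str) -> tuple[str, dict]:
    """Return the current model ID plus the matching thinking extra_body.

    Non-legacy model IDs pass through unchanged with no thinking override.
    """
    if model in LEGACY_MIGRATION:
        current, thinking = LEGACY_MIGRATION[model]
        return current, {"thinking": {"type": thinking}}
    return model, {}
```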
Common mistakes
- Using old model names in new code: use `deepseek-v4-flash` and `deepseek-v4-pro` directly.
- Using an OpenAI API key with the DeepSeek `base_url`: use a DeepSeek API key for DeepSeek requests.
- Using a DeepSeek API key with the default OpenAI endpoint: set `base_url="https://api.deepseek.com"`.
- Using `baseURL` in Python: Python uses `base_url`. JavaScript and TypeScript use `baseURL`.
- Treating `/v1` as a DeepSeek model version: it is an OpenAI compatibility path, not a model version.
- Copying OpenAI Responses API examples into DeepSeek: DeepSeek's official examples use Chat Completions. Do not assume `client.responses.create()` works unless DeepSeek documents it.
- Putting secret keys in browser code: keep DeepSeek API keys on the server side.
- Assuming app/web behavior equals API behavior: use official DeepSeek API docs for developer integrations.
- Ignoring `finish_reason`: check for `length`, `tool_calls`, and other stop reasons.
- Passing reasoning text into normal history incorrectly: use final `content` for normal conversation history unless the active thinking + tool-call loop requires the full assistant message.
- Forgetting to validate JSON or tool arguments: parse and validate before using structured output or executing tools.
- Treating pricing as permanent: re-check official pricing before launch.
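The `finish_reason` point above is worth making concrete. A minimal sketch, assuming the standard Chat Completions `finish_reason` strings (`stop`, `length`, `tool_calls`); the helper name and action strings are our own:

```python
# Illustrative guard: decide what to do based on finish_reason
# instead of blindly using message.content.
def classify_finish(finish_reason: str) -> str:
    """Return an action hint for common finish_reason values."""
    if finish_reason == "length":
        return "truncated: raise max_tokens or shorten the prompt"
    if finish_reason == "tool_calls":
        return "run the requested tools, then send a follow-up request"
    if finish_reason == "stop":
        return "complete: safe to use message.content"
    return f"unexpected finish_reason: {finish_reason!r}"

print(classify_finish("length"))
print(classify_finish("tool_calls"))
```

In a real handler you would branch on the same values: retry or raise on `length`, enter your tool loop on `tool_calls`, and only then consume the content.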
Production checklist
- Use environment variables or a secrets manager for `DEEPSEEK_API_KEY`.
- Rotate keys when needed and remove old keys from deployment environments.
- Use `deepseek-v4-flash` or `deepseek-v4-pro` in new code.
- Use `base_url="https://api.deepseek.com"` for normal production requests.
- Use `https://api.deepseek.com/beta` only for documented beta features.
- Set explicit timeouts.
- Configure retries with backoff and a retry budget.
- Log request IDs and status codes where available.
- Handle 400, 401, 402, 422, 429, 500, and 503 with clear behavior.
- Monitor token usage and response `usage` fields.
- Track `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens`.
- Check account balance before production workloads.
- Avoid browser-side API keys.
- Validate JSON Output before using it in automation.
- Validate tool-call arguments before executing functions.
- Use async clients for high-concurrency Python apps.
- Test prompts with representative inputs, not only toy examples.
- Verify pricing, model names, context length, output limits, and feature support before launch.
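For the timeout and retry items in the checklist, the OpenAI Python client accepts `timeout` and `max_retries` constructor options, and you can compute your own backoff schedule when you retry above the SDK layer. The helper below is an illustrative sketch (the function name and values are our own, not part of any SDK):

```python
# Illustrative retry-budget helper: capped exponential backoff delays.
# Combine with the client options, e.g.:
#   OpenAI(api_key=..., base_url="https://api.deepseek.com",
#          timeout=30.0, max_retries=3)
def backoff_delays(retries: int = 3, base: float = 0.5, cap: float = 8.0) -> list:
    """Return capped exponential backoff delays in seconds, one per retry."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]

print(backoff_delays(3))  # [0.5, 1.0, 2.0]
print(backoff_delays(6))  # capped at 8.0 for later attempts
```

Add jitter (e.g. multiply each delay by a random factor) before sleeping so concurrent workers do not retry in lockstep, and stop retrying once the budget is spent.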
When this guide is not the right page
- If you only want to try prompts, use the Chat-Deep.ai browser chat.
- If you need Node.js or TypeScript, use the DeepSeek Node.js TypeScript guide.
- If you need cross-language SDK migration, use the OpenAI SDK with DeepSeek guide.
- If you need LangChain orchestration, use the DeepSeek LangChain integration guide.
- If you need local/offline models, use the DeepSeek Local vs API guide.
- If you need official billing, account management, or API keys, use the official DeepSeek platform.
FAQ
What is the DeepSeek Python SDK?
The phrase “DeepSeek Python SDK” usually refers to using DeepSeek from Python through the OpenAI Python client configured with a DeepSeek API key, DeepSeek base_url, and a current DeepSeek V4 model ID.
Does DeepSeek have an official Python SDK?
DeepSeek’s official quick start documents Python usage through the OpenAI Python client configured with DeepSeek’s base URL. This guide uses that documented approach instead of recommending an unofficial DeepSeek-specific Python package.
How do I install the DeepSeek Python SDK?
Install the OpenAI Python package with pip install openai, then create an OpenAI client with your DeepSeek API key and base_url="https://api.deepseek.com".
What base_url should I use for DeepSeek in Python?
Use https://api.deepseek.com for normal API requests. https://api.deepseek.com/v1 is also supported for OpenAI compatibility, but /v1 is not a model version.
Should I use deepseek-v4-flash or deepseek-v4-pro?
Use deepseek-v4-flash for fast, lower-cost workflows and deepseek-v4-pro for harder reasoning, complex coding, long-context analysis, and agentic workflows.
Should I still use deepseek-chat or deepseek-reasoner?
Not for new code. They are legacy compatibility aliases. deepseek-chat currently routes to deepseek-v4-flash non-thinking mode, and deepseek-reasoner currently routes to deepseek-v4-flash thinking mode.
Can I use the OpenAI Python client with DeepSeek?
Yes. DeepSeek’s official quick start says the API uses an OpenAI-compatible format, so Python users can use the OpenAI Python client by changing the API key, base URL, and model ID.
Can I use the OpenAI Responses API with DeepSeek?
DeepSeek’s official examples use Chat Completions. Do not assume OpenAI Responses API examples work with DeepSeek unless DeepSeek documents support for your use case.
How do I stream DeepSeek responses in Python?
Set stream=True in client.chat.completions.create() and iterate over the returned chunks. Read delta.content for visible text and handle empty chunks safely. If you need usage, request stream_options={"include_usage": True}.
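The chunk-handling part of that answer can be shown without calling the API. A minimal sketch, assuming the streamed chunk shape from `client.chat.completions.create(stream=True)` (chunks with `choices[0].delta.content`, and a usage-only tail chunk with empty `choices` when `stream_options={"include_usage": True}` is set); the stand-in objects below only imitate that shape:

```python
from types import SimpleNamespace

def collect_stream_text(chunks) -> str:
    """Accumulate delta.content from streamed chat chunks, skipping empty ones."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. the final usage-only chunk
            continue
        delta = chunk.choices[0].delta
        if delta and getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)

# Stand-in chunks imitating the streamed response shape:
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hel"))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="lo"))]),
    SimpleNamespace(choices=[]),  # usage-only tail chunk
]
print(collect_stream_text(fake_chunks))  # Hello
```

In production you would print each piece as it arrives instead of collecting; the empty-chunk guards are the part people forget.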
How do I use JSON Output with DeepSeek in Python?
Set response_format={"type": "json_object"}, explicitly ask for json in the prompt, provide an example JSON shape, set enough max_tokens, and validate the parsed JSON in your code.
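The "validate the parsed JSON" step can be a small guard you run on every JSON Output response before automation touches it. An illustrative sketch (helper name and error messages are our own):

```python
import json

def parse_json_output(text: str, required_keys=()) -> dict:
    """Parse model output as JSON and check required keys before use."""
    data = json.loads(text)  # raises ValueError/JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

# message.content from a json_object response would be passed here:
print(parse_json_output('{"name": "Ada", "age": 36}', required_keys=("name",)))
```

On a validation failure you can retry the request with a stronger prompt or a larger `max_tokens` rather than letting malformed output into downstream code.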
How do I use Tool Calls with DeepSeek in Python?
Define a tools array, let the model return tool_calls, validate the arguments, execute the function in your application, append a tool message with the matching tool_call_id, and send another request.
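The "validate the arguments" step deserves its own guard, because tool-call arguments arrive as a model-generated JSON string. A minimal sketch, assuming the Chat Completions shape where each tool call carries a `function.arguments` JSON string; the helper name and the `city`/`unit` parameters are illustrative:

```python
import json

def parse_tool_arguments(arguments_json: str, allowed_params: set) -> dict:
    """Parse a tool call's arguments string and reject unexpected parameters."""
    args = json.loads(arguments_json)
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    extra = set(args) - allowed_params
    if extra:
        raise ValueError(f"unexpected parameters: {sorted(extra)}")
    return args

# tool_call.function.arguments from a response would be passed here:
print(parse_tool_arguments('{"city": "Paris"}', allowed_params={"city", "unit"}))
```

Only after this check should you execute the function, then append the `tool` message with the matching `tool_call_id` and send the follow-up request.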
How do I use Thinking Mode in Python?
Use deepseek-v4-flash or deepseek-v4-pro and pass extra_body={"thinking": {"type": "enabled"}}. For harder reasoning, use deepseek-v4-pro with reasoning_effort="high" or reasoning_effort="max".
Why am I getting a 401 error?
A 401 error means authentication failed. Check that you are using a valid DeepSeek API key, that it is loaded into DEEPSEEK_API_KEY, and that your client is pointed at the DeepSeek base URL.
Why am I getting a 402 error?
A 402 error means insufficient balance. Check your DeepSeek account balance and billing setup before running production workloads.
Can I call DeepSeek directly from browser JavaScript?
No. Do not put a secret DeepSeek API key in public browser code. Use a server-side route, API proxy, worker, or backend service that keeps the key private.
Is Chat-Deep.ai the official DeepSeek website?
No. Chat-Deep.ai is an independent DeepSeek guide and browser access site. It is not affiliated with DeepSeek, DeepSeek.com, the official DeepSeek app, or the official DeepSeek developer platform.
Conclusion
The practical DeepSeek Python setup is straightforward: install the OpenAI Python client, set base_url="https://api.deepseek.com", use a DeepSeek API key, and call client.chat.completions.create() with deepseek-v4-flash or deepseek-v4-pro.
For production, go beyond the minimal example. Add secure key storage, timeouts, retries, usage monitoring, JSON validation, tool argument validation, request-id logging, and clear handling for status codes such as 400, 401, 402, 422, 429, 500, and 503.
Continue with the DeepSeek API guide for broader setup, the OpenAI SDK with DeepSeek guide for cross-language migration, and the official DeepSeek documentation for final production decisions.
Official sources and last verified
Last verified: April 24, 2026. DeepSeek model names, pricing, feature support, output limits, endpoint behavior, and legacy alias behavior can change. Use the official sources below before shipping production code.
- DeepSeek API Docs: Your First API Call
- DeepSeek-V4 Preview Release
- DeepSeek Models & Pricing
- DeepSeek Create Chat Completion API
- DeepSeek Thinking Mode
- DeepSeek JSON Output
- DeepSeek Tool Calls
- DeepSeek Context Caching
- DeepSeek Token & Token Usage
- DeepSeek Error Codes
- DeepSeek List Models
- DeepSeek Get User Balance
- OpenAI official SDK libraries documentation