DeepSeek Python SDK Guide: Use DeepSeek with Python

Last verified: July 11, 2026

Quick answer

The practical way to use the DeepSeek Python SDK today is to install the official openai Python package and configure it with DeepSeek’s API key and OpenAI-compatible base_url. DeepSeek’s own API documentation says the API is compatible with the OpenAI and Anthropic API formats, and its Python quickstart uses from openai import OpenAI with base_url="https://api.deepseek.com".

In other words, for most Python projects, you do not need a separate DeepSeek-branded Python package. You install:

pip install openai

Then create a client like this:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

That gives you a familiar OpenAI SDK-style interface for DeepSeek chat completions, streaming, JSON output, tool calls, async requests, retries, and production error handling.

Is there an official DeepSeek Python SDK?

As of July 11, 2026, the official DeepSeek documentation does not present a separate DeepSeek-branded Python SDK as the main integration path. Instead, DeepSeek documents an OpenAI-compatible API and shows Python examples using the official OpenAI Python client with DeepSeek’s base_url.

There are third-party Python packages with names such as deepseek, deepseek-sdk, or DeepSeek-specific wrappers. Some may be useful in specific projects, but they should be treated as unofficial unless DeepSeek’s official docs explicitly recommend them. For example, one GitHub project documents pip install deepseek-sdk, while a PyPI package named deepseek is maintained outside DeepSeek’s official documentation path.

For production work, start with the official OpenAI Python SDK configured for DeepSeek unless you have a specific reason to add another wrapper.

Option	Best for	Recommendation
`openai` Python SDK + DeepSeek `base_url`	Most Python apps, backend services, scripts, agent prototypes	Recommended default
Direct HTTP requests	Minimal dependencies, custom runtime environments	Good if you want full control
LangChain DeepSeek integration	Existing LangChain apps	Useful if your app already uses LangChain
Third-party DeepSeek wrappers	Specific convenience methods	Evaluate carefully; verify maintenance, model support, and security

Prerequisites

Before writing code, you need:

Python 3.9 or newer. The current OpenAI Python library states that it supports Python 3.9+ applications.
A DeepSeek API key from the DeepSeek platform. DeepSeek’s API reference uses bearer authentication and requires creating an API key before calling the API.
The openai Python package. OpenAI’s official SDK documentation shows installation with pip install openai.
A secure way to store DEEPSEEK_API_KEY, such as an environment variable or a secrets manager.

Create a project folder and virtual environment:

mkdir deepseek-python-demo
cd deepseek-python-demo

python -m venv .venv
source .venv/bin/activate

On Windows PowerShell:

python -m venv .venv
.venv\Scripts\Activate.ps1

Install the SDK:

pip install --upgrade openai python-dotenv

python-dotenv is optional, but it is convenient for local development.

Store your DeepSeek API key safely

Never hard-code your DeepSeek API key in source code. Use an environment variable instead.

For macOS or Linux:

export DEEPSEEK_API_KEY="your_deepseek_api_key_here"

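For Windows PowerShell (setx saves the variable for future sessions only, so open a new terminal before running your script):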

setx DEEPSEEK_API_KEY "your_deepseek_api_key_here"

For local development, you can also create a .env file:

DEEPSEEK_API_KEY=your_deepseek_api_key_here

Add .env to .gitignore:

.env
.venv/
__pycache__/

This pattern keeps secrets out of your Git history and makes the same code usable in local, staging, and production environments.
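The client examples in this guide read os.environ directly, so a .env file only helps if something loads it first. Here is a minimal sketch, assuming python-dotenv is installed for local development and the .env file sits in the current working directory. The helper name load_api_key is hypothetical and not part of any SDK.

```python
import os


def load_api_key(var_name: str = "DEEPSEEK_API_KEY") -> str:
    """Return the API key from the environment, failing fast with a clear message."""
    try:
        # Optional dependency: python-dotenv copies entries from a local .env file
        # into os.environ without overriding variables that are already set.
        from dotenv import load_dotenv

        load_dotenv(dotenv_path=".env")  # path is relative to the working directory
    except ImportError:
        pass  # python-dotenv is not installed; rely on real environment variables

    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it or add it to a local .env file."
        )
    return key
```

You could pass load_api_key() as api_key when constructing the client, so a missing key fails at startup with a readable message instead of a bare KeyError.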

Configure the DeepSeek Python client

Create a file named deepseek_client.py:

import os
from openai import OpenAI


def create_deepseek_client() -> OpenAI:
    return OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
        timeout=60.0,
        max_retries=2,
    )

DeepSeek’s current official OpenAI-format base URL is:

https://api.deepseek.com

Use that value unless the official DeepSeek documentation changes. DeepSeek’s quickstart and models page both list https://api.deepseek.com as the OpenAI-format base URL.

Your first DeepSeek chat completion in Python

Create main.py:

from deepseek_client import create_deepseek_client

client = create_deepseek_client()

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "system",
            "content": "You are a concise Python assistant.",
        },
        {
            "role": "user",
            "content": "Write a Python function that converts a title into a URL slug.",
        },
    ],
    extra_body={"thinking": {"type": "disabled"}},
)

print(response.choices[0].message.content)

Run it:

python main.py

This example uses deepseek-v4-flash, one of the model IDs currently listed in DeepSeek’s official API docs. The extra_body={"thinking": {"type": "disabled"}} setting makes the request use non-thinking mode, which is usually simpler for quick coding assistants, classification, rewriting, and basic app workflows. DeepSeek documents thinking as a request parameter with enabled and disabled values, and the OpenAI SDK examples pass DeepSeek-specific fields through extra_body.

Which DeepSeek model should you use?

DeepSeek currently lists two main API model IDs for OpenAI-format calls: deepseek-v4-flash and deepseek-v4-pro. The same documentation says the older deepseek-chat and deepseek-reasoner names are scheduled for deprecation on 2026/07/24 15:59 UTC and, for compatibility, currently map to deepseek-v4-flash.

Model	Practical use case	Current notes
`deepseek-v4-flash`	Default choice for high-volume chat, coding help, extraction, summarization, routing, and general app features	Lower current API pricing than Pro
`deepseek-v4-pro`	Harder reasoning, more complex coding, review-heavy workflows, and tasks where you want a stronger model path	Higher current API pricing than Flash
`deepseek-chat`	Legacy compatibility	Scheduled for deprecation on 2026/07/24
`deepseek-reasoner`	Legacy compatibility	Scheduled for deprecation on 2026/07/24

DeepSeek’s official pricing page currently lists both deepseek-v4-flash and deepseek-v4-pro with 1M context length, 384K maximum output, JSON output support, tool calls support, and different per-1M-token prices. It also says prices may vary and recommends checking the pricing page regularly for the latest information.

Current official pricing listed on the page at verification time:

Model	1M input tokens, cache hit	1M input tokens, cache miss	1M output tokens
`deepseek-v4-flash`	$0.0028	$0.14	$0.28
`deepseek-v4-pro`	$0.003625	$0.435	$0.87

Do not hard-code cost assumptions into your application. Keep prices in configuration or internal documentation that can be updated when DeepSeek changes its pricing.
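As a worked example of the per-1M-token arithmetic, here is a rough cost estimator. It is only a sketch: the PRICES_PER_1M mapping stands in for a configuration file, its numbers are copied from the table above at verification time, and the names ModelPrice and estimate_cost_usd are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD prices per 1M tokens."""
    cache_hit_input: float
    cache_miss_input: float
    output: float


# Stand-in for values you would normally load from configuration.
PRICES_PER_1M = {
    "deepseek-v4-flash": ModelPrice(0.0028, 0.14, 0.28),
    "deepseek-v4-pro": ModelPrice(0.003625, 0.435, 0.87),
}


def estimate_cost_usd(
    model: str,
    cache_hit_tokens: int,
    cache_miss_tokens: int,
    output_tokens: int,
    prices: dict = PRICES_PER_1M,
) -> float:
    """Estimate the cost of one request from its token counts."""
    if model not in prices:
        raise ValueError(f"No price configured for model: {model}")
    price = prices[model]
    total = (
        cache_hit_tokens * price.cache_hit_input
        + cache_miss_tokens * price.cache_miss_input
        + output_tokens * price.output
    )
    return total / 1_000_000
```

For instance, 1M cache-miss input tokens plus 1M output tokens on deepseek-v4-flash works out to 0.14 + 0.28 = 0.42 USD under these assumed prices.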

Thinking mode vs non-thinking mode

DeepSeek V4 supports both thinking and non-thinking modes. According to the official docs, thinking mode is enabled by default, and the OpenAI-format control is {"thinking": {"type": "enabled"}} or {"thinking": {"type": "disabled"}}.

Use non-thinking mode when you need:

Lower latency for simple tasks.
Cleaner direct answers.
Classification, extraction, rewriting, or basic coding help.
Tool calls without handling reasoning content.

Use thinking mode when you need:

More deliberate reasoning.
Harder coding or debugging.
Multi-step planning.
Complex agent workflows.

Example with thinking enabled:

from deepseek_client import create_deepseek_client

client = create_deepseek_client()

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {
            "role": "user",
            "content": "Review this API design and identify the three most important risks.",
        }
    ],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

message = response.choices[0].message

print(message.content)

DeepSeek notes that thinking mode does not support temperature, top_p, presence_penalty, or frequency_penalty; passing those parameters may not trigger an error, but they will not take effect in thinking mode.
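Because those parameters can be ignored silently, it may help to drop them explicitly before a thinking-mode request so the call site reflects what actually takes effect. The helper below is a hypothetical sketch (build_request_kwargs is not part of the OpenAI SDK); it only builds a dictionary of keyword arguments and does not call the API.

```python
from typing import Any

# Parameters the text above lists as having no effect in thinking mode.
IGNORED_IN_THINKING = ("temperature", "top_p", "presence_penalty", "frequency_penalty")


def build_request_kwargs(thinking: bool, **params: Any) -> dict[str, Any]:
    """Return keyword arguments for chat.completions.create with a consistent thinking setup."""
    kwargs = dict(params)

    if thinking:
        # Remove sampling parameters that would be ignored anyway.
        for name in IGNORED_IN_THINKING:
            kwargs.pop(name, None)

    # Merge the thinking switch into any extra_body the caller already supplied.
    extra_body = dict(kwargs.pop("extra_body", None) or {})
    extra_body["thinking"] = {"type": "enabled" if thinking else "disabled"}
    kwargs["extra_body"] = extra_body
    return kwargs
```

A call would then look like client.chat.completions.create(**build_request_kwargs(True, model="deepseek-v4-pro", messages=messages)).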

Stream DeepSeek responses in Python

Streaming is useful for chat interfaces, command-line assistants, and long outputs because users see tokens as they arrive instead of waiting for the full response.

from deepseek_client import create_deepseek_client

client = create_deepseek_client()

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "user",
            "content": "Explain Python decorators in five practical bullet points.",
        }
    ],
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"thinking": {"type": "disabled"}},
)

for chunk in stream:
    if not chunk.choices:
        # This can happen when include_usage=True and the final usage chunk arrives.
        continue

    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

DeepSeek’s API reference says streaming sends partial message deltas using server-sent events and terminates the stream with data: [DONE]. It also documents stream_options.include_usage for returning token usage in an additional streamed chunk.
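If you also need the full text and the usage numbers after streaming finishes, you can accumulate them while iterating. The sketch below assumes chunks shaped like the OpenAI SDK’s streamed objects (a choices list with delta.content, and an optional usage attribute on the final chunk). The function name collect_stream is hypothetical, and the demo uses stand-in objects rather than a real API call.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def collect_stream(chunks: Iterable[Any]) -> tuple[str, Optional[Any]]:
    """Join streamed text deltas and capture the usage object if one arrives."""
    parts: list[str] = []
    usage: Optional[Any] = None

    for chunk in chunks:
        # With include_usage enabled, usage is expected on a final chunk
        # whose choices list is empty.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            parts.append(delta.content)

    return "".join(parts), usage


if __name__ == "__main__":
    # Stand-in chunks that mimic the streamed shape; no network access needed.
    def fake_chunk(text: Optional[str]) -> SimpleNamespace:
        delta = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)

    final = SimpleNamespace(choices=[], usage=SimpleNamespace(total_tokens=12))
    text, usage = collect_stream([fake_chunk("Hello, "), fake_chunk("world"), final])
    print(text, usage.total_tokens)
```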

Use DeepSeek with async Python

Use the async client when your application already runs on asyncio, FastAPI, aiohttp, or another async framework.

Create async_main.py:

import asyncio
import os
from openai import AsyncOpenAI


async def main() -> None:
    client = AsyncOpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
        timeout=60.0,
        max_retries=2,
    )

    response = await client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {
                "role": "user",
                "content": "Give me a compact checklist for reviewing a Python pull request.",
            }
        ],
        extra_body={"thinking": {"type": "disabled"}},
    )

    print(response.choices[0].message.content)


if __name__ == "__main__":
    asyncio.run(main())

Run it:

python async_main.py

The official OpenAI Python library provides AsyncOpenAI, and its docs state that synchronous and asynchronous clients otherwise have the same functionality.

Request JSON output from DeepSeek

Use JSON output when your application needs structured data for parsing, validation, database writes, workflows, or API responses.

DeepSeek’s JSON Output guide says to set response_format to {"type": "json_object"}, include the word “json” in the system or user prompt, provide an example of the desired JSON format, and choose a reasonable max_tokens value to avoid truncated JSON.

import json
from deepseek_client import create_deepseek_client

client = create_deepseek_client()

system_prompt = """
You extract API task requirements.

Return only valid JSON in this format:
{
  "task": "short task name",
  "language": "programming language",
  "risk_level": "low | medium | high",
  "requires_database": true
}
"""

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": "Build a Python endpoint that accepts uploaded invoices and stores parsed totals in PostgreSQL. Return JSON.",
        },
    ],
    response_format={"type": "json_object"},
    max_tokens=300,
    extra_body={"thinking": {"type": "disabled"}},
)

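content = response.choices[0].message.content
if not content:
    # Guard against an empty or missing message body before parsing.
    raise ValueError("Model returned empty content; retry or adjust the prompt")

data = json.loads(content)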

print(data["task"])
print(data["risk_level"])

Always validate the JSON against your own schema before using it in production. Valid JSON does not guarantee that the values are correct, safe, complete, or appropriate for your business rules.
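A schema check for the format in the system prompt above can be written with the standard library alone. This is a minimal sketch, not a full schema validator: the function name validate_task_payload is hypothetical, and a library such as pydantic or jsonschema may suit larger payloads better.

```python
from typing import Any

ALLOWED_RISK_LEVELS = {"low", "medium", "high"}

# Expected keys and Python types for the JSON format requested in the prompt.
EXPECTED_FIELDS = {
    "task": str,
    "language": str,
    "risk_level": str,
    "requires_database": bool,
}


def validate_task_payload(data: Any) -> dict[str, Any]:
    """Return a cleaned copy of the payload or raise ValueError."""
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")

    missing = EXPECTED_FIELDS.keys() - data.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")

    for key, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data[key], expected_type):
            raise ValueError(f"Field {key!r} must be {expected_type.__name__}")

    if data["risk_level"] not in ALLOWED_RISK_LEVELS:
        raise ValueError(f"Unexpected risk_level: {data['risk_level']!r}")

    # Keep only known keys so unexpected extras never reach downstream code.
    return {key: data[key] for key in EXPECTED_FIELDS}
```

Calling validate_task_payload(data) right after json.loads keeps malformed or unexpected values out of database writes and downstream workflows.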

Use DeepSeek tool calls in Python

Tool calls let the model request a function call, while your code performs the actual function. This is useful for application actions such as checking an order, searching a database, calling an internal API, retrieving weather, or creating a ticket.

DeepSeek’s docs describe tool calls as a way for the model to call external tools, and the API reference notes that the model may generate invalid JSON or hallucinate parameters, so you must validate tool arguments in your code before executing a function.

Here is a minimal safe pattern:

import json
from typing import Any

from deepseek_client import create_deepseek_client

client = create_deepseek_client()


def get_order_status(order_id: str) -> dict[str, Any]:
    # Replace this demo function with a real database or API lookup.
    demo_orders = {
        "A100": {"status": "shipped", "eta": "2026-07-14"},
        "B200": {"status": "processing", "eta": None},
    }
    return demo_orders.get(order_id, {"status": "not_found", "eta": None})


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Get the shipping status for an order by order ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The order ID, such as A100 or B200.",
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False,
            },
        },
    }
]

messages = [
    {
        "role": "user",
        "content": "Can you check the shipping status for order A100?",
    }
]

first_response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    extra_body={"thinking": {"type": "disabled"}},
)

assistant_message = first_response.choices[0].message

if not assistant_message.tool_calls:
    print(assistant_message.content)
else:
    messages.append(assistant_message.model_dump(exclude_none=True))

    for tool_call in assistant_message.tool_calls:
        if tool_call.function.name != "get_order_status":
            raise ValueError(f"Unsupported tool: {tool_call.function.name}")

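        try:
            args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError as exc:
            # The model may emit arguments that are not valid JSON.
            raise ValueError("Tool call arguments are not valid JSON") from exc

        order_id = args.get("order_id") if isinstance(args, dict) else None

        if not isinstance(order_id, str) or not order_id:
            raise ValueError("Invalid order_id from tool call")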

        tool_result = get_order_status(order_id)

        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(tool_result),
            }
        )

    final_response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        extra_body={"thinking": {"type": "disabled"}},
    )

    print(final_response.choices[0].message.content)

Important production rule: treat model-generated tool arguments as untrusted input. Validate types, allowed values, authorization, tenant scope, rate limits, and business rules before executing any tool.
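One way to apply that rule is an allowlist of tools, each with its own strict argument parser. The sketch below is illustrative only: the order ID pattern (one uppercase letter followed by three digits) is an assumption based on the demo IDs A100 and B200, and the names parse_order_args and parse_tool_call are hypothetical. Authorization and tenant checks would still happen in your own code after parsing.

```python
import json
import re
from typing import Any, Callable

# Assumed format, inferred from the demo IDs; adjust to your real ID scheme.
ORDER_ID_PATTERN = re.compile(r"[A-Z][0-9]{3}")


def parse_order_args(raw_arguments: str) -> dict[str, Any]:
    """Parse and strictly validate arguments for get_order_status."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError("Tool arguments are not valid JSON") from exc

    # Reject non-objects and any unexpected or missing keys.
    if not isinstance(args, dict) or set(args) != {"order_id"}:
        raise ValueError("Expected exactly one argument: order_id")

    order_id = args["order_id"]
    if not isinstance(order_id, str) or not ORDER_ID_PATTERN.fullmatch(order_id):
        raise ValueError("order_id has an unexpected format")
    return {"order_id": order_id}


# Allowlist: only tools registered here can ever be executed.
TOOL_PARSERS: dict[str, Callable[[str], dict[str, Any]]] = {
    "get_order_status": parse_order_args,
}


def parse_tool_call(name: str, raw_arguments: str) -> dict[str, Any]:
    """Look up the tool in the allowlist and validate its arguments."""
    parser = TOOL_PARSERS.get(name)
    if parser is None:
        raise ValueError(f"Unsupported tool: {name}")
    return parser(raw_arguments)
```

In the loop above, you could call parse_tool_call(tool_call.function.name, tool_call.function.arguments) and pass the returned dictionary to the real function only after your own permission checks succeed.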

Manage multi-turn conversations with a stateless API

DeepSeek’s /chat/completions API is stateless. The server does not remember previous conversation turns for you, so your application must send the relevant conversation history with each request.

A minimal multi-turn pattern looks like this:

from deepseek_client import create_deepseek_client

client = create_deepseek_client()

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant for Python API design.",
    },
    {
        "role": "user",
        "content": "What is a good folder structure for a FastAPI project?",
    },
]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    extra_body={"thinking": {"type": "disabled"}},
)

assistant_message = response.choices[0].message
messages.append(assistant_message.model_dump(exclude_none=True))

messages.append(
    {
        "role": "user",
        "content": "Now adapt that structure for a project with background workers.",
    }
)

second_response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    extra_body={"thinking": {"type": "disabled"}},
)

print(second_response.choices[0].message.content)

In a real app, do not blindly send the entire conversation forever. Summarize old turns, trim irrelevant content, and store only the history needed to answer the next user request.
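The simplest form of trimming keeps the system prompt and a sliding window of recent messages. The sketch below assumes messages are plain dictionaries like the ones in this guide; trim_history and its max_messages default are hypothetical, and summarizing older turns would need an extra model call that is not shown here.

```python
from typing import Any


def trim_history(
    messages: list[dict[str, Any]], max_messages: int = 6
) -> list[dict[str, Any]]:
    """Keep all system messages plus the most recent non-system messages."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]

    # A slice of [-0:] would return everything, so handle zero explicitly.
    recent = rest[-max_messages:] if max_messages > 0 else []

    # A tool result is only meaningful after the assistant message that
    # requested it, so drop tool messages orphaned at the start of the window.
    while recent and recent[0].get("role") == "tool":
        recent.pop(0)

    return system + recent
```

The function returns a new list and leaves the stored history untouched, so you can keep the full transcript in your own database while sending only the trimmed view to the API.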

Handle errors, retries, and timeouts

The OpenAI Python SDK raises APIConnectionError for connection problems and APIStatusError for non-success HTTP responses. It also provides specific subclasses such as RateLimitError, and its docs say certain errors are retried two times by default with exponential backoff.

A practical DeepSeek error-handling pattern:

import logging

import openai
from deepseek_client import create_deepseek_client

logger = logging.getLogger(__name__)
client = create_deepseek_client()

try:
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[
            {
                "role": "user",
                "content": "Summarize the key risks in a Python dependency upgrade.",
            }
        ],
        extra_body={"thinking": {"type": "disabled"}},
    )
    print(response.choices[0].message.content)

except openai.RateLimitError as exc:
    logger.warning("Rate limited by DeepSeek API. request_id=%s", getattr(exc, "request_id", None))
    raise

except openai.APIConnectionError as exc:
    logger.exception("Could not connect to the API: %s", exc.__cause__)
    raise

except openai.APIStatusError as exc:
    logger.error(
        "DeepSeek API returned status=%s request_id=%s response=%s",
        exc.status_code,
        getattr(exc, "request_id", None),
        exc.response,
    )
    raise

DeepSeek’s own error-code page documents common statuses, including 400 for invalid request format, 401 for authentication failure, 402 for insufficient balance, 422 for invalid parameters, 429 for rate limit reached, 500 for server error, and 503 for server overload.

Status	Likely cause	Practical fix
400	Invalid request body	Check message format, unsupported fields, and `extra_body` structure
401	Wrong or missing API key	Verify `DEEPSEEK_API_KEY` and the account used
402	Insufficient balance	Check billing balance before retrying
422	Invalid parameter	Check model ID, JSON mode, tool schema, and unsupported parameters
429	Too many requests or concurrency exceeded	Back off, queue requests, reduce concurrency, or request higher capacity
500	Server-side issue	Retry after a short wait
503	Server overloaded	Retry with backoff or route to a fallback provider
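The retry advice in the table can be sketched as an application-level wrapper with exponential backoff and jitter. It assumes the raised exception exposes a status_code attribute, as the SDK’s APIStatusError does in the handler above. The names backoff_delay and call_with_backoff and the delay values are hypothetical. Note that the client in this guide already retries with max_retries=2, so wrapper attempts multiply with the SDK’s own retries.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Statuses worth retrying; 400, 401, 402, and 422 need a fix, not a retry.
RETRYABLE_STATUSES = frozenset({429, 500, 503})


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Full-jitter backoff: a random delay between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only on retryable HTTP statuses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            status: Optional[int] = getattr(exc, "status_code", None)
            is_last_attempt = attempt == max_attempts - 1
            if status not in RETRYABLE_STATUSES or is_last_attempt:
                raise
            sleep(backoff_delay(attempt))

    raise AssertionError("unreachable")
```

Usage would look like call_with_backoff(lambda: client.chat.completions.create(...)). Passing a custom sleep function makes the wrapper easy to test without real delays.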

Understand rate limits and concurrency

DeepSeek’s rate-limit documentation currently focuses on concurrency limits by model and account. At verification time, it lists a concurrency limit of 2500 for deepseek-v4-flash and 500 for deepseek-v4-pro; it also says requests beyond the account-level concurrency limit receive HTTP 429.

For production systems:

Use a queue for background jobs.
Add per-user or per-tenant throttling.
Cap parallel requests per worker.
Use exponential backoff for 429, 500, and 503.
Track latency, error rate, and token usage by model.
Consider a fallback model provider for critical workflows.
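Capping parallel requests per worker can be done with an asyncio semaphore. The sketch below is generic: run_bounded is a hypothetical helper that accepts zero-argument coroutine factories, so it works with AsyncOpenAI calls or any other awaitable. The right limit depends on your account’s concurrency allowance and the number of workers you run.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def run_bounded(
    factories: Sequence[Callable[[], Awaitable[T]]], limit: int
) -> list[T]:
    """Run all coroutines, allowing at most `limit` to be in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory: Callable[[], Awaitable[T]]) -> T:
        # The coroutine is created only after a slot is free, so no request
        # starts before the semaphore has been acquired.
        async with semaphore:
            return await factory()

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(f) for f in factories))
```

With the async client, each factory could be a lambda that returns client.chat.completions.create(...), for example run_bounded([lambda p=p: ask(p) for p in prompts], limit=8), where ask is your own coroutine.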

DeepSeek also supports a user_id parameter for content-safety isolation, KVCache isolation, and scheduling isolation. When using the OpenAI SDK, DeepSeek’s docs say to pass user_id under extra_body; they also warn not to include user privacy information in user_id.

Example:

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Summarize this support ticket."}
    ],
    extra_body={
        "thinking": {"type": "disabled"},
        "user_id": "tenant_42_user_958",
    },
)

Use an internal opaque ID, not an email address, phone number, real name, or other personal data.
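If your internal IDs are themselves sensitive or guessable, one option is to derive the value with a keyed hash. This is a sketch under assumptions: opaque_user_id is a hypothetical helper, the u_ prefix and 32-character length are arbitrary choices, and you should confirm any length or character limits for user_id in DeepSeek’s documentation.

```python
import hashlib
import hmac


def opaque_user_id(tenant_id: str, internal_user_id: str, secret: bytes) -> str:
    """Derive a stable, non-reversible identifier from internal IDs."""
    message = f"{tenant_id}:{internal_user_id}".encode("utf-8")
    # HMAC-SHA256 with a server-side secret: the same inputs always give the
    # same ID, but the ID cannot be mapped back without the secret.
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"u_{digest[:32]}"
```

Keep the secret in the same secrets manager as your API key. Rotating it changes every derived ID, which also resets any per-user isolation tied to the old values.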

Track token usage and cost

DeepSeek bills based on input and output tokens, and the official pricing page lists prices per 1M tokens.

A typical response includes usage data:

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "user", "content": "Explain API idempotency in one paragraph."}
    ],
    extra_body={"thinking": {"type": "disabled"}},
)

usage = response.usage

print("Prompt tokens:", usage.prompt_tokens)
print("Completion tokens:", usage.completion_tokens)
print("Total tokens:", usage.total_tokens)

DeepSeek’s API reference also documents cache-related usage fields such as prompt_cache_hit_tokens and prompt_cache_miss_tokens.

Context caching is enabled by default for all users, according to DeepSeek’s context caching guide. The guide explains that subsequent requests with overlapping prefixes may hit the cache when the cached prefix has been persisted.
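To see whether stable prefixes are paying off, you can compute a cache-hit ratio from those usage fields. The sketch below reads them defensively with getattr, on the assumption that the SDK exposes provider-specific fields as extra attributes on the usage object. The function name cache_hit_ratio is hypothetical.

```python
from typing import Any, Optional


def cache_hit_ratio(usage: Any) -> Optional[float]:
    """Return the share of prompt tokens served from cache, or None if unknown."""
    hit = getattr(usage, "prompt_cache_hit_tokens", None)
    miss = getattr(usage, "prompt_cache_miss_tokens", None)

    # Fields may be absent, and a request with no prompt tokens has no ratio.
    if hit is None or miss is None or hit + miss == 0:
        return None
    return hit / (hit + miss)
```

For example, 300 hit tokens and 100 miss tokens give a ratio of 0.75. Logging this value per feature makes it easier to notice when a prompt change breaks prefix reuse.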

Practical ways to control cost:

Keep system prompts concise.
Reuse stable prefixes when possible.
Use deepseek-v4-flash for default traffic.
Route only harder tasks to deepseek-v4-pro.
Set max_tokens for bounded outputs.
Monitor cache-hit and cache-miss tokens.
Log cost by feature, tenant, and model.
Avoid sending unnecessary full conversation history.

Use DeepSeek with LangChain

If your application already uses LangChain, you can use LangChain’s DeepSeek integration instead of calling the OpenAI SDK directly. LangChain’s documentation says the integration lives in the langchain-deepseek package and requires a DeepSeek account, API key, and DEEPSEEK_API_KEY environment variable.

Install:

pip install -U langchain-deepseek

Example:

from langchain_deepseek import ChatDeepSeek

llm = ChatDeepSeek(
    model="deepseek-v4-flash",
    temperature=0,
    max_retries=2,
)

response = llm.invoke("Give me a three-point checklist for validating JSON API input.")

print(response.content)

Use LangChain when you need chains, retrievers, agents, memory abstractions, or LangChain-native structured workflows. For a simple DeepSeek API Python integration, the direct OpenAI SDK approach is usually easier to debug and has fewer moving parts.

Also check model names against the current DeepSeek official docs. Some framework examples may lag behind DeepSeek’s latest model IDs or still show legacy names.

Common DeepSeek Python SDK mistakes

1. Installing the wrong package first

Do not start with a random deepseek or deepseek-sdk package just because its name matches “DeepSeek Python SDK.” DeepSeek’s official quickstart shows the OpenAI SDK path.

Use:

pip install openai

Then configure:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

2. Using a deprecated model name

Avoid starting new projects with deepseek-chat or deepseek-reasoner. DeepSeek’s docs state that those names are scheduled for deprecation on 2026/07/24 and, for compatibility, currently map to deepseek-v4-flash.

Use current model IDs:

model="deepseek-v4-flash"

or:

model="deepseek-v4-pro"

3. Forgetting that thinking mode is enabled by default

DeepSeek documents thinking mode as enabled by default. If your app expects a direct answer and does not need reasoning mode, explicitly disable it.

extra_body={"thinking": {"type": "disabled"}}

4. Using unsupported parameters in thinking mode

If you enable thinking mode, do not rely on temperature, top_p, presence_penalty, or frequency_penalty. DeepSeek says these parameters do not take effect in thinking mode.

5. Assuming the server remembers conversation history

The chat completions API is stateless, so your app must send the relevant message history on each request.

6. Trusting tool-call arguments without validation

DeepSeek’s API reference warns that tool-call arguments may not always be valid JSON and may include hallucinated parameters. Validate arguments before executing any tool.

7. Logging sensitive user data

Avoid logging raw prompts, API keys, personal data, authentication tokens, database records, or tool-call payloads that contain sensitive information. Log request IDs, model names, latency, status codes, and token counts instead.
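One simple way to enforce this is an allowlist of log fields, so anything not explicitly approved never reaches the log line. The field names below are assumptions chosen to match the list above, and safe_log_fields is a hypothetical helper.

```python
from typing import Any

# Only these keys may appear in request logs; everything else is dropped.
SAFE_LOG_FIELDS = (
    "request_id",
    "model",
    "latency_ms",
    "status_code",
    "prompt_tokens",
    "completion_tokens",
)


def safe_log_fields(**fields: Any) -> dict[str, Any]:
    """Return only allowlisted fields, preserving the allowlist order."""
    return {key: fields[key] for key in SAFE_LOG_FIELDS if key in fields}
```

A call such as logger.info("deepseek_request %s", safe_log_fields(model=model, latency_ms=elapsed, prompt=prompt)) would log the model and latency while silently discarding the prompt.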

Production checklist

Before shipping a DeepSeek Python integration, verify the following:

You use openai with base_url="https://api.deepseek.com".
DEEPSEEK_API_KEY is loaded from environment variables or a secrets manager.
No API keys are stored in Git, logs, notebooks, or frontend code.
You use current model IDs from the official DeepSeek docs.
You explicitly choose thinking or non-thinking mode.
You set reasonable timeouts.
You understand default retry behavior and add app-level backoff where needed.
You handle 400, 401, 402, 422, 429, 500, and 503.
You validate JSON output before using it.
You validate tool-call arguments before executing tools.
You trim or summarize conversation history.
You monitor token usage, latency, errors, and cost.
You test fallback behavior for outages or overload.
You avoid storing personal data in user_id.
You re-check official DeepSeek docs before major releases or migrations.

FAQ

What is the DeepSeek Python SDK?

In practice, “DeepSeek Python SDK” usually means using the official OpenAI Python SDK configured for DeepSeek’s OpenAI-compatible API. DeepSeek’s official Python quickstart uses from openai import OpenAI with base_url="https://api.deepseek.com".

Should I install `openai` or `deepseek-sdk`?

Install openai first unless DeepSeek’s official documentation changes. Third-party DeepSeek wrappers exist, but DeepSeek’s current official quickstart path uses the OpenAI SDK.

What is the correct DeepSeek `base_url` for Python?

The current official OpenAI-format base URL is:

base_url="https://api.deepseek.com"

DeepSeek lists that value in its quickstart and models documentation.

Which model should I use first?

Start with deepseek-v4-flash for general app traffic, then test deepseek-v4-pro for harder tasks where output quality matters more than cost. Both model IDs are currently listed in the official DeepSeek docs.

Can I still use `deepseek-chat` or `deepseek-reasoner`?

Avoid them for new projects. DeepSeek says deepseek-chat and deepseek-reasoner are scheduled for deprecation on 2026/07/24 15:59 UTC.

Does DeepSeek support streaming in Python?

Yes. Use stream=True with client.chat.completions.create(...) and iterate through the returned chunks. DeepSeek’s API reference documents streaming partial message deltas using server-sent events.

Does DeepSeek support JSON output?

Yes. Set response_format={"type": "json_object"} and explicitly instruct the model to return JSON in your prompt. DeepSeek’s JSON Output guide gives these requirements and notes that max_tokens should be set reasonably to avoid truncated JSON.

Does DeepSeek support tool calls?

Yes. DeepSeek’s model details page lists tool calls as supported for deepseek-v4-flash and deepseek-v4-pro, and the tool calls guide provides Python examples.

Can I use DeepSeek with async Python?

Yes. Use AsyncOpenAI with the same DeepSeek base_url. The OpenAI Python SDK documents AsyncOpenAI and says the async and sync clients have otherwise identical functionality.

Can I use DeepSeek with LangChain?

Yes. LangChain provides a langchain-deepseek integration package. Use it when you already need LangChain’s orchestration features; otherwise, the direct OpenAI SDK approach is simpler.