DeepSeek Chat Completions API

Last verified against official DeepSeek API documentation: April 27, 2026.

Current API status: DeepSeek’s POST /chat/completions endpoint documents deepseek-v4-flash and deepseek-v4-pro as the current V4 chat completion model IDs.

The older names deepseek-chat and deepseek-reasoner are legacy compatibility aliases. During the transition period, deepseek-chat maps to DeepSeek-V4-Flash non-thinking mode, while deepseek-reasoner maps to DeepSeek-V4-Flash thinking mode. DeepSeek says these aliases are scheduled to become inaccessible after July 24, 2026, 15:59 UTC.

This guide focuses on integration behavior, request structure, model selection, limits, and implementation patterns.

The DeepSeek Chat Completions API is the OpenAI-compatible interface for building chatbots, copilots, assistants, extraction tools, coding agents, and backend automation with DeepSeek models. You send a model and a messages array to POST /chat/completions, then receive either a normal chat completion object or streamed Server-Sent Event chunks.

This guide covers the current V4 model names, request fields, message roles, thinking mode, streaming, JSON Output, tool calls, strict mode, Chat Prefix Completion, stateless multi-turn conversations, usage fields, context caching, common errors, and practical production patterns.

Independent site notice: Chat-Deep.ai is an independent DeepSeek guide and browser access site. It is not affiliated with DeepSeek, DeepSeek.com, chat.deepseek.com, the official DeepSeek app, or the official DeepSeek developer platform. For production decisions, verify current model names, limits, feature support, deprecation notices, and service status in the official DeepSeek documentation.

Contents

  1. Quick Answer
  2. Quick Reference
  3. Current DeepSeek Chat Completion Models
  4. Endpoint, Base URLs and Authentication
  5. Minimal API Examples
  6. Request Fields and Message Roles
  7. Thinking Mode
  8. Streaming Responses
  9. JSON Output
  10. Tool Calls
  11. Strict Tool Mode Beta
  12. Chat Prefix Completion Beta
  13. Multi-turn Conversations Are Stateless
  14. Response Object and Usage Fields
  15. Context Caching
  16. Errors, Rate Limits and Keep-alives
  17. Security and Production Best Practices
  18. Migration from deepseek-chat and deepseek-reasoner
  19. Common DeepSeek Chat Completions Mistakes
  20. FAQ
  21. Official Sources

Quick Answer

Use POST https://api.deepseek.com/chat/completions for OpenAI-compatible DeepSeek chat requests. At minimum, the JSON body needs:

  • model — normally deepseek-v4-flash or deepseek-v4-pro.
  • messages — an array containing at least one chat message.

Choose deepseek-v4-flash for fast, efficient, high-volume chat and automation. Choose deepseek-v4-pro for harder reasoning, coding, long-context analysis, and agentic workflows. For simple responses, disable thinking mode. For reasoning-heavy work, enable thinking mode and set reasoning_effort to high or max.

Quick Reference

  • Endpoint: POST /chat/completions
  • OpenAI-compatible base URL: https://api.deepseek.com
  • Anthropic-compatible base URL: https://api.deepseek.com/anthropic
  • Required request fields: model and messages
  • Current V4 model IDs: deepseek-v4-flash and deepseek-v4-pro
  • Legacy aliases: deepseek-chat and deepseek-reasoner
  • Legacy alias retirement: after July 24, 2026, 15:59 UTC
  • Thinking mode: supported; enabled by default
  • Thinking effort: high or max
  • Context length: 1M tokens for current V4 API models
  • Supported chat features: streaming, JSON Output, tool calls, strict tool mode (Beta), and Chat Prefix Completion (Beta)

Current DeepSeek Chat Completion Models

The safest current model IDs for new DeepSeek Chat Completions integrations are deepseek-v4-flash and deepseek-v4-pro. The legacy aliases still exist for compatibility during the transition period, but they should not be treated as long-term model IDs.

  • deepseek-v4-flash (current V4 API model): fast chat, summarization, extraction, routing, support assistants, and high-volume workflows. A good default starting point for most applications.
  • deepseek-v4-pro (current V4 API model): advanced reasoning, coding, long-context analysis, complex agents, and high-value tasks. Use it when answer quality, synthesis, or reasoning depth matters more than speed.
  • deepseek-chat (legacy compatibility alias): existing integrations that still depend on the older alias. Currently maps to V4-Flash non-thinking mode during the transition period.
  • deepseek-reasoner (legacy compatibility alias): existing reasoning-mode integrations that still depend on the older alias. Currently maps to V4-Flash thinking mode during the transition period.

Endpoint, Base URLs and Authentication

For direct HTTP requests, call:

POST https://api.deepseek.com/chat/completions

For OpenAI-compatible SDKs, use:

base_url = "https://api.deepseek.com"

For Anthropic-compatible SDKs and tools, use:

https://api.deepseek.com/anthropic

Authentication uses a DeepSeek API key as a Bearer token:

Authorization: Bearer YOUR_DEEPSEEK_API_KEY

Keep API keys server-side. Do not place them in browser JavaScript, mobile app bundles, public repositories, screenshots, analytics logs, or client-visible configuration.

Minimal API Examples

Minimal cURL Example

This example uses deepseek-v4-flash with thinking mode disabled for a straightforward chat response:

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain DeepSeek Chat Completions in one paragraph."}
    ],
    "thinking": {"type": "disabled"},
    "stream": false
  }'

Reasoning cURL Example

For harder tasks, use deepseek-v4-pro, enable thinking mode, and choose a reasoning effort:

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "system", "content": "You are a careful technical assistant."},
      {"role": "user", "content": "Design a safe retry strategy for an API client."}
    ],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high",
    "stream": false
  }'

Python Example with the OpenAI SDK

DeepSeek supports OpenAI-compatible SDK usage. In the OpenAI Python SDK, pass the DeepSeek-specific thinking object through extra_body.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Give me three DeepSeek API integration tips."},
    ],
    stream=False,
    extra_body={"thinking": {"type": "disabled"}},
)

print(response.choices[0].message.content)

Node.js Example with the OpenAI SDK

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "deepseek-v4-flash",
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: "Give me three DeepSeek Chat Completions tips." }
    ],
    thinking: { type: "disabled" },
    stream: false,
  });

  console.log(completion.choices[0].message.content);
}

main();

Request Fields and Message Roles

The request body must include model and messages. The messages array must contain at least one message and can use the roles system, user, assistant, and tool.

Common Request Fields

  • model (required): the model ID to call, for example deepseek-v4-flash.
  • messages (required): the current conversation history, for example [{"role":"user","content":"Hello"}].
  • thinking (optional): enables or disables thinking mode, for example {"type":"enabled"}.
  • reasoning_effort (optional): controls reasoning effort when thinking is enabled; high or max.
  • stream (optional): when true, returns partial deltas over Server-Sent Events.
  • max_tokens (optional): caps generated output tokens, for example 1024.
  • stop (optional): stops generation at one or more sequences, for example ["\nEND"].
  • response_format (optional): requests text or JSON output, for example {"type":"json_object"}.
  • tools (optional): defines function tools the model may call, as an array of function definitions.
  • tool_choice (optional): controls whether or which tool is called; none, auto, required, or a named function.

Message Roles

  • system: sets behavior, scope, tone, or task rules. Fields: role, content.
  • user: carries the user request or application input. Fields: role, content.
  • assistant: preserves prior assistant messages, tool-call proposals, or a Beta prefix. Fields: role, nullable content, optional tool_calls, optional reasoning_content, optional prefix.
  • tool: returns the result of a function executed by your application. Fields: role, content, tool_call_id.

Thinking Mode

DeepSeek V4 chat models support thinking and non-thinking modes. Thinking mode can return reasoning_content before the final content. The official default is enabled, so disable it explicitly when you want a simpler fast chat response.

Disable thinking mode:

{
  "thinking": {"type": "disabled"}
}

Enable thinking mode with effort control:

{
  "thinking": {"type": "enabled"},
  "reasoning_effort": "high"
}

DeepSeek documents high and max as reasoning effort values. For compatibility, low and medium map to high, while xhigh maps to max.
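
The alias mapping above can be expressed as a tiny client-side table. The server applies this mapping itself, so a helper like this (our own sketch, not part of any SDK) is mainly useful for request logging or validation:

```python
# Compatibility values and the documented levels they map onto.
EFFORT_ALIASES = {"low": "high", "medium": "high", "xhigh": "max"}

def normalize_effort(value):
    """Return the documented effort level ("high" or "max") for any
    accepted reasoning_effort value."""
    return EFFORT_ALIASES.get(value, value)
```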

In thinking mode, DeepSeek says temperature, top_p, presence_penalty, and frequency_penalty do not affect behavior. Passing those parameters for compatibility may not raise an error, but they should not be treated as thinking-mode controls.

Thinking Mode in Multi-turn and Tool-call Flows

If a thinking-mode assistant message does not involve tool calls, previously returned reasoning_content does not need to be included in the next request and is ignored if sent. If the assistant message does involve tool calls, the relevant reasoning_content must be passed back in subsequent requests for that tool-call flow.
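
This turn-handling rule can be sketched as a small history-building helper. This is a sketch, not official SDK code: the helper name is ours, and messages are shown as plain dicts shaped like the API's assistant messages.

```python
def next_turn_messages(history, assistant_message):
    """Append an assistant turn to the history, keeping reasoning_content
    only when the turn contains tool_calls (the documented rule)."""
    msg = {"role": "assistant", "content": assistant_message.get("content")}
    if assistant_message.get("tool_calls"):
        msg["tool_calls"] = assistant_message["tool_calls"]
        # Tool-call turns must carry their reasoning_content forward.
        if assistant_message.get("reasoning_content") is not None:
            msg["reasoning_content"] = assistant_message["reasoning_content"]
    # Non-tool turns: reasoning_content is ignored server-side, so drop it.
    return history + [msg]
```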

Streaming Responses

Set stream to true to receive partial message deltas over Server-Sent Events. The stream ends with data: [DONE]. In streaming mode, read choices[0].delta instead of waiting for a final choices[0].message.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Give me three deployment tips."},
    ],
    stream=True,
    extra_body={"thinking": {"type": "disabled"}},
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if getattr(delta, "content", None):
        print(delta.content, end="")

When stream_options includes {"include_usage": true}, DeepSeek streams one additional usage chunk before [DONE]. That chunk has an empty choices array and contains request-level usage data.
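
A minimal chunk-handling sketch for this pattern, with chunks shown as plain dicts rather than SDK objects (the function name is ours), separates text deltas from the trailing usage chunk by checking for the empty choices array:

```python
def split_stream(chunks):
    """Collect streamed text deltas plus the final usage chunk, which is
    identified by its empty choices array."""
    text, usage = [], None
    for chunk in chunks:
        if not chunk["choices"]:
            # The extra usage chunk before [DONE] has no choices.
            usage = chunk.get("usage")
        else:
            delta = chunk["choices"][0].get("delta", {})
            if delta.get("content"):
                text.append(delta["content"])
    return "".join(text), usage
```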

JSON Output

Use JSON Output when your application needs machine-readable structured data. Set response_format to {"type":"json_object"}, but also instruct the model to produce JSON in the prompt. DeepSeek recommends including the word “json”, providing an example of the desired shape, and setting max_tokens high enough to avoid truncation.

import json
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "system",
            "content": (
                "Return valid json only. "
                "Use this shape: {\"question\": string, \"answer\": string}."
            ),
        },
        {
            "role": "user",
            "content": "Convert this into json: What is the capital of Egypt? Cairo.",
        },
    ],
    response_format={"type": "json_object"},
    max_tokens=256,
    extra_body={"thinking": {"type": "disabled"}},
)

data = json.loads(response.choices[0].message.content)
print(data)

Always parse and validate the returned JSON before using it in business logic. If finish_reason is length, the JSON may be incomplete even when JSON mode is enabled.
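
That defensive check can look like the following sketch; the helper name and the (data, error) return shape are our own, and the choice is shown as a plain dict:

```python
import json

def parse_json_choice(choice):
    """Parse JSON-mode output defensively; returns (data, error)."""
    if choice["finish_reason"] == "length":
        return None, "truncated"  # JSON may be cut off mid-object
    try:
        return json.loads(choice["message"]["content"]), None
    except json.JSONDecodeError as exc:
        return None, f"invalid json: {exc}"
```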

Tool Calls

Tool calls let the model propose function calls, but the model does not execute your real functions. Your application defines available tools, the model may return tool_calls, your code executes the selected function, and then your app sends a tool message containing the result and the matching tool_call_id.

import json
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_status",
            "description": "Get delivery status for an order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer order ID"
                    }
                },
                "required": ["order_id"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "Where is order A12345?"}
]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    extra_body={"thinking": {"type": "disabled"}},
)

assistant_message = response.choices[0].message
messages.append(assistant_message)

if assistant_message.tool_calls:
    call = assistant_message.tool_calls[0]
    args = json.loads(call.function.arguments)

    # Your application executes the real function here.
    tool_result = f"Order {args['order_id']} is out for delivery."

    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": tool_result,
    })

    final_response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        tools=tools,
        extra_body={"thinking": {"type": "disabled"}},
    )

    print(final_response.choices[0].message.content)

tool_choice can be none, auto, required, or a named function object that forces a specific function. DeepSeek currently supports function tools and documents a maximum of 128 functions. Function names must use letters, numbers, underscores, or dashes, with a maximum length of 64 characters.

Important: tool_calls[].function.arguments is returned as JSON-format text, but the model may still produce invalid JSON or arguments outside your schema. Validate arguments before executing real code, database writes, transactions, or external API calls.
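
One way to gate execution, sketched against the get_delivery_status schema from the example above (the helper function and its allow-list are our own, not part of any SDK):

```python
import json

# Keys this tool is allowed to receive; anything else is rejected.
ALLOWED_KEYS = {"order_id"}

def parse_tool_arguments(raw):
    """Validate tool-call arguments before executing anything real.
    Returns the parsed dict, or None if the arguments are unsafe."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the model produced invalid JSON
    if not isinstance(args, dict) or set(args) - ALLOWED_KEYS:
        return None  # unexpected shape or extra fields
    order_id = args.get("order_id")
    if not isinstance(order_id, str) or not order_id.strip():
        return None  # wrong type or empty value
    return args
```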

Strict Tool Mode Beta

Strict mode is a Beta tool-calling feature that makes the model follow your JSON Schema more closely. To use it, set the base URL to https://api.deepseek.com/beta and set strict: true on every function in the tools list.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com/beta",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_status",
            "strict": True,
            "description": "Get delivery status for an order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer order ID"
                    }
                },
                "required": ["order_id"],
                "additionalProperties": False
            }
        }
    }
]

DeepSeek documents a subset of JSON Schema for strict mode, including object, string, number, integer, boolean, array, enum, and anyOf. For object schemas, every property must be listed in required, and additionalProperties must be false.

Chat Prefix Completion Beta

Chat Prefix Completion lets your app provide the beginning of the assistant’s answer and ask the model to continue it. The last message must be an assistant message with prefix: true, and the request must use the Beta base URL.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com/beta",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "user", "content": "Write a Python function that checks if a string is a palindrome."},
        {"role": "assistant", "content": "```python\n", "prefix": True},
    ],
    stop=["```"],
)

print(response.choices[0].message.content)

Use this feature for controlled continuation, code-completion-like behavior, and cases where your application needs the output to start from a known assistant prefix. For ordinary chat requests, use the standard base URL instead of the Beta base URL.

Multi-turn Conversations Are Stateless

The DeepSeek /chat/completions API is stateless. The server does not automatically remember previous turns. Your application must append the relevant prior messages and send them again with each request.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

messages = [
    {"role": "user", "content": "What is the highest mountain in the world?"}
]

first = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    extra_body={"thinking": {"type": "disabled"}},
)

messages.append(first.choices[0].message)
messages.append({"role": "user", "content": "What is the second highest?"})

second = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    extra_body={"thinking": {"type": "disabled"}},
)

print(second.choices[0].message.content)

Send only the conversation history your task actually needs. For long sessions, use summarization, retrieval, or application-side memory to control prompt size while preserving the information required for the next answer.
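
A minimal trimming sketch, assuming the system message should always survive (the helper name and the turn-count cutoff are ours):

```python
def trim_history(messages, max_turns=6):
    """Keep every system message plus the most recent non-system turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]
```

More sophisticated strategies (summarizing dropped turns, or retrieving only relevant ones) follow the same shape: rebuild the messages array your application actually needs before each request.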

Response Object and Usage Fields

A non-streamed response is a chat.completion object. Most apps read choices[0].message.content, but production systems should also inspect finish_reason, tool_calls, reasoning_content, system_fingerprint, and usage where relevant.

  • choices: candidate completions. Most applications use choices[0].
  • message.content: the final assistant answer; the main output text for non-streaming requests.
  • message.reasoning_content: reasoning content in thinking mode; needed for advanced thinking-mode workflows and some tool-call flows.
  • message.tool_calls: function call proposals; your app executes the real tool and returns a tool message.
  • finish_reason: why generation stopped; useful for retries, truncation handling, safety handling, and tool-call routing.
  • system_fingerprint: backend configuration fingerprint; helpful for debugging and reproducibility tracking.
  • usage: token accounting for the request; useful for monitoring, capacity planning, cache analysis, and output-limit tuning.

DeepSeek documents these finish_reason values: stop, length, content_filter, tool_calls, and insufficient_system_resource.

Usage Fields to Log

The response usage object can include:

  • prompt_tokens
  • prompt_cache_hit_tokens
  • prompt_cache_miss_tokens
  • completion_tokens
  • completion_tokens_details.reasoning_tokens
  • total_tokens

DeepSeek states that prompt_tokens equals prompt_cache_hit_tokens + prompt_cache_miss_tokens. Track cache-hit and cache-miss tokens separately so you can understand whether stable prompt prefixes are being reused successfully.
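
That identity makes a cache hit rate easy to derive from a logged usage object (a sketch; the helper name is ours, and usage is shown as a plain dict):

```python
def cache_hit_rate(usage):
    """Fraction of prompt tokens served from the context cache."""
    prompt = usage.get("prompt_tokens", 0)
    hits = usage.get("prompt_cache_hit_tokens", 0)
    return hits / prompt if prompt else 0.0
```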

Context Caching

DeepSeek Context Caching is enabled by default. When later requests reuse an already persisted prompt prefix, the overlapping prefix can count as a cache hit. This can improve efficiency for repeated system prompts, long shared documents, repeated few-shot examples, and multi-turn conversations.

The practical rule is to keep reusable prompt prefixes stable. Put stable system instructions, shared documents, schemas, and examples before changing user questions, session IDs, timestamps, and request-specific metadata when your application design allows it.
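
A message-builder sketch that applies this rule, assuming the shared document can live in the system message (the function and argument names are ours):

```python
def build_messages(stable_instructions, shared_document, user_question):
    """Keep the reusable prefix (instructions + document) byte-identical
    across requests so overlapping prefixes can count as cache hits;
    put the per-request question last."""
    return [
        {"role": "system",
         "content": stable_instructions + "\n\nReference document:\n" + shared_document},
        {"role": "user", "content": user_question},
    ]
```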

Context caching is best-effort. The official guide says cache construction can take seconds, matching depends on persisted prefix units, and unused cache entries are usually cleared within a few hours to a few days.

Errors, Rate Limits and Keep-alives

DeepSeek dynamically limits concurrency based on server load. When the limit is reached, the API returns HTTP 429. While a request is waiting to be scheduled, non-streaming requests may return empty lines and streaming requests may return SSE keep-alive comments such as : keep-alive. If you parse HTTP manually, handle these keep-alives instead of treating them as malformed output.

  • 400 Invalid Format: malformed JSON, an invalid message structure, or an incompatible thinking/tool state. First check: rebuild the request from a minimal known-good example.
  • 401 Authentication Fails: a wrong or missing API key. First check: the Bearer token and server-side environment variables.
  • 402 Insufficient Balance: no usable account balance. First check: account status in the official platform.
  • 422 Invalid Parameters: an unsupported field, an invalid value, or an invalid strict schema. First check: remove optional fields, then add them back one by one.
  • 429 Rate Limit Reached: requests sent too quickly, or the current concurrency limit reached. First check: add backoff, reduce concurrency, and avoid retry storms.
  • 500 Server Error: a server-side issue. First check: retry with backoff and confirm the request is idempotent.
  • 503 Server Overloaded: high traffic or overload. First check: retry with backoff and monitor service status.
  • JSON response appears stuck: JSON Output enabled without clear prompt instructions to output JSON. First check: add explicit JSON instructions, an example shape, and a reasonable max_tokens.
  • Tool-call arguments are unsafe: arguments may be invalid JSON or include unexpected fields. First check: parse, validate, authorize, and sanitize before execution.
  • Thinking + tool flow fails: the relevant reasoning_content was not passed back during the tool-call flow. First check: follow the official thinking-mode tool-call pattern.
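
Several of those first checks amount to retrying with exponential backoff and jitter. A sketch under stated assumptions: send_request is a caller-supplied callable returning (status, body), so no particular HTTP client is assumed, and the status set and timing constants are ours.

```python
import random
import time

# Transient statuses worth retrying, per the table above.
RETRYABLE = {429, 500, 503}

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with full jitter: uniform in [0, base * 2^attempt]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retries(send_request, max_attempts=5, base=0.5):
    """Call send_request(), retrying transient statuses with jittered backoff."""
    for attempt in range(max_attempts):
        status, body = send_request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(backoff_delay(attempt, base=base))
    return status, body
```

Combine this with application-level idempotency protection: a retried 500 may repeat an action the server already performed.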

Security and Production Best Practices

  • Use deepseek-v4-flash or deepseek-v4-pro directly for new Chat Completions integrations.
  • Use deepseek-v4-flash first for routine, high-volume, latency-sensitive workloads.
  • Route harder reasoning, coding, long-context, and agentic tasks to deepseek-v4-pro.
  • Disable thinking mode for simple chat when you want simpler output.
  • Enable thinking mode for reasoning-heavy tasks and handle reasoning_content correctly.
  • Manage conversation history in your own application because /chat/completions is stateless.
  • Validate JSON Output before using it in backend logic.
  • Validate tool-call arguments before executing functions, database writes, purchases, emails, or external API calls.
  • Store API keys in server-side secrets management, not in client-visible code.
  • Log usage, cache-hit tokens, cache-miss tokens, completion tokens, and reasoning tokens for observability.
  • Keep reusable prompt prefixes stable when you want context caching benefits.
  • Add retry logic with exponential backoff and jitter for transient 429, 500, and 503 responses.
  • Use idempotency protections in your own application when a retry could repeat an external action.
  • Test streaming, JSON Output, tool calls, and strict mode separately before combining them in one workflow.
  • Check the official DeepSeek change log before the legacy-alias retirement deadline on July 24, 2026.

Migration from deepseek-chat and deepseek-reasoner

The legacy aliases are useful for temporary compatibility, but the current V4 model IDs are clearer and safer for long-term integrations.

  • deepseek-chat: currently maps to DeepSeek-V4-Flash non-thinking mode during the transition. Recommended replacement: deepseek-v4-flash with {"thinking":{"type":"disabled"}}.
  • deepseek-reasoner: currently maps to DeepSeek-V4-Flash thinking mode during the transition. Recommended replacement: deepseek-v4-flash or deepseek-v4-pro with {"thinking":{"type":"enabled"}} and reasoning_effort.

A practical migration path is to start by replacing routine deepseek-chat calls with deepseek-v4-flash, then route difficult reasoning or coding tasks to deepseek-v4-pro. Test output quality, latency, token usage, context-cache behavior, JSON validity, and tool-call reliability before switching production traffic.

Common DeepSeek Chat Completions Mistakes

  • Calling deepseek-chat the current chat model: describe it instead as a legacy alias that currently maps to V4-Flash non-thinking mode during the transition period.
  • Forgetting that /chat/completions is stateless: send the relevant conversation history with each request.
  • Enabling JSON Output without prompting for JSON: use response_format, include the word “json” in the prompt, and provide an example schema or shape.
  • Executing tool-call arguments without validation: parse, validate, authorize, and sanitize arguments before any real action.
  • Expecting temperature to control thinking-mode behavior: use thinking and reasoning_effort instead.
  • Putting changing metadata at the beginning of every prompt: keep stable instructions, schemas, documents, and examples early when context caching matters.
  • Treating Beta features as ordinary production behavior: use the Beta base URL only when you intentionally need features such as strict tool mode or Chat Prefix Completion.

DeepSeek Chat Completions FAQ

What is the DeepSeek Chat Completions API?

It is DeepSeek’s main chat-generation endpoint: POST /chat/completions. You send a model and message history, and DeepSeek returns either a normal chat completion object or streamed chunks.

What fields are required in a DeepSeek chat completion request?

The body requires model and messages. The messages array must include at least one message.

Which model should I use: deepseek-v4-flash or deepseek-v4-pro?

Use deepseek-v4-flash for fast, efficient, high-volume workflows. Use deepseek-v4-pro for advanced reasoning, coding, long-context analysis, and complex agentic tasks.

Should I still use deepseek-chat or deepseek-reasoner?

Only for temporary compatibility. They are legacy aliases during the V4 transition period. New integrations should use deepseek-v4-flash or deepseek-v4-pro directly.

How do I enable or disable thinking mode?

Use {"thinking":{"type":"enabled"}} to enable thinking mode, or {"thinking":{"type":"disabled"}} to disable it. In the OpenAI Python SDK, pass the thinking object through extra_body.

What is reasoning_effort?

reasoning_effort controls thinking effort when thinking mode is enabled. DeepSeek documents high and max.

How do I stream DeepSeek chat completions?

Set stream to true. The API sends partial deltas over Server-Sent Events and terminates the stream with data: [DONE].

How do I get JSON output?

Set response_format to {"type":"json_object"}, include the word “json” in the prompt, provide an example of the target JSON shape, and set max_tokens high enough to avoid truncation.

How do tool calls work?

Your application defines function tools. The model may return tool_calls. Your code executes the real function, then returns the result as a tool message with the matching tool_call_id.

Is the DeepSeek Chat Completions API stateless?

Yes. The server does not automatically remember previous conversation turns. Your application must resend the relevant message history with each request.

Official Sources

Related Chat-Deep.ai resources: DeepSeek API guide, DeepSeek Context Caching guide, DeepSeek V4 guide, DeepSeek Models hub, and DeepSeek Status guide.