Last verified: April 1, 2026
POST /chat/completions is the main DeepSeek endpoint for conversational generation. It takes a model plus a messages array, returns either a normal chat completion object or streamed SSE chunks, and supports structured JSON output, tool calls, reasoning output, and stateless multi-turn chat. Beta-only features such as Chat Prefix Completion and strict-mode tool calls require base_url="https://api.deepseek.com/beta". This page is an independent guide for chat-deep.ai, not the official DeepSeek docs. For a broader overview, see our DeepSeek API Guide.
Quick answer: If you can already make one DeepSeek API call,
/chat/completions is the endpoint you will use most often. Start with model="deepseek-chat" and a small messages array, add stream: true for incremental output, add response_format={"type":"json_object"} for structured JSON, and add tools plus tool_choice when you want the model to propose function calls. If you need Beta prefix completion or strict tool schemas, switch the SDK base URL to https://api.deepseek.com/beta.
Quick reference box / minimal request
Minimum required fields:
- Endpoint: POST /chat/completions
- Required body fields: model, messages
- Common base URL for SDKs: https://api.deepseek.com
- OpenAI compatibility note: https://api.deepseek.com/v1 also works as a compatibility base URL, but v1 is not a model version
- Recommended starter model: deepseek-chat
Minimal cURL example
curl https://api.deepseek.com/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPSEEK_API_KEY" \
-d '{
"model": "deepseek-chat",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
],
"stream": false
}'

Minimal Python example
from openai import OpenAI

client = OpenAI(
    api_key="<DeepSeek API Key>",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)
Optional Node.js example
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com",
});

const response = await client.chat.completions.create({
  model: "deepseek-chat",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" }
  ]
});

console.log(response.choices[0].message.content);
These examples follow DeepSeek’s current OpenAI-compatible setup, including base_url="https://api.deepseek.com" and the standard /chat/completions route. DeepSeek also explicitly says you may use /v1 as a compatibility base URL, but that version string does not map to model versioning.
What the DeepSeek Create Chat Completion endpoint does
DeepSeek describes POST /chat/completions as the endpoint that “creates a model response for the given chat conversation.” In practice, that makes it the core API surface for most application work: normal chat, structured JSON generation, function-style tool proposals, and multi-turn conversations where your application keeps the history and resends it on each request. If you are building chatbots, agents, or app features, this is the endpoint you will usually grow around. For simpler implementation examples, see our chatbot guide and web app / SaaS guide.
Endpoint URL, auth, and base URL note
For raw HTTP, the route is:
POST https://api.deepseek.com/chat/completions
For SDKs, use:
base_url="https://api.deepseek.com"
DeepSeek’s quick-start docs also allow https://api.deepseek.com/v1 for compatibility with OpenAI-style tooling, but they warn that the v1 suffix has no relationship to the actual model version. Authentication is done with a DeepSeek API key sent as a Bearer token.

DeepSeek also offers an Anthropic-compatible API endpoint; see our DeepSeek Anthropic API in Claude Code guide for setup details.
Required request fields
At the schema level, only two body fields are truly required: messages and model. messages must contain at least one message, and model must currently be either deepseek-chat or deepseek-reasoner. Everything else is additive.
| Category | Fields | Notes |
|---|---|---|
| Required | model, messages | Minimum request body |
| Common optional | thinking, max_tokens, temperature, top_p, stop, stream, stream_options.include_usage, response_format, tools, tool_choice | Practical controls most apps use |
| Advanced optional | logprobs, top_logprobs | Useful for scoring/debugging, but not for all model modes |
| Beta-only | assistant prefix, assistant reasoning_content input for prefix completion, strict tool schemas | Requires base_url="https://api.deepseek.com/beta" |
This breakdown is based on the current request schema, JSON Output guide, Function Calling guide, Thinking Mode guide, and Chat Prefix Completion guide.
Message roles explained
The messages array is not just plain text history. DeepSeek defines four role types: system, user, assistant, and tool. system and user carry instructions and prompts. assistant can carry normal assistant content, generated tool calls, or a Beta prefix stub. tool is how your application returns tool results back to the model, and it must include tool_call_id.
| Role | Required fields | When to use it | Important note |
|---|---|---|---|
| system | role, content | Global behavior / constraints | Optional name supported |
| user | role, content | User prompt or app input | Optional name supported |
| assistant | role, content (nullable) | Prior assistant replies, tool proposals, Beta prefix completion | Can also contain tool_calls; Beta prefix is only for the last assistant message |
| tool | role, content, tool_call_id | Return your function result to the model | Must reference the specific tool call ID |
The schema also allows name on system, user, and assistant, and Beta prefix: true on the last assistant message for Chat Prefix Completion.
Model selection: deepseek-chat vs deepseek-reasoner

As of April 1, 2026, DeepSeek’s current Models & Pricing page says deepseek-chat and deepseek-reasoner both map to DeepSeek-V3.2 with a 128K context window. deepseek-chat is the non-thinking mode; deepseek-reasoner is the thinking mode. DeepSeek lists the default / maximum output lengths differently as well: deepseek-chat defaults to 4K and can go up to 8K, while deepseek-reasoner defaults to 32K and can go up to 64K.
There are two practical ways to enable reasoning behavior today. You can set model="deepseek-reasoner", or you can keep model="deepseek-chat" and enable thinking through the thinking parameter. In the OpenAI SDK, DeepSeek says thinking should be passed inside extra_body. If you want background and capability context around the current model generation, your related internal reads are DeepSeek-V3.2 and DeepSeek R1.
response = client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": "Solve 31 * 47"}],
extra_body={"thinking": {"type": "enabled"}}
)
One caveat matters here: the newer Thinking Mode and Models & Pricing docs show Tool Calls support for both current V3.2 API models, but the older deepseek-reasoner guide still lists Function Calling as unsupported. The safest current interpretation is to treat the newer Thinking Mode and V3.2 docs as the source of truth, while testing deepseek-reasoner tool flows carefully in your own client.
Another caveat: in thinking mode, DeepSeek says temperature, top_p, presence_penalty, and frequency_penalty do not have effect, while logprobs and top_logprobs will trigger an error. That means deepseek-reasoner is not just “chat but smarter”; it changes which controls are actually meaningful.
Optional fields that matter in practice
max_tokens controls the maximum generated tokens, but DeepSeek notes that the total input plus output is still limited by the model’s context window. temperature and top_p are both supported in the general schema, but DeepSeek explicitly recommends changing one or the other, not both. stop can be a single string or an array, with up to 16 stop sequences. logprobs=true returns token log probabilities, and top_logprobs can request up to 20 likely alternatives per token position.
For most production apps, the practical default is simple: set max_tokens intentionally, leave temperature alone unless you have a reason to tune it, use stop only when you need deterministic boundaries, and reserve logprobs for scoring or debugging rather than standard UX flows. In other words, start from predictable request bodies first, then add optional fields one by one.
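The defaults above can be sketched as a request body. This is an illustrative sketch only: the field names follow the schema discussed in this section, while the specific values (512 tokens, a "###" boundary) are example choices, not recommendations from DeepSeek.

```python
# Illustrative request body applying the practical defaults discussed above.
# The values here are example choices, not DeepSeek recommendations.
request_body = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "max_tokens": 512,      # set intentionally; input + output still bounded by the context window
    "temperature": 0.7,     # tune temperature OR top_p, not both
    "stop": ["\n\n###"],    # a string or an array of up to 16 stop sequences
}
```

Notice what is absent: no logprobs, no top_p, no penalties. Adding fields one at a time makes it much easier to attribute behavior changes to a specific control.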
Streaming responses and SSE parsing
When stream=true, DeepSeek sends partial deltas as data-only Server-Sent Events and terminates the stream with data: [DONE]. In streamed responses, the payload changes shape: you parse choices[0].delta instead of a final choices[0].message. For deepseek-reasoner, streamed chunks can contain delta.reasoning_content before the final answer content arrives.
from openai import OpenAI

client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Give me three deployment tips."}
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")

If you also set stream_options={"include_usage": true}, DeepSeek adds one extra chunk before [DONE]. That final usage chunk always has an empty choices array and the full request-level usage fields. All other streamed chunks may contain usage: null.
There is an important parser detail here. DeepSeek’s Rate Limit and FAQ pages say that while requests are waiting to be scheduled, non-stream responses may emit empty lines and stream responses may emit SSE keep-alive comments such as : keep-alive. The OpenAI SDK handles this for you, but if you parse the HTTP stream yourself, you must ignore those lines/comments. DeepSeek also says the server closes the connection if inference has not started after 10 minutes.
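If you do parse the raw HTTP stream yourself, the rules above reduce to a small filter: skip empty lines, skip comment lines starting with ":", unwrap data: payloads, and stop at [DONE]. A minimal sketch (the function name and raw-lines input shape are our own; only the SSE framing rules come from the docs):

```python
import json

def iter_sse_payloads(lines):
    """Yield parsed JSON payloads from raw SSE lines, skipping the empty
    lines and ': keep-alive' comments DeepSeek may emit while a request
    waits to be scheduled. Stops at the 'data: [DONE]' sentinel."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(":"):  # empty line or SSE comment
            continue
        if line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return
            yield json.loads(data)

# Example raw stream, including a keep-alive comment and an empty line:
raw = [
    ": keep-alive",
    "",
    'data: {"choices":[{"delta":{"content":"Hi"}}]}',
    "data: [DONE]",
]
chunks = list(iter_sse_payloads(raw))  # one parsed chunk; [DONE] is consumed
```

The point is that keep-alive handling belongs in the transport layer, not the prompt: a parser that treats comments as malformed JSON will look like a model problem when it is really a framing problem.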
JSON Output
DeepSeek’s JSON Output feature is not just “ask nicely for JSON.” The official requirement has two parts: set response_format={"type": "json_object"} and tell the model in the prompt to produce JSON. DeepSeek explicitly warns that if you do not instruct the model to output JSON, it may emit an unending stream of whitespace until it hits the token limit, making the request look stuck. DeepSeek also warns that JSON can be truncated if max_tokens is too low, and that the API may occasionally return empty content in JSON mode.
import json
from openai import OpenAI

client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {
            "role": "system",
            "content": (
                'Return valid JSON only. '
                'Use this schema: {"question": string, "answer": string}.'
            ),
        },
        {
            "role": "user",
            "content": 'Convert this into JSON: "What is the capital of Egypt? Cairo."',
        },
    ],
    response_format={"type": "json_object"},
    max_tokens=256,
)

data = json.loads(response.choices[0].message.content)
print(data)
The safest mental model is: response_format enforces the output container, but your prompt still has to define the expected JSON shape clearly. If you care about structured automation, also watch token spend here: malformed output and repeated retries can quietly waste tokens, so keep our pricing hub and cost calculator in mind when budgeting JSON-mode workflows.
Tool Calls and tool_choice
Tool calls in DeepSeek chat completions are proposal-based, not execution-based. You send tools; the model may respond with tool_calls; your application executes the real function; then you append a tool message containing the result plus the matching tool_call_id. DeepSeek’s own Function Calling guide shows exactly this pattern.
import json
from openai import OpenAI

client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What is the weather in Cairo?"}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message
messages.append(message)

tool = message.tool_calls[0]
args = json.loads(tool.function.arguments)

# Your code executes the real function here
tool_result = f"It is sunny in {args['location']}."

messages.append({
    "role": "tool",
    "tool_call_id": tool.id,
    "content": tool_result
})

follow_up = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
    tools=tools
)
print(follow_up.choices[0].message.content)
DeepSeek’s current schema defines tool_choice clearly:
- none = never call a tool
- auto = the model may choose between a message and a tool call
- required = the model must call one or more tools
- a structured object forces a specific named function

The docs also say only function tools are supported, with a maximum of 128 functions, and function names must use letters, numbers, underscores, or dashes with a maximum length of 64 characters.
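For the forced-function case, the structured object follows the OpenAI-compatible shape. A minimal sketch, reusing the hypothetical get_weather function from the example above:

```python
# Force the model to call one specific function instead of letting it
# choose. The object shape follows the OpenAI-compatible tool_choice schema;
# "get_weather" is the illustrative function from the earlier example.
tool_choice = {
    "type": "function",
    "function": {"name": "get_weather"},
}
```

You would pass this dict as the tool_choice argument in place of the string "auto".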
One more production warning: DeepSeek’s schema says tool_calls[].function.arguments is returned as JSON-format text, but the model may still produce invalid JSON or hallucinate parameters outside your schema. Always validate parsed arguments before calling your real function.
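That validation step can be as simple as a defensive parser. A sketch for the get_weather example above (the function name and the None-on-failure convention are our own choices):

```python
import json

def parse_weather_args(arguments: str):
    """Defensively parse tool-call arguments before executing anything.
    Returns the args dict, or None if the model emitted invalid JSON,
    dropped the required parameter, or hallucinated extra parameters."""
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError:
        return None  # invalid JSON: re-prompt or fall back, never execute
    if not isinstance(args, dict) or "location" not in args:
        return None  # required parameter missing
    if set(args) - {"location"}:
        return None  # parameters outside our schema
    return args
```

On a None result, the safe options are to re-ask the model or return a tool message describing the error, rather than guessing at arguments.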
strict mode (Beta)
Strict mode is the Beta version of tool calling where the model is pushed to comply with your JSON Schema more exactly. To use it, DeepSeek says you must switch to base_url="https://api.deepseek.com/beta" and set strict: true on all functions in the tools list. The server also validates the schema and returns an error if it does not conform to DeepSeek’s supported subset.
from openai import OpenAI

client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com/beta")

tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"strict": True,
"description": "Get weather for a city.",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"],
"additionalProperties": False
}
}
}
]
DeepSeek’s current strict-mode guide is narrower than “full JSON Schema.” Supported types include object, string, number, integer, boolean, array, enum, and anyOf. It also says every property inside an object must be listed in required, additionalProperties must be false, and some schema features such as minLength and maxLength are not supported. That is why strict mode is great for stable function signatures, but it is not a free-form schema validator.
Chat Prefix Completion (Beta)
Chat Prefix Completion is a Beta variant of chat completion where the last message in messages is an assistant message that contains a prefix the model must continue from. DeepSeek requires two things: the last message must have role="assistant" and it must include prefix=true. This Beta feature also requires base_url="https://api.deepseek.com/beta".
from openai import OpenAI

client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com/beta")

response = client.chat.completions.create(
model="deepseek-chat",
messages=[
{"role": "user", "content": "Write a Python function that returns Fibonacci numbers."},
{"role": "assistant", "content": "```python\n", "prefix": True},
],
stop=["```"]
)

print(response.choices[0].message.content)
For deepseek-reasoner, the Beta schema also allows reasoning_content on that last assistant message when using prefix completion. DeepSeek documents this specifically as an input for the CoT in the prefix-completion flow. This is a niche feature, but it matters if you are experimenting with controlled continuation in reasoning mode.
Multi-round conversation and stateless context
DeepSeek is explicit that /chat/completions is stateless. The server does not store your conversation context for you. If you want turn 2 to “remember” turn 1, your application must resend the earlier messages along with the new user turn. That is the single most important thing developers get wrong after their first successful API call.
from openai import OpenAI

client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

messages = [{"role": "user", "content": "What is the highest mountain in the world?"}]
response = client.chat.completions.create(model="deepseek-chat", messages=messages)

messages.append(response.choices[0].message)
messages.append({"role": "user", "content": "And what about the second highest?"})

response = client.chat.completions.create(model="deepseek-chat", messages=messages)
print(response.choices[0].message.content)
With deepseek-reasoner, DeepSeek adds an extra rule: across normal turns, you keep the previous content but not the previous reasoning_content. During tool-calling sub-turns inside a single reasoning turn, however, the newer Thinking Mode guide says you may need to pass reasoning_content back so the model can continue its reasoning. DeepSeek explicitly warns that if your thinking+tool-calls code path does not return reasoning_content correctly, the API can return a 400 error.
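For the normal multi-turn case, that rule amounts to stripping reasoning_content before resending history. A sketch of that cleanup step, assuming your history is a list of plain dicts (the helper name is our own, and this deliberately covers only the normal-turn case, not the thinking+tool-calls sub-turn where reasoning_content must be preserved):

```python
def strip_reasoning(messages):
    """For normal deepseek-reasoner multi-turn chat, resend prior message
    content but drop reasoning_content. Do NOT apply this inside a
    thinking + tool-calls sub-turn, where reasoning_content is needed."""
    cleaned = []
    for m in messages:
        m = dict(m)                        # copy so the local log keeps the CoT
        m.pop("reasoning_content", None)   # never resend reasoning across turns
        cleaned.append(m)
    return cleaned
```

Keeping the original list intact means you can still log or display the reasoning locally while sending the API only what it expects.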
Response object anatomy
The normal, non-streamed response is a chat.completion object. The fields developers typically care about most are choices, choices[0].message, finish_reason, system_fingerprint, and usage. If you are using deepseek-reasoner, the returned assistant message may also include reasoning_content. If you are using tools, the assistant message may instead or additionally include tool_calls.
| Field | What it means | Why it matters |
|---|---|---|
| choices | Candidate completions | Most apps read choices[0] |
| message.content | Final assistant answer | Main text output |
| message.reasoning_content | Reasoning output for deepseek-reasoner | Useful for inspection or advanced workflows |
| message.tool_calls | Proposed function call(s) | Your app must execute the real tool |
| finish_reason | Why generation stopped | Useful for retry/debug logic |
| system_fingerprint | Backend configuration fingerprint | Helpful for reproducibility/debugging |
| usage | Token accounting | Needed for cost tracking and cache analysis |
DeepSeek currently documents these finish_reason values: stop, length, content_filter, tool_calls, and insufficient_system_resource. That is useful operationally: length usually means your output hit max_tokens; tool_calls means the model wants you to execute a function; insufficient_system_resource means the inference system interrupted the request.
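Those five values map naturally onto client-side dispatch logic. A sketch (the finish_reason strings come from the docs above; the action strings are illustrative, not part of the API):

```python
def classify_finish(finish_reason: str) -> str:
    """Map DeepSeek's documented finish_reason values to a next action.
    The action descriptions are illustrative app policy, not API output."""
    actions = {
        "stop": "done",
        "length": "raise max_tokens or continue the generation",
        "content_filter": "surface a policy message to the user",
        "tool_calls": "execute the proposed tool, then send a follow-up request",
        "insufficient_system_resource": "retry after a short backoff",
    }
    return actions.get(finish_reason, "unknown; log and inspect")
```

Explicitly handling the unknown case matters: a new finish_reason value should degrade into a logged warning, not a crash.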
Usage, caching, and token budgeting
DeepSeek’s response schema includes completion_tokens, prompt_tokens, prompt_cache_hit_tokens, prompt_cache_miss_tokens, total_tokens, and completion_tokens_details.reasoning_tokens. The docs also say prompt_tokens = prompt_cache_hit_tokens + prompt_cache_miss_tokens. That makes /chat/completions responses directly useful for measuring both spend and cache effectiveness. If you want pricing context alongside those fields, see our DeepSeek pricing page and API cost calculator.

DeepSeek’s Context Caching guide adds the missing operational detail: context caching is enabled by default, only repeated prefix content can hit the cache, the system is best-effort rather than guaranteed, and content below 64 tokens will not be cached. That means the cache fields in usage are meaningful, but you should not build app logic that assumes a 100% hit rate.
As of April 1, 2026, DeepSeek’s public pricing page lists the same token prices for both deepseek-chat and deepseek-reasoner: $0.028 per 1M cache-hit input tokens, $0.28 per 1M cache-miss input tokens, and $0.42 per 1M output tokens. That makes the usage block more than a debug object; it is directly tied to billing.
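Combining the usage fields with those listed prices gives a per-request cost estimate. A sketch using the April 1, 2026 prices quoted above (the function name is our own; recheck the pricing page before relying on these numbers):

```python
def estimate_cost_usd(usage: dict) -> float:
    """Estimate request cost from a usage block, using the per-1M-token
    prices on DeepSeek's pricing page as of April 1, 2026:
    $0.028 cache-hit input, $0.28 cache-miss input, $0.42 output."""
    return (
        usage["prompt_cache_hit_tokens"] * 0.028
        + usage["prompt_cache_miss_tokens"] * 0.28
        + usage["completion_tokens"] * 0.42
    ) / 1_000_000

# Example: a half-cached 1M-token prompt with 1M output tokens.
usage = {
    "prompt_cache_hit_tokens": 500_000,
    "prompt_cache_miss_tokens": 500_000,
    "completion_tokens": 1_000_000,
}
# 0.014 + 0.14 + 0.42 = $0.574
```

The 10x gap between cache-hit and cache-miss input pricing is exactly why prompt_cache_hit_tokens is worth monitoring, not just total_tokens.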
Common errors and fixes
Most /chat/completions problems collapse into a small set of patterns: malformed message arrays, wrong API key, bad parameter combinations, truncated JSON, broken manual stream parsing, or incorrect state handling for reasoning mode and multi-turn chat. DeepSeek’s official Error Codes page gives the HTTP-level categories, while the JSON Output, Thinking Mode, and Rate Limit docs explain the request-shape and transport details behind them. For a site-level companion page, see our DeepSeek Error Codes.
| Code / symptom | Common /chat/completions cause | What to check first |
|---|---|---|
| 400 Invalid Format | Broken body, invalid messages, missing required structure, wrong reasoning/tool flow | Rebuild the request from a minimal known-good example |
| 401 Authentication Fails | Wrong or missing API key | Verify Bearer token and the correct platform key |
| 402 Insufficient Balance | No remaining balance | Check billing/top-up |
| 422 Invalid Parameters | Bad field values, unsupported combinations, invalid strict schema | Remove extra fields, validate Beta requirements |
| 429 Rate Limit Reached | Requests sent too quickly or dynamic rate pressure | Back off and retry |
| 500 / 503 | Server issue or overload | Retry after a short wait |
| JSON looks stuck / truncated | Missing JSON instruction, too-low max_tokens, or empty-content JSON mode edge case | Add JSON instructions and raise max_tokens |
| Streaming parser looks broken | You are not skipping empty lines / : keep-alive comments | Fix the parser before changing the prompt |
| Reasoner multi-turn fails | You resent the wrong fields or mishandled reasoning_content | Follow DeepSeek’s reasoning-mode examples closely |
This troubleshooting map comes directly from the current Error Codes, JSON Output, Thinking Mode, Rate Limit, and FAQ pages. DeepSeek also says 429 limits are dynamic and cannot currently be increased per account.
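Since 429 limits are dynamic and cannot be raised per account, client-side backoff is the only real lever. A minimal retry-schedule sketch (the parameters are illustrative app policy, not DeepSeek guidance):

```python
import random

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield exponentially growing retry delays with jitter, suitable for
    429/500/503 responses. All parameter values are illustrative."""
    for attempt in range(max_retries):
        # Full exponential delay, capped, then jittered to avoid
        # synchronized retry storms across many clients.
        yield min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)
```

In practice you would sleep for each yielded delay between attempts, and treat 400/401/402/422 as non-retryable: retrying a malformed request just burns the same error again.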
Best practices checklist
Before you call this endpoint “done,” make sure your client does these things:
- Pin the model explicitly.
- Treat /chat/completions as stateless and own the history in your app.
- Use stream=true for interactive UX, but parse keep-alives safely if you do manual HTTP handling.
- Use JSON Output only when your prompt also tells the model to produce JSON.
- Validate tool arguments before executing real code.
- Use deepseek-chat as the safer default for standard integrations.
- Use deepseek-reasoner when you actually need reasoning output, and handle its parameter differences carefully.
- Monitor usage, prompt_cache_hit_tokens, and reasoning_tokens so cost does not become invisible.
FAQ
What is the DeepSeek Create Chat Completion endpoint?
It is DeepSeek’s core chat-generation endpoint: POST /chat/completions. You send a model plus a message history; DeepSeek returns a normal completion object or a stream of chat-completion chunks. It is the base endpoint behind standard chat, JSON Output, tool calls, and stateless multi-turn conversations.
What fields are required in a DeepSeek chat completion request?
Only model and messages are required in the body. The messages array must contain at least one item, and the model must be one of the current supported chat models, such as deepseek-chat or deepseek-reasoner.
What is the difference between deepseek-chat and deepseek-reasoner here?
deepseek-chat is the non-thinking V3.2 API model, while deepseek-reasoner is the thinking V3.2 API model. They share a 128K context window, but they differ in output limits, parameter behavior, and reasoning output. Thinking mode also changes which controls matter; for example, some sampling controls are ignored and logprobs can error.
How do I get valid JSON output from DeepSeek?
Set response_format={"type":"json_object"} and tell the model in your prompt to produce JSON. DeepSeek explicitly says to include the word “json” in the system or user prompt and to set max_tokens reasonably so the JSON does not get cut off.
How do tool calls work in DeepSeek chat completions?
You declare tools in the request; the model may return tool_calls; your application executes the function; then you append a tool role message with the matching tool_call_id and the real tool result. DeepSeek also warns that the generated JSON arguments can be invalid or include hallucinated parameters, so you should validate before execution.
Why does my streaming response look stuck or incomplete?
There are three common causes. First, you may be parsing the stream incorrectly and not ignoring empty lines or : keep-alive comments. Second, if you are using JSON Output without instructing the model to produce JSON, DeepSeek says you may get a long whitespace stream. Third, if the request has not started inference after 10 minutes, DeepSeek says the server may close the connection.
Why am I getting 400 or 422 errors?
400 usually means the body format is invalid. 422 usually means the parameters are invalid. In this endpoint, that often means malformed messages, a broken Beta strict schema, unsupported parameter combinations, or incorrect reasoning/tool state handling. Re-test from a minimal valid example before adding optional fields back.
Is DeepSeek chat completions stateless?
Yes. DeepSeek explicitly says /chat/completions is stateless, so your application must resend the relevant prior messages on every turn. For normal reasoning multi-turn chat, you resend the previous final content, not the earlier reasoning_content, unless you are inside a reasoning+tool-calls flow that requires it.
Conclusion
If you understand one thing about DeepSeek chat completions, make it this: the endpoint is simple at the minimum and deep at the edges. The minimum is just model + messages. The deeper layer is where production work happens: streaming SSE, JSON Output, tool calls, Beta strict schemas, prefix completion, usage accounting, cache-aware billing, and careful state handling for reasoning mode. That is why /chat/completions is not just a quickstart endpoint; it is the reference surface most DeepSeek apps eventually grow around.
If you want a cleaner implementation path after this reference page, the best internal next reads are our DeepSeek API Guide, chatbot implementation guide, and web app / SaaS integration guide. Just keep the official DeepSeek docs as the factual source of truth whenever an older example or internal page differs.
Want to see DeepSeek in action before writing code? Try DeepSeek Chat free — no API key needed.





