DeepSeek Thinking Mode - Chat-Deep.ai

Quick answer: DeepSeek Thinking Mode is an API mode where the model can return reasoning output before the final answer. As of April 28, 2026, in the DeepSeek V4 Preview API, use deepseek-v4-flash or deepseek-v4-pro, and control thinking explicitly with extra_body={"thinking": {"type": "enabled"}} or extra_body={"thinking": {"type": "disabled"}} when using the OpenAI Python SDK.

The final user-facing answer is returned in content. Reasoning output is returned separately in reasoning_content, at the same level as content. For most production apps, reasoning_content should be handled carefully, stored separately if needed, and not displayed to end users by default unless your product has a clear policy for exposing reasoning traces.

Independent disclosure: Chat-Deep.ai is an independent DeepSeek-focused guide and browser access site. Chat-Deep.ai is not affiliated with DeepSeek, DeepSeek.com, Hangzhou DeepSeek Artificial Intelligence Co., Ltd., the official DeepSeek app, the official DeepSeek API platform, OpenAI, or the OpenAI Python SDK.

This guide is written for developers who want to understand DeepSeek Thinking Mode in practical API workflows. Always verify production behavior against the official DeepSeek documentation before deploying reasoning, tool-call, or structured-output workflows.

DeepSeek API snapshot — last verified April 28, 2026

Current API model IDs: deepseek-v4-flash and deepseek-v4-pro
Base URL: https://api.deepseek.com
API format: OpenAI-compatible Chat Completions
Current API generation: DeepSeek V4 Preview
Context length: 1M tokens
Max output: 384K tokens
Thinking mode: supported
Non-thinking mode: supported
JSON Output: supported
Tool Calls: supported
FIM Completion: non-thinking mode only
Thinking default: enabled
reasoning_effort: high or max
Legacy aliases: deepseek-chat and deepseek-reasoner are scheduled to be fully retired and inaccessible after July 24, 2026, 15:59 UTC.

Who this guide is for

This guide is for developers building DeepSeek API workflows that need stronger reasoning, complex coding support, long-context analysis, multi-step planning, math-like reasoning, or tool planning.

It is also for teams that need to decide when to enable Thinking Mode, when to disable it, how to keep reasoning_content separate from normal UI output, and how to avoid mistakes in streaming, JSON Output, and Tool Calls.

If you only need basic Python setup, read the DeepSeek Python SDK guide. If you need a wider overview of API keys, base URLs, and model IDs, read the DeepSeek API guide.

What is DeepSeek Thinking Mode?

DeepSeek Thinking Mode is a reasoning mode where the model can generate intermediate reasoning output before producing the final answer. In the API response, that reasoning output is exposed through the reasoning_content field, while the final answer appears in content.

Thinking Mode is useful when a task benefits from deliberate reasoning: hard coding tasks, complex debugging, long-context document analysis, planning across multiple steps, tool-use decisions, and questions where a short direct answer is likely to miss important constraints.

Thinking Mode should not be treated as a universal default for every route. For simple extraction, classification, formatting, short summaries, and latency-sensitive chat, non-thinking mode is often simpler and easier to operate.

Thinking Mode vs non-thinking mode

The practical difference is how much reasoning the model does before it returns the final answer.

Thinking Mode: best for complex reasoning, hard coding tasks, long-context analysis, multi-step planning, math-like reasoning, and tool planning.
Non-thinking mode: best for simple chat, extraction, classification, formatting, short summaries, rewriting, routing, and latency-sensitive endpoints.

For production systems, set thinking explicitly instead of relying only on defaults. That makes each route easier to reason about, test, monitor, and update.
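
One way to make the per-route setting explicit is a small lookup table. The sketch below is illustrative only: the route names and the thinking_extra_body helper are hypothetical and are not part of the DeepSeek API or the OpenAI SDK.

```python
# Hypothetical route names; replace them with your application's own routes.
ROUTE_THINKING = {
    "classify_ticket": False,   # simple, latency-sensitive
    "extract_fields": False,
    "debug_code": True,         # benefits from deliberate reasoning
    "plan_migration": True,
}


def thinking_extra_body(route: str) -> dict:
    """Return an explicit extra_body value for a configured route."""
    if route not in ROUTE_THINKING:
        # Fail loudly instead of silently falling back to the API default.
        raise KeyError(f"No thinking setting configured for route: {route}")
    mode = "enabled" if ROUTE_THINKING[route] else "disabled"
    return {"thinking": {"type": mode}}


print(thinking_extra_body("debug_code"))
print(thinking_extra_body("classify_ticket"))
```

Raising on an unknown route is a design choice: it forces every new endpoint to pick a mode instead of inheriting the default.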

Which DeepSeek models support Thinking Mode?

For new DeepSeek API integrations, use the current V4 model IDs:

deepseek-v4-flash for fast everyday workloads, lightweight reasoning, summaries, extraction, classification, and high-volume applications.
deepseek-v4-pro for harder reasoning, complex coding, long-context analysis, multi-step planning, and higher-value production tasks.

Both current V4 models support thinking and non-thinking modes, JSON Output, and Tool Calls.

Do not use deepseek-chat or deepseek-reasoner as primary model IDs in new code. They are legacy compatibility aliases scheduled for discontinuation on 2026-07-24. During the compatibility period, deepseek-chat corresponds to DeepSeek V4 Flash non-thinking mode, while deepseek-reasoner corresponds to DeepSeek V4 Flash thinking mode.
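
If existing code still passes the legacy aliases, the compatibility mapping described above can be written down as a small migration helper. This is a sketch under that description; migrate_model_settings is a hypothetical name, and the mapping should be checked against the official documentation before use.

```python
# Mapping taken from the compatibility description in this guide:
# alias -> (current model ID, thinking type).
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-flash", "enabled"),
}


def migrate_model_settings(model: str) -> dict:
    """Translate a legacy alias into an explicit model ID plus thinking toggle."""
    if model in LEGACY_ALIASES:
        new_model, thinking_type = LEGACY_ALIASES[model]
        return {
            "model": new_model,
            "extra_body": {"thinking": {"type": thinking_type}},
        }
    # Current model IDs pass through unchanged; the caller sets thinking itself.
    return {"model": model}


print(migrate_model_settings("deepseek-reasoner"))
```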

How to enable or disable Thinking Mode

In the OpenAI-compatible DeepSeek API, Thinking Mode is controlled with a thinking object. When using the OpenAI Python SDK, pass that object through extra_body.

Install the Python package

pip install openai

Set your DeepSeek API key

Use environment variables. Do not hard-code API keys in source code.

macOS or Linux

export DEEPSEEK_API_KEY="your_api_key_here"

Windows PowerShell

# Persists the key for your user account; open a new terminal so it takes effect.
[Environment]::SetEnvironmentVariable("DEEPSEEK_API_KEY", "your_api_key_here", "User")

Basic client setup

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

Enable thinking

extra_body={"thinking": {"type": "enabled"}}

Disable thinking

extra_body={"thinking": {"type": "disabled"}}

Thinking is enabled by default, but production routes should usually set it explicitly so that behavior stays predictable.

reasoning_effort explained

reasoning_effort controls how much reasoning effort the model should apply in Thinking Mode. The supported values are high and max.

Use high for normal reasoning tasks, coding help, analysis, and tool planning.
Use max for especially complex tasks where additional reasoning may improve the result.

In Thinking Mode, the default effort is high for regular requests. For some complex agent requests, effort may automatically be set to max. For compatibility, low and medium may map to high, while xhigh may map to max.
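
To keep request logs unambiguous, some teams may prefer to normalize the effort value on the client before sending it. The sketch below mirrors the compatibility mapping described above; normalize_reasoning_effort is a hypothetical helper, and the server-side mapping is an assumption taken from this guide, not a guarantee.

```python
# Client-side mirror of the mapping described in this guide (an assumption):
# low/medium -> high, xhigh -> max.
_EFFORT_MAP = {
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}


def normalize_reasoning_effort(value: str) -> str:
    """Return 'high' or 'max' for a known effort label, else raise ValueError."""
    key = value.strip().lower()
    if key not in _EFFORT_MAP:
        raise ValueError(f"Unknown reasoning_effort: {value!r}")
    return _EFFORT_MAP[key]


print(normalize_reasoning_effort("medium"))
print(normalize_reasoning_effort("xhigh"))
```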

reasoning_content vs content

In Thinking Mode, the model can return two different kinds of output:

reasoning_content: reasoning output returned separately by the API.
content: the final answer that should normally be shown to the end user.

For normal user-facing applications, display content, not reasoning_content. Treat reasoning_content as an internal API field: use it for continuity, debugging, or evaluation, and only when your product has a clear policy for it.

Minimal Python example: Thinking Mode

This example enables Thinking Mode explicitly with deepseek-v4-pro. It reads the final answer from content and keeps reasoning_content separate.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {
            "role": "user",
            "content": "Compare two API retry strategies and explain the tradeoffs.",
        }
    ],
    reasoning_effort="high",
    max_tokens=2000,
    extra_body={"thinking": {"type": "enabled"}},
)

message = response.choices[0].message

reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    # Keep reasoning separate from normal end-user output.
    # Store, inspect, or discard it according to your product policy.
    pass

print(message.content)

Use this pattern for tasks where the quality benefit of reasoning matters more than keeping the route as short and simple as possible.

Minimal Python example: non-thinking mode

Non-thinking mode is often better for simple chat, extraction, short summaries, classification, rewriting, and latency-sensitive routes.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise classification assistant."},
        {"role": "user", "content": "Classify this ticket as billing, technical, or account: I cannot reset my password."},
    ],
    max_tokens=300,
    extra_body={"thinking": {"type": "disabled"}},
)

print(response.choices[0].message.content)

Use non-thinking mode when the task is direct and does not need deeper multi-step reasoning.

Streaming Thinking Mode responses

In streaming Thinking Mode responses, reasoning and final answer text can arrive separately. A good streaming parser should accumulate delta.reasoning_content and delta.content in separate buffers.

For normal user interfaces, display only the final answer content by default. Do not stream reasoning text directly to users unless your product has a clear policy for doing so.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {
            "role": "user",
            "content": "Explain the tradeoffs between caching and retries in an API client.",
        }
    ],
    reasoning_effort="high",
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"thinking": {"type": "enabled"}},
)

reasoning_buffer = []
answer_buffer = []
usage = None

for chunk in stream:
    if getattr(chunk, "usage", None):
        usage = chunk.usage

    if not chunk.choices:
        continue

    delta = chunk.choices[0].delta

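    reasoning_piece = getattr(delta, "reasoning_content", None)
    if reasoning_piece:
        # Buffer reasoning separately; never print it to the user here.
        # No early continue, so a chunk carrying both fields keeps its content.
        reasoning_buffer.append(reasoning_piece)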

    content_piece = getattr(delta, "content", None)
    if content_piece:
        answer_buffer.append(content_piece)
        print(content_piece, end="", flush=True)

final_answer = "".join(answer_buffer)
reasoning_text = "".join(reasoning_buffer)

# Do not show reasoning_text to end users by default.
# Use it only according to your product policy.

if usage:
    print("\n\nUsage object received.")

The final usage-bearing chunk may have an empty choices array, so streaming code should handle that case safely.

Multi-turn conversations without tool calls

For ordinary multi-turn conversations without Tool Calls, earlier reasoning_content does not need to be sent back in the next turn's context. Your application can keep only the final assistant content and append the next user message.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

messages = [
    {"role": "user", "content": "Explain why API retries need backoff."}
]

first_response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

first_message = first_response.choices[0].message
print(first_message.content)

messages.append(
    {
        "role": "assistant",
        "content": first_message.content or "",
    }
)

messages.append(
    {
        "role": "user",
        "content": "Now give me a short production checklist.",
    }
)

second_response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

print(second_response.choices[0].message.content)

This is different from Thinking Mode Tool Calls. When a thinking-mode assistant turn includes tool calls, preserve the full assistant message internally.
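
The two history rules can be combined into one helper that decides what to append for each assistant turn. This is a minimal sketch, assuming the assistant message has already been converted to a plain dict (for example with model_dump(exclude_none=True)); assistant_history_entry is a hypothetical name.

```python
def assistant_history_entry(message: dict) -> dict:
    """Build the history entry for one assistant turn.

    With tool calls: keep the full message, including reasoning_content.
    Without tool calls: keep only the final content.
    """
    if message.get("tool_calls"):
        return dict(message)
    return {"role": "assistant", "content": message.get("content") or ""}


plain_turn = {
    "role": "assistant",
    "content": "Use exponential backoff.",
    "reasoning_content": "internal reasoning text",
}
tool_turn = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "internal reasoning text",
    "tool_calls": [{"id": "call_1", "type": "function"}],
}

print(assistant_history_entry(plain_turn))
print(sorted(assistant_history_entry(tool_turn)))
```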

Tool Calls in Thinking Mode

DeepSeek Tool Calls can be used in Thinking Mode. This is useful when the model needs to reason, request external information, continue reasoning, and then produce a final answer.

The important rule is stricter than normal multi-turn chat: if a thinking-mode turn includes tool calls, preserve and pass back the full assistant message, including reasoning_content, content, and tool_calls where present. Stripping required reasoning_content in a thinking-mode tool-call loop can cause a 400-level error.

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

def lookup_order_status(order_id: str) -> dict:
    if not isinstance(order_id, str) or not order_id.startswith("ORD-"):
        raise ValueError("Invalid order_id.")

    return {
        "order_id": order_id,
        "status": "processing",
        "estimated_ship_date": "tomorrow",
    }

tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order_status",
            "description": "Look up the current status of a customer order by order ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The order ID, for example ORD-12345.",
                    }
                },
                "required": ["order_id"],
            },
        },
    }
]

messages = [
    {
        "role": "user",
        "content": "Check order ORD-12345 and explain what the customer should expect next.",
    }
]

for _ in range(4):
    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        reasoning_effort="high",
        extra_body={"thinking": {"type": "enabled"}},
    )

    assistant_message = response.choices[0].message

    # Preserve the full assistant message internally.
    # In thinking-mode tool-call loops, this may include reasoning_content,
    # content, and tool_calls.
    messages.append(assistant_message.model_dump(exclude_none=True))

    if not assistant_message.tool_calls:
        print(assistant_message.content)
        break

    for call in assistant_message.tool_calls:
        if call.function.name != "lookup_order_status":
            raise ValueError(f"Unsupported tool requested: {call.function.name}")

        try:
            arguments = json.loads(call.function.arguments)
        except json.JSONDecodeError as exc:
            raise ValueError("Tool arguments were not valid JSON.") from exc

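        # Valid JSON can still be a list or string; require an object.
        if not isinstance(arguments, dict):
            raise ValueError("Tool arguments must be a JSON object.")

        order_id = arguments.get("order_id")
        result = lookup_order_status(order_id)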

        messages.append(
            {
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            }
        )
else:
    raise RuntimeError("Tool loop reached the maximum number of rounds.")

The model requests a tool call, but it does not execute the function. Your application validates the arguments, runs the function, appends the tool result with the correct tool_call_id, and sends the next request.

JSON Output in Thinking Mode

DeepSeek JSON Output uses response_format={"type": "json_object"}. The prompt should explicitly mention json, provide an example JSON shape, and set max_tokens reasonably to reduce truncation risk.

For simple extraction, non-thinking JSON Output may be simpler. Use Thinking Mode with JSON Output only when the structured result depends on harder reasoning.

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

system_prompt = """
You analyze technical support tickets and return json.

Return only a valid json object with this shape:
{
  "category": "billing | technical | account | other",
  "priority": "low | medium | high",
  "summary": "short summary"
}
"""

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": "The dashboard export fails every time I try to download a weekly report.",
        },
    ],
    response_format={"type": "json_object"},
    reasoning_effort="high",
    max_tokens=800,
    extra_body={"thinking": {"type": "enabled"}},
)

choice = response.choices[0]
content = choice.message.content or ""

if choice.finish_reason == "length":
    raise RuntimeError("The JSON response may have been truncated. Increase max_tokens or shorten the prompt.")

if not content.strip():
    raise RuntimeError("The model returned empty content. Make the json instruction more explicit.")

data = json.loads(content)

required_keys = {"category", "priority", "summary"}
missing = required_keys - set(data)

if missing:
    raise ValueError(f"Missing required keys: {missing}")

print(data)

JSON Output improves parseability, but your application should still validate required keys, allowed values, and field types before trusting the result.
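
A minimal sketch of that extra validation step, assuming the ticket shape from the prompt above. The validate_ticket helper and the ALLOWED_VALUES table are illustrative names, not part of any SDK.

```python
# Allowed values copied from the example JSON shape in the system prompt.
ALLOWED_VALUES = {
    "category": {"billing", "technical", "account", "other"},
    "priority": {"low", "medium", "high"},
}


def validate_ticket(data: object) -> dict:
    """Check field types and allowed values; return the data if it is valid."""
    if not isinstance(data, dict):
        raise TypeError("Expected a JSON object.")
    for key, allowed in ALLOWED_VALUES.items():
        value = data.get(key)
        # Check the type first so unhashable values never reach the set lookup.
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"Invalid value for {key!r}: {value!r}")
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("summary must be a non-empty string.")
    return data


ticket = {"category": "technical", "priority": "high", "summary": "Export fails."}
print(validate_ticket(ticket))
```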

Parameters that do not affect Thinking Mode

In Thinking Mode, these parameters do not affect output even if they are passed for compatibility:

temperature
top_p
presence_penalty
frequency_penalty

Do not tune Thinking Mode behavior by changing these parameters. Use the thinking toggle, reasoning_effort, prompt design, model selection, and route-level product logic instead.
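
To avoid giving the impression that these values are being tuned, a request builder can drop them whenever thinking is enabled. The following is a sketch only; drop_ignored_params is a hypothetical helper, and it treats a missing thinking setting as enabled because that is the default described in this guide.

```python
IGNORED_IN_THINKING = (
    "temperature",
    "top_p",
    "presence_penalty",
    "frequency_penalty",
)


def drop_ignored_params(payload: dict) -> dict:
    """Return a copy of the payload without parameters that thinking ignores."""
    extra_body = payload.get("extra_body") or {}
    # Assumption from this guide: thinking defaults to enabled when unset.
    thinking_type = (extra_body.get("thinking") or {}).get("type", "enabled")
    if thinking_type != "enabled":
        return dict(payload)
    return {k: v for k, v in payload.items() if k not in IGNORED_IN_THINKING}


payload = {
    "model": "deepseek-v4-pro",
    "temperature": 0.2,
    "top_p": 0.9,
    "extra_body": {"thinking": {"type": "enabled"}},
}
print(sorted(drop_ignored_params(payload)))
```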

FIM Completion and Thinking Mode

FIM Completion is documented as non-thinking mode only. If you are building code completion or fill-in-the-middle workflows, treat that route separately from Thinking Mode routes.

For coding assistants, this means you may use Thinking Mode for harder code reasoning, debugging, and architecture questions, while using non-thinking mode for FIM-style completion workflows where supported.

Token usage, context caching, and cost control without prices

Thinking Mode can increase output length and token usage because the model may produce reasoning output as well as the final answer. Tool-call loops can also add extra assistant and tool messages to the conversation.

usage = response.usage

if usage:
    print("Prompt tokens:", getattr(usage, "prompt_tokens", None))
    print("Completion tokens:", getattr(usage, "completion_tokens", None))
    print("Total tokens:", getattr(usage, "total_tokens", None))
    print("Prompt cache hit tokens:", getattr(usage, "prompt_cache_hit_tokens", None))
    print("Prompt cache miss tokens:", getattr(usage, "prompt_cache_miss_tokens", None))

    completion_details = getattr(usage, "completion_tokens_details", None)
    if completion_details:
        print("Reasoning tokens:", getattr(completion_details, "reasoning_tokens", None))

Context Caching is enabled by default and does not require a code change. It can help repeated-prefix workloads such as stable system prompts, repeated tool definitions, long shared instructions, and repeated document context.

Because DeepSeek API pricing can change, this guide does not copy token prices. Check the official DeepSeek pricing page and Chat-Deep.ai’s pricing guide before making billing decisions.

Cost-control habits without copying prices

Disable thinking for simple classification, formatting, and extraction routes.
Use deepseek-v4-flash where speed and volume matter more than deeper reasoning.
Use deepseek-v4-pro for harder reasoning routes where quality matters more.
Set route-specific max_tokens values.
Keep tool results compact.
Trim or summarize old conversation turns.
Log token usage by route, model, thinking setting, and feature flag.
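
The last habit can be sketched as a small in-memory aggregator. The record_usage helper and the route name are hypothetical, and the SimpleNamespace objects stand in for response.usage so that the example runs without an API call.

```python
from collections import defaultdict
from types import SimpleNamespace

# Running totals keyed by (route, model, thinking setting).
usage_totals = defaultdict(
    lambda: {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0}
)


def record_usage(route: str, model: str, thinking: str, usage) -> None:
    """Add one response's token counts to the totals for its route."""
    bucket = usage_totals[(route, model, thinking)]
    bucket["requests"] += 1
    # The fallback keeps this safe if a field is missing or None.
    bucket["prompt_tokens"] += getattr(usage, "prompt_tokens", 0) or 0
    bucket["completion_tokens"] += getattr(usage, "completion_tokens", 0) or 0


# Stand-in usage objects for illustration; in real code pass response.usage.
record_usage("debug_code", "deepseek-v4-pro", "enabled",
             SimpleNamespace(prompt_tokens=120, completion_tokens=900))
record_usage("debug_code", "deepseek-v4-pro", "enabled",
             SimpleNamespace(prompt_tokens=80, completion_tokens=600))

for key, totals in usage_totals.items():
    print(key, totals)
```

In a production service the same keys would more likely be sent to a metrics or logging system than held in a module-level dict.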

Error handling and debugging

Thinking Mode workflows can fail because of invalid request formatting, missing API keys, invalid parameters, rate limits, server errors, malformed JSON Output instructions, or stripped reasoning_content in tool-call loops.

Retry temporary issues carefully. Do not blindly retry invalid requests, authentication failures, account-balance problems, or invalid parameters without fixing the underlying issue.

import os
import time
from openai import APIStatusError, APITimeoutError, OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
    timeout=60,
    max_retries=0,
)

RETRY_STATUS_CODES = {429, 500, 503}

def create_completion_with_retry(payload: dict, max_attempts: int = 3):
    delay_seconds = 2

    for attempt in range(1, max_attempts + 1):
        try:
            return client.chat.completions.create(**payload)

        except APITimeoutError:
            if attempt == max_attempts:
                raise

        except APIStatusError as exc:
            if exc.status_code not in RETRY_STATUS_CODES:
                raise

            if attempt == max_attempts:
                raise

        time.sleep(delay_seconds)
        delay_seconds *= 2

    raise RuntimeError("Request failed after retries.")

payload = {
    "model": "deepseek-v4-pro",
    "messages": [
        {"role": "user", "content": "Explain how retry backoff works."}
    ],
    "reasoning_effort": "high",
    "extra_body": {"thinking": {"type": "enabled"}},
}

response = create_completion_with_retry(payload)
print(response.choices[0].message.content)

Thinking Mode debugging checklist

Missing thinking toggle: set extra_body={"thinking": {"type": "enabled"}} or extra_body={"thinking": {"type": "disabled"}} explicitly.
Wrong model names: use deepseek-v4-flash or deepseek-v4-pro for new code.
Temperature tuning has no effect: temperature and top_p do not affect Thinking Mode output.
Streaming parser misses reasoning: handle delta.reasoning_content separately from delta.content.
UI displays reasoning unexpectedly: show content by default and keep reasoning_content separate.
Tool-call loop fails: preserve the full assistant message, including reasoning_content, in thinking-mode tool-call loops.
FIM route fails: FIM Completion is non-thinking mode only.
JSON Output fails: include the word json, provide an example JSON shape, and set max_tokens reasonably.

Security, privacy, and UI guidance for reasoning_content

reasoning_content is not the same as the final answer. It can be useful for API continuity, controlled debugging, evaluation, and advanced internal workflows, but it should be handled carefully.

Display content as the normal user-facing answer.
Do not show reasoning_content to end users by default.
Do not log sensitive user data unnecessarily.
Apply your normal data-retention policy to reasoning traces if you store them.
Keep reasoning traces separate from public UI output.
Preserve reasoning_content internally when required for thinking-mode tool-call loops.
Do not treat reasoning text as a source of truth; validate important claims and tool arguments separately.

For most applications, the safest default is simple: use reasoning_content only where the API workflow requires it, and show users the final content.
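
One way to enforce that default in code is to split each assistant message at the API boundary, so UI code never receives the reasoning field. This is a minimal sketch; split_for_ui and the payload keys are hypothetical, and the message is assumed to be a plain dict.

```python
def split_for_ui(message: dict) -> tuple:
    """Return (ui_payload, internal_record) for one assistant message."""
    ui_payload = {
        "role": "assistant",
        "text": message.get("content") or "",
    }
    # Keep reasoning out of the UI payload; store or discard it per policy.
    internal_record = {
        "reasoning_content": message.get("reasoning_content"),
    }
    return ui_payload, internal_record


ui, internal = split_for_ui(
    {
        "role": "assistant",
        "content": "Use backoff with jitter.",
        "reasoning_content": "internal reasoning text",
    }
)
print(ui)
```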

Common mistakes

Using legacy aliases as primary model IDs: use deepseek-v4-flash or deepseek-v4-pro in new integrations.
Relying on defaults in production: set thinking explicitly per route.
Displaying reasoning by default: show content, not reasoning_content.
Expecting temperature to tune thinking output: temperature and top_p do not affect Thinking Mode output.
Ignoring streaming reasoning chunks: parse delta.reasoning_content and delta.content separately.
Stripping reasoning in tool-call loops: preserve the full assistant message internally when Tool Calls are involved.
Using Thinking Mode for every task: disable it for simple extraction, classification, and formatting routes where deeper reasoning is unnecessary.
Using FIM while thinking is enabled: FIM Completion is non-thinking mode only.
Copying prices into evergreen documentation: link to the pricing pages instead of hard-coding values.

When this guide is not the right page

This page focuses on DeepSeek Thinking Mode. Use a more specific page if your goal is different:

For basic Python setup, read the DeepSeek Python SDK guide.
For API keys, base URLs, and model overview, read the DeepSeek API guide.
For current V4 model details, read the DeepSeek V4 guide.
For function calling, read the DeepSeek Tool Calls guide.
For structured responses, read the DeepSeek JSON Output guide.
For cache behavior, read the DeepSeek Context Caching guide.
For token accounting, read the DeepSeek Token Usage guide.
For troubleshooting, read the DeepSeek Error Codes guide.
For migration from OpenAI-style code, read the OpenAI SDK to DeepSeek guide.
For JavaScript and TypeScript, read the DeepSeek Node.js TypeScript guide.

FAQ

What is DeepSeek Thinking Mode?

DeepSeek Thinking Mode is a reasoning mode where the model can produce reasoning output before the final answer. The reasoning output is returned in reasoning_content, while the final answer is returned in content.

Is DeepSeek Thinking Mode enabled by default?

Yes. Thinking Mode defaults to enabled, but production applications should set it explicitly with the thinking parameter so each route behaves predictably.

How do I disable Thinking Mode?

With the OpenAI Python SDK, pass extra_body={"thinking": {"type": "disabled"}} in the Chat Completions request.

Which DeepSeek models support Thinking Mode?

The current DeepSeek V4 API models deepseek-v4-flash and deepseek-v4-pro support Thinking Mode and non-thinking mode.

Should I use deepseek-v4-flash or deepseek-v4-pro for thinking?

Use deepseek-v4-flash for faster everyday reasoning and high-volume workloads. Use deepseek-v4-pro for harder reasoning, complex coding, long-context analysis, and multi-step planning.

What is reasoning_content?

reasoning_content is the API field that contains reasoning output in Thinking Mode. It is separate from content, which contains the final answer.

Should I show reasoning_content to users?

Usually no. For normal user-facing apps, show content and keep reasoning_content separate unless your product has a clear and safe policy for exposing reasoning traces.

What does reasoning_effort do?

reasoning_effort controls the reasoning effort used in Thinking Mode. Supported values are high and max.

Do temperature and top_p work in Thinking Mode?

No. In Thinking Mode, temperature, top_p, presence_penalty, and frequency_penalty do not affect output even if they are passed.

Can I stream reasoning_content?

Yes. Streaming responses can include delta.reasoning_content and delta.content separately. Your parser should keep them in separate buffers.

Can I use Tool Calls in Thinking Mode?

Yes. Tool Calls are supported in Thinking Mode. The model can request function calls, but your application must validate arguments, execute the function, and return the tool result.

Do I need to pass reasoning_content back in tool-call loops?

Yes. In thinking-mode tool-call loops, preserve and pass back the full assistant message internally, including reasoning_content where present. Stripping it can cause a 400-level error.

Can I use JSON Output in Thinking Mode?

Yes. Use response_format={"type": "json_object"}, include the word json in the prompt, provide an example shape, set max_tokens reasonably, and validate the parsed result.

Is FIM Completion supported in Thinking Mode?

No. FIM Completion is documented as non-thinking mode only.

Should I still use deepseek-chat or deepseek-reasoner?

For new code, use deepseek-v4-flash or deepseek-v4-pro. deepseek-chat and deepseek-reasoner are legacy compatibility aliases scheduled for discontinuation on 2026-07-24.

Where can I check DeepSeek API pricing?

Because DeepSeek API pricing can change, this guide does not copy token prices. Check the official DeepSeek pricing page and Chat-Deep.ai’s pricing guide before making billing decisions.
