DeepSeek V4 Guide: Pro vs Flash, 1M Context & API Pricing

DeepSeek V4 Preview is the current official DeepSeek V4 release line. It includes two current API model IDs, deepseek-v4-pro and deepseek-v4-flash, with a 1M-token context window, 384K maximum output, thinking and non-thinking modes, OpenAI- and Anthropic-compatible API access, and official open-weight repositories.

Last verified against official DeepSeek sources: April 26, 2026.

Independent site notice: Chat-Deep.ai is an independent DeepSeek guide and browser access site. It is not affiliated with DeepSeek, DeepSeek.com, chat.deepseek.com, the official DeepSeek app, or the official DeepSeek developer platform. For production decisions, always verify model names, prices, limits, feature support, status, and deprecation notices in the official DeepSeek API documentation.

Current official V4 status: DeepSeek-V4 Preview is live, API-available, and open-sourced. New API integrations should use deepseek-v4-pro or deepseek-v4-flash.

Pricing rule for this page: this guide does not publish fixed token prices. API prices and promotions can change, so all pricing references point readers to the official DeepSeek Models & Pricing page.

Quick Answer: Is DeepSeek V4 Released?

Yes. DeepSeek V4 has launched as DeepSeek-V4 Preview. The safest wording is Preview Release, not final release. DeepSeek describes the preview as officially live, open-sourced, and available through the API using deepseek-v4-pro and deepseek-v4-flash.

DeepSeek-V4-Pro is the larger flagship V4 model with 1.6T total parameters and 49B active parameters. DeepSeek-V4-Flash is the faster and more economical V4 model with 284B total parameters and 13B active parameters. Both current V4 API models are documented with a 1M-token context window and 384K maximum output.

DeepSeek V4 Key Takeaways

  • Release status: DeepSeek V4 is live as DeepSeek-V4 Preview.
  • Current API model names: use deepseek-v4-pro or deepseek-v4-flash for new integrations.
  • API base URLs: use https://api.deepseek.com for OpenAI-compatible requests and https://api.deepseek.com/anthropic for Anthropic-compatible requests.
  • Context and output: both V4 API models are listed with 1M context and 384K maximum output.
  • Model roles: V4-Flash is the fast and economical model; V4-Pro is the stronger model for harder reasoning, coding, long-context analysis, and agentic workflows.
  • Pricing: do not hardcode prices from this guide. Use the official DeepSeek Models & Pricing page.
  • Legacy aliases: deepseek-chat and deepseek-reasoner currently route to V4-Flash non-thinking and thinking modes and are scheduled to be retired after July 24, 2026, 15:59 UTC.
  • Open weights: DeepSeek published V4 model repositories through the official DeepSeek-V4 Hugging Face collection.

DeepSeek V4 Guide Contents

  1. Release Date and Current Status
  2. DeepSeek V4 at a Glance
  3. Is DeepSeek V4 a 1T-Parameter Model?
  4. DeepSeek V4 Pro vs Flash
  5. Which DeepSeek V4 Model Should You Use?
  6. API Model Names and Base URLs
  7. deepseek-chat and deepseek-reasoner Migration
  8. API Examples
  9. DeepSeek V4 API Pricing Source
  10. Usage Tracking and Cost Control Without Hardcoded Prices
  11. 1M Context and 384K Output
  12. Open Weights and MIT License
  13. Architecture and Benchmark Highlights
  14. Thinking Modes, JSON, Tool Calls and Agents
  15. What Changed from DeepSeek V3.2 to V4?
  16. Developer Migration Checklist
  17. What Not to Overclaim
  18. FAQ
  19. Sources

DeepSeek V4 Release Date and Current Status

DeepSeek V4 Preview was announced on April 24, 2026. DeepSeek’s official release note says DeepSeek-V4 Preview is live and open-sourced, and that the API is available now. It also says developers can keep the same base URL and update the model parameter to deepseek-v4-pro or deepseek-v4-flash.

For fast-moving launch updates, outages, or later API changes, check the official DeepSeek change log, the official DeepSeek Service Status page, and the local DeepSeek Status guide.

DeepSeek V4 at a Glance

Topic | Current DeepSeek V4 Detail
Release name | DeepSeek-V4 Preview
Release date | April 24, 2026
Main API models | deepseek-v4-pro and deepseek-v4-flash
OpenAI-compatible base URL | https://api.deepseek.com
Anthropic-compatible base URL | https://api.deepseek.com/anthropic
V4-Pro size | 1.6T total parameters / 49B active parameters
V4-Flash size | 284B total parameters / 13B active parameters
Context length | 1M tokens
Maximum output | 384K tokens
Modes | Thinking and non-thinking modes
Supported API features | JSON Output, Tool Calls, Chat Prefix Completion (Beta), and FIM Completion (Beta, non-thinking mode only)
Pricing source | Official DeepSeek Models & Pricing
Open weights | Published through DeepSeek’s official Hugging Face collection
Legacy aliases | deepseek-chat and deepseek-reasoner route to V4-Flash compatibility modes during the transition period

Is DeepSeek V4 a 1T-Parameter Model?

You may see DeepSeek V4 described online as a “1T parameter model,” but that is only rough shorthand. The official V4 details are more precise:

  • DeepSeek-V4-Pro: 1.6T total parameters and 49B active parameters.
  • DeepSeek-V4-Flash: 284B total parameters and 13B active parameters.

For accurate technical content, avoid saying “DeepSeek V4 is a 1T model” as if there is only one V4 model. The better wording is: DeepSeek V4 Preview includes V4-Pro at 1.6T total parameters and V4-Flash at 284B total parameters.

DeepSeek V4 Pro vs Flash: Key Differences

Feature | DeepSeek-V4-Pro | DeepSeek-V4-Flash
API model name | deepseek-v4-pro | deepseek-v4-flash
Total parameters | 1.6T | 284B
Activated parameters | 49B | 13B
Context length | 1M tokens | 1M tokens
Maximum output | 384K tokens | 384K tokens
Thinking modes | Non-thinking, Think High, and Think Max through effort controls | Non-thinking, Think High, and Think Max through effort controls
Best for | Advanced reasoning, difficult coding, long-context analysis, agentic workflows, and higher-value production tests | Fast chat, summaries, extraction, support assistants, simpler agents, and high-volume workloads
Pricing source | Verify current official pricing | Verify current official pricing
Simple rule | Use when answer quality and reasoning depth matter most. | Use as the default starting point, then escalate difficult tasks to Pro.

The safest practical summary is: DeepSeek-V4-Pro is the stronger flagship model; DeepSeek-V4-Flash is the faster and more economical model. Start with Flash for normal workloads, then route more complex reasoning, coding, agent, and long-context tasks to Pro when quality justifies the switch.

Which DeepSeek V4 Model Should You Use?

Use this decision table before choosing between deepseek-v4-pro and deepseek-v4-flash.

Use Case | Recommended Model | Why
Everyday chat, quick answers, rewriting, basic explanations | deepseek-v4-flash | Fast and economical enough for most routine outputs.
Customer support bot with many conversations | deepseek-v4-flash | Good default for high-volume workflows where speed and efficiency matter.
Extraction, classification, structured summaries | deepseek-v4-flash | Start with the efficient model and validate output quality against your schema.
Large document summarization | Start with deepseek-v4-flash, escalate to deepseek-v4-pro for complex synthesis | Both support 1M context, but Pro may be better for harder reasoning across documents.
Code review, debugging, complex software planning | deepseek-v4-pro | Better fit for higher-value coding and reasoning tasks.
Agentic coding, tool use, multi-step workflows | deepseek-v4-pro for hard tasks; Flash for simple agent steps | This balances capability, latency, and budget control.
Production system with mixed complexity | Use a router: Flash first, Pro for escalation | Route by task difficulty instead of forcing every request through one model.
Strict budget control | Use deepseek-v4-flash first and verify official pricing before scale-up | Do not estimate budget from old or copied pricing snippets.
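The router row in the table above can be sketched in a few lines. The keyword heuristic and the 200K-token escalation threshold below are hypothetical placeholders, not official DeepSeek guidance; a production router might classify by task type, prompt length, or a lightweight classifier model instead.

```python
# Minimal model-router sketch: default to V4-Flash, escalate hard tasks to V4-Pro.
# The keyword list and token threshold are illustrative assumptions only.

HARD_TASK_HINTS = ("debug", "refactor", "prove", "architecture", "multi-step")

def choose_model(prompt: str, doc_tokens: int = 0) -> str:
    """Return a DeepSeek V4 model name based on a rough difficulty heuristic."""
    looks_hard = any(hint in prompt.lower() for hint in HARD_TASK_HINTS)
    heavy_context = doc_tokens > 200_000  # escalate heavy long-context synthesis
    if looks_hard or heavy_context:
        return "deepseek-v4-pro"
    return "deepseek-v4-flash"

print(choose_model("Summarize this support ticket"))           # deepseek-v4-flash
print(choose_model("Debug this race condition in the queue"))  # deepseek-v4-pro
```

The point of keeping this logic in one function is that the escalation policy can be tuned (or feature-flagged) without touching individual call sites.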

DeepSeek V4 API Model Names and Base URLs

The official V4 API model names are:

  • deepseek-v4-pro
  • deepseek-v4-flash

The OpenAI-compatible base URL remains:

https://api.deepseek.com

The Anthropic-compatible base URL is:

https://api.deepseek.com/anthropic

For most OpenAI-compatible tooling, you can keep your base URL and update the model name. For Claude Code or Anthropic-compatible ecosystems, use the Anthropic-compatible base URL and a current V4 model name. See the local DeepSeek API guide for a broader integration walkthrough.

What Happens to deepseek-chat and deepseek-reasoner?

The legacy names deepseek-chat and deepseek-reasoner are no longer the best model names for new V4 API integrations. During the current transition period:

  • deepseek-chat currently routes to the non-thinking mode of DeepSeek-V4-Flash.
  • deepseek-reasoner currently routes to the thinking mode of DeepSeek-V4-Flash.

DeepSeek’s V4 release note says these two legacy API model names will be fully retired and inaccessible after July 24, 2026, 15:59 UTC. New integrations should use deepseek-v4-pro or deepseek-v4-flash directly.

Name | Status | Current Mapping | Recommended Action
deepseek-v4-pro | Current V4 API model | DeepSeek-V4-Pro | Use for stronger reasoning, coding, long-context, and agentic workloads.
deepseek-v4-flash | Current V4 API model | DeepSeek-V4-Flash | Use for fast, economical, high-volume workloads.
deepseek-chat | Legacy compatibility alias | V4-Flash non-thinking mode | Replace with deepseek-v4-flash unless you intentionally need temporary compatibility.
deepseek-reasoner | Legacy compatibility alias | V4-Flash thinking mode | Replace with deepseek-v4-flash or deepseek-v4-pro plus thinking settings.

DeepSeek V4 API Examples

OpenAI-Compatible cURL Example

This example uses the current V4 model name directly:

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain the difference between DeepSeek V4 Pro and Flash."}
    ],
    "reasoning_effort": "high",
    "thinking": {"type": "enabled"},
    "stream": false
  }'

OpenAI SDK Python Example

When using the OpenAI SDK, DeepSeek-specific fields such as thinking should be passed through extra_body.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "user", "content": "Create a migration checklist for DeepSeek V4."}
    ],
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}}
)

print(response.choices[0].message.content)

Anthropic-Compatible Example

DeepSeek also supports an Anthropic-compatible API format through a separate base URL:

First point the Anthropic SDK at DeepSeek's compatible endpoint (shell):

export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_API_KEY=YOUR_DEEPSEEK_API_KEY

Then make the request (Python):

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[
        {
            "role": "user",
            "content": [{"type": "text", "text": "Summarize DeepSeek V4 in 5 bullets."}]
        }
    ]
)

print(message.content)

For fast, economical workloads, replace deepseek-v4-pro with deepseek-v4-flash. Always keep API keys server-side and out of public code, screenshots, analytics, or browser bundles.

DeepSeek V4 API Pricing Source

This page intentionally does not publish fixed DeepSeek token prices. DeepSeek API pricing is a live billing detail, and it can change because of launch pricing, promotions, discounts, product updates, or future model changes.

Official pricing source: DeepSeek Models & Pricing.

Use that official page for current token prices, any limited-time promotions, context length, maximum output, feature matrix, and model availability. Use this Chat-Deep.ai page for model selection and migration guidance, not as the final billing source.

If you are planning an API product, you can also use the local DeepSeek API pricing guide and DeepSeek API cost calculator for explanation and budgeting workflow, then confirm the actual rates against DeepSeek’s official pricing page before shipping.

Usage Tracking and Cost Control Without Hardcoded Prices

Even without hardcoding prices in this article, developers still need a reliable way to estimate and control API spend. The safe approach is to track token categories and apply the current official rates from DeepSeek’s pricing page at the time of calculation.

The general calculation pattern is:

Estimated request cost =
  cache_hit_input_tokens × current official cache-hit input rate
+ cache_miss_input_tokens × current official cache-miss input rate
+ output_tokens × current official output rate

Track these fields in production where available:

  • usage.prompt_tokens
  • usage.prompt_cache_hit_tokens
  • usage.prompt_cache_miss_tokens
  • usage.completion_tokens
  • usage.completion_tokens_details.reasoning_tokens
  • usage.total_tokens

Cost control is not only a pricing-table issue. It also depends on model routing, prompt length, retrieved context size, cache-hit rate, output limits, thinking effort, retries, tool loops, and how often you escalate from V4-Flash to V4-Pro.
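The calculation pattern above can be expressed as a small helper. Nothing here hardcodes a price: the rates dict is loaded at calculation time from your own configuration after checking the official pricing page, and the example rates shown below are made up purely for illustration.

```python
# Sketch of the cost formula above. Rates are supplied at calculation time,
# never baked into code or documentation.

def estimate_request_cost(usage: dict, rates: dict) -> float:
    """usage: token counts from the API response; rates: current official per-token prices."""
    cache_hit = usage.get("prompt_cache_hit_tokens", 0)
    cache_miss = usage.get("prompt_cache_miss_tokens",
                           usage.get("prompt_tokens", 0) - cache_hit)
    output = usage.get("completion_tokens", 0)
    return (cache_hit * rates["cache_hit_input"]
            + cache_miss * rates["cache_miss_input"]
            + output * rates["output"])

# Made-up per-token rates, for illustration only -- load real ones from config.
rates = {"cache_hit_input": 1e-7, "cache_miss_input": 5e-7, "output": 2e-6}
usage = {"prompt_tokens": 12_000, "prompt_cache_hit_tokens": 8_000,
         "prompt_cache_miss_tokens": 4_000, "completion_tokens": 1_500}
print(estimate_request_cost(usage, rates))
```

Logging this estimate per request, alongside the raw usage fields, lets you reconcile against the provider's bill later even if rates change mid-month.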

DeepSeek V4 Context Length and Max Output

The official V4 API documentation lists a 1M-token context length and a 384K maximum output for the current V4 API models.

A 1M-token context window can help with long-document analysis, transcript processing, codebase review, research workflows, legal or technical document review, and agents that need a large working memory. However, a larger context window does not automatically guarantee better answers. You should still test retrieval accuracy, prompt structure, latency, cost, tool calls, and structured output behavior before moving production workloads to V4.
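As a minimal sketch, a pre-flight check can reject requests that cannot fit the documented limits before you pay for a failed call. The 4-characters-per-token estimate below is a rough stand-in for a real tokenizer, and since providers differ on whether generated output counts against the context window, this sketch conservatively reserves the output budget inside the 1M limit.

```python
# Pre-flight budget check against the documented V4 limits
# (1M-token context, 384K maximum output). Assumptions: crude 4-chars-per-token
# estimate, and output reserved inside the context window (conservative).

CONTEXT_LIMIT = 1_000_000
MAX_OUTPUT_LIMIT = 384_000

def fits_context(prompt_text: str, reserved_output_tokens: int) -> bool:
    """Return True if the request plausibly fits the documented V4 limits."""
    if reserved_output_tokens > MAX_OUTPUT_LIMIT:
        return False
    est_prompt_tokens = len(prompt_text) // 4  # replace with a real tokenizer
    return est_prompt_tokens + reserved_output_tokens <= CONTEXT_LIMIT
```

In production you would swap the character-count estimate for an actual tokenizer count and log near-limit requests, since those are the ones most likely to degrade in quality or latency.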

DeepSeek V4 Open Weights and MIT License

DeepSeek published an official DeepSeek-V4 Hugging Face collection. The DeepSeek-V4-Pro model card states that the repository and model weights are licensed under the MIT License.

Model | Total Parameters | Activated Parameters | Context Length | Precision | Official Repository
DeepSeek-V4-Flash-Base | 284B | 13B | 1M | FP8 Mixed | Hugging Face
DeepSeek-V4-Flash | 284B | 13B | 1M | FP4 + FP8 Mixed | Hugging Face
DeepSeek-V4-Pro-Base | 1.6T | 49B | 1M | FP8 Mixed | Hugging Face
DeepSeek-V4-Pro | 1.6T | 49B | 1M | FP4 + FP8 Mixed | Hugging Face

The official model card explains that FP4 + FP8 mixed precision uses FP4 for MoE expert parameters and FP8 for most other parameters. Open weights are useful for research and infrastructure teams, but local deployment still requires serious hardware, careful inference setup, and close reading of the official model card.

DeepSeek V4 Architecture and Benchmark Highlights

DeepSeek describes V4 as a preview series of Mixture-of-Experts language models. The V4 model card lists several architecture and optimization upgrades:

  • Mixture-of-Experts architecture: only a subset of parameters is activated per token, while the model keeps large total capacity.
  • Hybrid Attention Architecture: DeepSeek combines Compressed Sparse Attention and Heavily Compressed Attention to improve long-context efficiency.
  • Manifold-Constrained Hyper-Connections: mHC is used to strengthen residual connections and improve training stability.
  • Muon optimizer: DeepSeek says it uses the Muon optimizer for faster convergence and stability.
  • Large-scale pretraining: DeepSeek says V4 models were pretrained on more than 32T diverse and high-quality tokens.

DeepSeek-Reported Benchmark Highlights

The numbers below are DeepSeek-reported model-card results. Treat them as vendor-reported benchmarks, not a replacement for your own production tests.

Benchmark / Metric | DeepSeek-V4-Pro Max | Why It Matters
GPQA Diamond | 90.1 | Advanced reasoning and science-heavy QA.
LiveCodeBench | 93.5 | Coding performance under benchmark conditions.
SWE Verified | 80.6 | Software engineering task resolution.
Terminal Bench 2.0 | 67.9 | Agentic command-line and terminal workflows.
MRCR 1M | 83.5 | Long-context reasoning at 1M context scale.

For real applications, test DeepSeek V4 against your own prompts, data, safety constraints, latency requirements, tool-calling patterns, budget limits, and acceptance criteria.

Thinking Modes, Agentic Coding, JSON Output and Tool Calls

DeepSeek V4 supports both thinking and non-thinking modes. In the OpenAI-compatible format, the thinking toggle uses {"thinking": {"type": "enabled"}} or {"thinking": {"type": "disabled"}}. Thinking effort can be controlled with reasoning_effort.

Feature | DeepSeek V4 Support | Implementation Note
Thinking mode | Supported | Use thinking and reasoning_effort settings.
Non-thinking mode | Supported | Useful for faster routine tasks.
JSON Output | Supported | Set response_format, include the word “json” in the prompt, and provide the target shape.
Tool / Function calling | Supported | Useful for agents, external APIs, and structured workflows. Validate tool arguments before execution.
Chat Prefix Completion | Supported (Beta) | Use official Beta documentation before production.
FIM Completion | Supported in non-thinking mode only (Beta) | Useful for code completion workflows.
Anthropic API format | Supported | Use https://api.deepseek.com/anthropic.

Important implementation note: DeepSeek’s thinking guide says thinking mode does not support temperature, top_p, presence_penalty, or frequency_penalty. For compatibility, setting these parameters may not raise an error, but they may simply have no effect in thinking mode.
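The rules above can be enforced client-side with a small payload builder. This is an illustrative sketch, not an official SDK helper: it assumes the field names documented in this guide (response_format and the thinking toggle) and drops the sampling parameters that thinking mode ignores, so requests never silently carry dead settings.

```python
# Payload-builder sketch enforcing this guide's JSON Output and thinking-mode
# rules client-side. Field names follow the article; the builder itself is
# an assumption, not part of any official SDK.

UNSUPPORTED_IN_THINKING = {"temperature", "top_p", "presence_penalty", "frequency_penalty"}

def build_payload(model: str, prompt: str, thinking: bool, json_output: bool, **sampling):
    if json_output and "json" not in prompt.lower():
        raise ValueError('JSON Output mode expects the word "json" in the prompt')
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }
    if json_output:
        payload["response_format"] = {"type": "json_object"}
    for key, value in sampling.items():
        if thinking and key in UNSUPPORTED_IN_THINKING:
            continue  # ignored by thinking mode, so drop client-side
        payload[key] = value
    return payload
```

Stripping the unsupported keys locally keeps request logs honest: what you send is exactly what the model can act on.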

What Changed from DeepSeek V3.2 to DeepSeek V4 Preview?

DeepSeek V3.2 remains historically important, but it should no longer be described as the current hosted API model after the V4 Preview update. Current V4 API content should use the V4 model names directly.

Topic | DeepSeek V3.2 Historical Position | DeepSeek V4 Preview Position
Release status | Released in December 2025 as a prior open-weight model generation. | Released on April 24, 2026 as DeepSeek-V4 Preview.
Hosted API mapping | Historically mapped to deepseek-chat and deepseek-reasoner after the V3.2 release. | Current V4 API models are deepseek-v4-pro and deepseek-v4-flash.
Legacy aliases | Older articles may describe aliases as V3.2. | The aliases now route to V4-Flash modes during the transition period.
Context length | Often discussed around earlier 128K-context API content. | Official V4 API documentation lists 1M context.
Pricing | Older V3.2 pricing snippets should be treated as historical. | Use the current official V4 pricing page before production use.

Developer Migration Checklist for DeepSeek V4

  • Replace new production uses of deepseek-chat with deepseek-v4-flash where Flash is suitable.
  • Use deepseek-v4-pro for higher-value reasoning, coding, long-context, and agentic workloads.
  • Audit documentation, SDK wrappers, environment variables, examples, and internal dashboards for old model names.
  • Remove static pricing snippets from evergreen pages and link to the official DeepSeek pricing page instead.
  • Track cache-hit input tokens, cache-miss input tokens, output tokens, and reasoning tokens where available.
  • Test thinking mode and reasoning_effort before production rollout.
  • Validate JSON Output and tool/function calling behavior with your own schemas.
  • Measure latency separately for short prompts, long prompts, and 1M-context workflows.
  • Use feature flags or staged rollout so you can compare V4-Flash and V4-Pro safely.
  • Check the official DeepSeek change log before the July 24, 2026 legacy-alias retirement date.
  • Update internal content so no current page says V3.2 is the current hosted API model.

Copy-Paste Migration Mapping

Old model: deepseek-chat
Current recommended replacement: deepseek-v4-flash
Reason: deepseek-chat is a legacy alias currently routing to V4-Flash non-thinking mode.

Old model: deepseek-reasoner
Current recommended replacement: deepseek-v4-flash or deepseek-v4-pro with thinking enabled
Reason: deepseek-reasoner is a legacy alias currently routing to V4-Flash thinking mode.

High-value reasoning / coding / long-context tasks:
Use: deepseek-v4-pro

High-volume chat / support / summaries / cheaper agent steps:
Use: deepseek-v4-flash
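The mapping above can be captured as a lookup helper so old call sites migrate mechanically. This is an illustrative sketch; the alias behavior it encodes comes from the tables in this guide, and the helper names are assumptions, not DeepSeek APIs.

```python
# The copy-paste migration mapping above as code: resolve a legacy alias to
# its current V4 replacement, preserving the alias's thinking-mode behavior.

LEGACY_ALIASES = {
    "deepseek-chat":     {"model": "deepseek-v4-flash", "thinking": False},
    "deepseek-reasoner": {"model": "deepseek-v4-flash", "thinking": True},
}
CURRENT_MODELS = ("deepseek-v4-pro", "deepseek-v4-flash")

def migrate_model_name(name: str) -> dict:
    """Resolve a legacy alias; current V4 names pass through unchanged."""
    if name in CURRENT_MODELS:
        return {"model": name, "thinking": None}  # caller chooses thinking mode
    if name in LEGACY_ALIASES:
        return dict(LEGACY_ALIASES[name])
    raise ValueError(f"Unknown model name: {name}")
```

Running this at client startup (rather than per request) makes it easy to log every place a legacy alias still appears before the retirement date.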

What Not to Overclaim About DeepSeek V4

Avoid This Claim | Use This Safer Wording
“DeepSeek V4 final version has launched.” | “DeepSeek V4 has launched as a Preview Release.”
“DeepSeek V4 is a 1T-parameter model.” | “DeepSeek V4 Preview includes V4-Pro at 1.6T total parameters and V4-Flash at 284B total parameters.”
“deepseek-chat currently means DeepSeek V3.2.” | “deepseek-chat is now a legacy alias that currently routes to V4-Flash non-thinking mode.”
“deepseek-reasoner currently means DeepSeek V3.2.” | “deepseek-reasoner is now a legacy alias that currently routes to V4-Flash thinking mode.”
“DeepSeek-V4-Pro-Max is a separate API model.” | “DeepSeek-V4-Pro-Max is best described as the maximum reasoning effort mode of DeepSeek-V4-Pro unless DeepSeek documents it as a separate API model name.”
“DeepSeek V4 pricing will not change.” | “Verify the official DeepSeek Models & Pricing page before production budgeting.”
“Every open DeepSeek model has the same license.” | “Check the exact official model card or repository for the model you plan to use.”
“The hosted API, web app, and open weights are always identical.” | “Treat hosted API behavior, web/app behavior, and local open-weight deployment as separate surfaces.”

For more context, see the DeepSeek AI guide, DeepSeek Chat page, DeepSeek API guide, DeepSeek API pricing guide, DeepSeek API cost calculator, DeepSeek Models hub, DeepSeek V3.2 historical guide, and DeepSeek Status guide.

DeepSeek V4 FAQ

Is DeepSeek V4 officially released?

Yes. DeepSeek V4 has officially launched as DeepSeek-V4 Preview on April 24, 2026. The correct wording is Preview Release, not final release.

What are the official DeepSeek V4 API model names?

The official V4 API model names are deepseek-v4-pro and deepseek-v4-flash. New integrations should use these names directly.

What is DeepSeek-V4-Pro?

DeepSeek-V4-Pro is the larger V4 Preview model. DeepSeek lists it as 1.6T total parameters with 49B activated parameters and positions it for advanced reasoning, coding, knowledge-heavy, long-context, and agentic workloads.

What is DeepSeek-V4-Flash?

DeepSeek-V4-Flash is the smaller, faster, and more economical V4 Preview model. DeepSeek lists it as 284B total parameters with 13B activated parameters.

Is DeepSeek V4 a 1T-parameter model?

Not exactly. The precise official figures are 1.6T total parameters for V4-Pro and 284B total parameters for V4-Flash. Calling DeepSeek V4 simply a “1T model” is less accurate.

What is the DeepSeek V4 context length?

The official V4 API documentation lists a 1M-token context length for the current V4 API models.

What is the DeepSeek V4 maximum output limit?

The official DeepSeek Models & Pricing page lists a maximum output limit of 384K tokens for the current V4 API models.

Where should I verify DeepSeek V4 API pricing?

Verify current prices on the official DeepSeek Models & Pricing page. This guide avoids fixed prices because rates and promotions can change.

Is DeepSeek V4 open source?

DeepSeek describes V4 Preview as open-sourced and published official V4 model repositories through Hugging Face. The DeepSeek-V4-Pro model card states that the repository and model weights are licensed under the MIT License. Always check the exact model card before self-hosting or commercial deployment.

Does deepseek-chat still mean DeepSeek V3.2?

No. After the V4 Preview update, deepseek-chat is a legacy compatibility alias that currently routes to DeepSeek-V4-Flash non-thinking mode.

Does deepseek-reasoner still mean DeepSeek V3.2?

No. After the V4 Preview update, deepseek-reasoner is a legacy compatibility alias that currently routes to DeepSeek-V4-Flash thinking mode.

Is DeepSeek-V4-Pro-Max a separate API model?

For API documentation, the safe wording is that DeepSeek-V4-Pro-Max is the maximum reasoning effort mode of DeepSeek-V4-Pro unless DeepSeek documents it as a separate API model name. For new API calls, use deepseek-v4-pro or deepseek-v4-flash.

Is Chat-Deep.ai the official DeepSeek website?

No. Chat-Deep.ai is an independent DeepSeek guide and browser access site. It is not affiliated with DeepSeek, DeepSeek.com, chat.deepseek.com, the official DeepSeek app, or the official DeepSeek developer platform.