DeepSeek API Pricing: V4 Flash, V4 Pro, Cache Hit, Cache Miss & Output Costs

This page explains the current official DeepSeek API pricing for developers. It covers deepseek-v4-flash, deepseek-v4-pro, cache-hit input pricing, cache-miss input pricing, output token pricing, example calculations, and model-name migration notes.

Last verified by Chat-Deep.ai: April 26, 2026. The current primary DeepSeek API model IDs listed in the official API documentation are deepseek-v4-flash and deepseek-v4-pro. The older names deepseek-chat and deepseek-reasoner are compatibility aliases and are scheduled for deprecation on 2026/07/24.

Important pricing update: DeepSeek currently lists deepseek-v4-pro with a limited-time 75% discount until 2026/05/31 15:59 UTC. This page shows the active promotional price first and the regular listed price separately.

Independent note: Chat-Deep.ai is an independent website and is not affiliated with, endorsed by, or operated by DeepSeek. Official DeepSeek API prices can change, so always verify the latest public rates on the official DeepSeek pricing page before making production billing decisions.

Official DeepSeek API Pricing

The official DeepSeek API pricing table is listed per 1 million tokens. DeepSeek currently lists two V4 API models: deepseek-v4-flash for fast and economical usage, and deepseek-v4-pro for higher-value reasoning, coding, long-context, and agentic workflows.

ModelInput cache hitInput cache missOutputBest for
deepseek-v4-flash$0.0028 / 1M tokens$0.14 / 1M tokens$0.28 / 1M tokensEveryday chat, low-latency apps, routine coding help, summarization, extraction, classification, and cost-sensitive workloads.
deepseek-v4-proCurrent promo: $0.003625 / 1M tokens
Regular listed rate: $0.0145 / 1M tokens
Current promo: $0.435 / 1M tokens
Regular listed rate: $1.74 / 1M tokens
Current promo: $0.87 / 1M tokens
Regular listed rate: $3.48 / 1M tokens
Hard reasoning, complex coding, agentic workflows, long-context analysis, and higher-value production tasks.

Pricing is listed per 1M tokens and was checked against the official DeepSeek Models & Pricing page on . DeepSeek may adjust prices over time, so confirm current rates before production use.

Check the official DeepSeek Models & Pricing page for the latest public rates. DeepSeek states that product prices may vary and recommends checking the official page regularly.

Current DeepSeek API Model Details

The current official DeepSeek V4 API models are deepseek-v4-flash and deepseek-v4-pro. Both models use the OpenAI-compatible and Anthropic-compatible API formats, support thinking and non-thinking modes, and are listed with a 1M-token context length and a maximum output of 384K tokens.

Featuredeepseek-v4-flashdeepseek-v4-pro
Model versionDeepSeek-V4-FlashDeepSeek-V4-Pro
OpenAI-compatible base URLhttps://api.deepseek.comhttps://api.deepseek.com
Anthropic-compatible base URLhttps://api.deepseek.com/anthropichttps://api.deepseek.com/anthropic
Context length1M tokens1M tokens
Maximum output384K tokens384K tokens
Thinking modeSupportedSupported
Non-thinking modeSupportedSupported
JSON OutputSupportedSupported
Tool CallsSupportedSupported
Chat Prefix CompletionSupported, betaSupported, beta
FIM CompletionSupported in non-thinking mode only, betaSupported in non-thinking mode only, beta

How DeepSeek API Billing Works

DeepSeek API billing is usage-based. The practical request-level formula is:

Total cost =
(cache-hit input tokens / 1,000,000 × cache-hit rate) +
(cache-miss input tokens / 1,000,000 × cache-miss rate) +
(output tokens / 1,000,000 × output rate)

Because all listed rates are per 1 million tokens, divide each token count by 1,000,000 before multiplying by the matching rate.

Billing lineWhat it means
Input tokens, cache hitInput tokens served from DeepSeek context caching at the lower cache-hit price.
Input tokens, cache missInput tokens that require fresh processing and are billed at the higher cache-miss input price.
Output tokensTokens generated by the model in the response and billed at the selected model’s output rate.

DeepSeek states that expenses are deducted from your topped-up balance or granted balance, with granted balance used first when both are available. For production planning, review the rates on this page, then use the DeepSeek API guide for integration details, model selection, thinking-mode usage, and examples.

Cache Hit vs Cache Miss Pricing

DeepSeek context caching is enabled by default. A cache hit happens when part of a later request can reuse a persisted matching prefix from earlier requests. A cache miss happens when the input requires fresh processing.

Do not assume that every repeated request will receive cache-hit pricing. The official DeepSeek context caching documentation says cache hits depend on matching persisted prefixes and that the cache works on a best-effort basis. For accurate billing analysis, track prompt_cache_hit_tokens and prompt_cache_miss_tokens in the API response.

Usage fieldMeaning
prompt_cache_hit_tokensThe number of input tokens that received cache-hit pricing.
prompt_cache_miss_tokensThe number of input tokens that received cache-miss pricing.
completion_tokensThe number of generated output tokens billed at the output-token rate.

deepseek-v4-flash vs deepseek-v4-pro

Choose deepseek-v4-flash when speed and cost efficiency matter most. It is the better starting point for everyday AI chat, lightweight coding assistance, extraction, classification, summarization, and high-volume API workloads.

Choose deepseek-v4-pro when the task needs stronger reasoning or higher-quality output. It is better suited for complex coding, multi-step analysis, agentic workflows, difficult technical explanations, and long-context work where quality matters more than the lowest possible token price.

While the current deepseek-v4-pro promotion reduces its API price significantly, deepseek-v4-flash remains the lower-cost model across cache-hit input, cache-miss input, and output tokens.

Legacy Model Names: deepseek-chat and deepseek-reasoner

The older model names deepseek-chat and deepseek-reasoner are now legacy compatibility aliases. For compatibility, deepseek-chat currently routes to the non-thinking mode of deepseek-v4-flash, while deepseek-reasoner currently routes to the thinking mode of deepseek-v4-flash.

For new API integrations, use deepseek-v4-flash or deepseek-v4-pro directly. DeepSeek states that deepseek-chat and deepseek-reasoner are scheduled for deprecation on 2026/07/24.

Legacy nameCurrent compatibility behaviorRecommended replacement
deepseek-chatRoutes to deepseek-v4-flash non-thinking mode.deepseek-v4-flash with thinking disabled.
deepseek-reasonerRoutes to deepseek-v4-flash thinking mode.deepseek-v4-flash with thinking enabled for compatibility, or deepseek-v4-pro with thinking enabled for harder reasoning tasks.

DeepSeek API Pricing Examples

These examples are simplified estimates that assume no taxes, no extra infrastructure costs, no retries, and the current official API rates listed above. The deepseek-v4-pro examples use the active promotional price verified on April 26, 2026.

Example requestModelToken usageEstimated API cost
Small everyday chat responsedeepseek-v4-flash2,000 cache-miss input tokens + 1,000 output tokensAbout $0.00056
Longer low-cost summarizationdeepseek-v4-flash50,000 cache-miss input tokens + 5,000 output tokensAbout $0.0084
Complex reasoning or coding task during the current promotiondeepseek-v4-pro20,000 cache-miss input tokens + 5,000 output tokensAbout $0.01305 at the current promotional rate
About $0.0522 at the regular listed rate
Large cached prompt reusedeepseek-v4-flash100,000 cache-hit input tokens + 5,000 output tokensAbout $0.00168
Large cached high-quality analysis during the current promotiondeepseek-v4-pro100,000 cache-hit input tokens + 5,000 output tokensAbout $0.0047125 at the current promotional rate
About $0.01885 at the regular listed rate

Actual cost depends on the model selected, the number of cache-hit input tokens, the number of cache-miss input tokens, and the number of output tokens generated. For precise billing analysis, use the token usage fields returned by the API instead of estimating cache hits manually.

Estimate Your DeepSeek API Costs

Need a budget instead of a rate table? Use the DeepSeek API cost calculator to estimate per-request, daily, monthly, and yearly spend from your expected token volumes.

For the most accurate estimate, separate your input volume into cache-hit and cache-miss tokens, choose either deepseek-v4-flash or deepseek-v4-pro, then add your expected output token volume. Do not estimate new integrations using old V3.2 pricing or by treating deepseek-chat and deepseek-reasoner as separate current pricing models.

API Pricing vs Chat-Deep.ai Browser Chat

This page covers official DeepSeek API billing for developers. It does not describe a Chat-Deep.ai subscription plan.

Chat-Deep.ai offers a browser-based DeepSeek AI chat experience for normal use without requiring a Chat-Deep.ai account. That browser chat experience is different from official DeepSeek API usage, where developers pay based on model, input tokens, cache hits, cache misses, and output tokens.

TopicUse Chat-Deep.aiUse official DeepSeek API
Quick browser chatUse DeepSeek Chat online for a simple no-sign-up browser workflow.Not required unless you are building an app or integration.
API keys and billingChat-Deep.ai does not sell official DeepSeek API keys, credits, or billing plans.Use the official DeepSeek Platform for API keys, balance, and billing.
Production applicationsUse our API guide to understand the workflow and model choices.Use official DeepSeek API docs and official pricing for production decisions.

DeepSeek API Pricing FAQ

Why does deepseek-v4-pro have two prices?

DeepSeek shows both discounted and regular prices for deepseek-v4-pro. The discounted prices are currently active under a limited-time 75% promotion, while the crossed-out prices show the regular listed rates. Pricing also varies by token type: cache-hit input, cache-miss input, and output tokens. According to DeepSeek, the 75% discount is extended until 2026/05/31 15:59 UTC, and prices may change, so always verify costs on the official DeepSeek pricing page before estimating usage.

What is the cheapest current DeepSeek API model?

deepseek-v4-flash is the lower-cost current DeepSeek V4 API model. It is usually the best starting point for cost-sensitive applications and high-volume everyday workloads.

What is the difference between cache-hit and cache-miss pricing?

Cache-hit pricing applies when input tokens are served from DeepSeek context caching. Cache-miss pricing applies when input tokens require fresh processing. Output tokens are billed separately at the selected model’s output rate.

Are repeated prompts always billed at cache-hit pricing?

No. DeepSeek context caching works on a best-effort basis and depends on persisted matching prefixes. Track prompt_cache_hit_tokens and prompt_cache_miss_tokens in the API response to understand the real split for each request.

What do deepseek-chat and deepseek-reasoner refer to now?

They are legacy compatibility aliases. deepseek-chat currently routes to deepseek-v4-flash non-thinking mode, and deepseek-reasoner currently routes to deepseek-v4-flash thinking mode. New integrations should use deepseek-v4-flash or deepseek-v4-pro directly.

When will deepseek-chat and deepseek-reasoner be deprecated?

DeepSeek states that deepseek-chat and deepseek-reasoner are scheduled for deprecation on 2026/07/24. Developers should migrate to deepseek-v4-flash or deepseek-v4-pro before that date.

Does this page cover DeepSeek web or app pricing?

No. This page focuses on official DeepSeek API billing for developers. The official DeepSeek web and app experience is a separate product experience, so do not use API token prices to infer any web or app subscription details.

Does this page describe Chat-Deep.ai pricing?

No. This page explains official DeepSeek API token pricing. Chat-Deep.ai offers browser chat access for normal use without requiring a Chat-Deep.ai account, while official DeepSeek API billing is handled through the official DeepSeek Platform.

Where can I estimate DeepSeek API costs?

Use the DeepSeek API cost calculator to turn expected token volumes into a practical estimate. For the most accurate result, choose deepseek-v4-flash or deepseek-v4-pro, separate cache-hit and cache-miss input tokens, and verify the latest official rates before relying on an estimate for production budgeting.

Where should I verify the latest official API pricing?

Use the official DeepSeek Models & Pricing page. That page is the source of truth for current public API rates.