DeepSeek API Pricing: V4 Flash, V4 Pro, Cache Hit, Cache Miss & Output Costs
This page explains the current official DeepSeek API pricing for developers. It covers deepseek-v4-flash, deepseek-v4-pro, cache-hit input pricing, cache-miss input pricing, output token pricing, example calculations, and model-name migration notes.
Last verified by Chat-Deep.ai: April 26, 2026. The current primary DeepSeek API model IDs listed in the official API documentation are deepseek-v4-flash and deepseek-v4-pro. The older names deepseek-chat and deepseek-reasoner are compatibility aliases and are scheduled for deprecation on 2026/07/24.
Important pricing update: DeepSeek currently lists deepseek-v4-pro with a limited-time 75% discount until 2026/05/31 15:59 UTC. This page shows the active promotional price first and the regular listed price separately.
Independent note: Chat-Deep.ai is an independent website and is not affiliated with, endorsed by, or operated by DeepSeek. Official DeepSeek API prices can change, so always verify the latest public rates on the official DeepSeek pricing page before making production billing decisions.
Official DeepSeek API Pricing
The official DeepSeek API pricing table is listed per 1 million tokens. DeepSeek currently lists two V4 API models: deepseek-v4-flash for fast and economical usage, and deepseek-v4-pro for higher-value reasoning, coding, long-context, and agentic workflows.
| Model | Input, cache hit (per 1M tokens) | Input, cache miss (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|---|
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 | Everyday chat, low-latency apps, routine coding help, summarization, extraction, classification, and cost-sensitive workloads. |
| deepseek-v4-pro | Promo: $0.003625 (regular: $0.0145) | Promo: $0.435 (regular: $1.74) | Promo: $0.87 (regular: $3.48) | Hard reasoning, complex coding, agentic workflows, long-context analysis, and higher-value production tasks. |
Pricing is listed per 1M tokens and was last checked against the official DeepSeek Models & Pricing page on April 26, 2026. DeepSeek may adjust prices over time, so confirm current rates before production use.
Check the official DeepSeek Models & Pricing page for the latest public rates. DeepSeek states that product prices may vary and recommends checking the official page regularly.
Current DeepSeek API Model Details
The current official DeepSeek V4 API models are deepseek-v4-flash and deepseek-v4-pro. Both models use the OpenAI-compatible and Anthropic-compatible API formats, support thinking and non-thinking modes, and are listed with a 1M-token context length and a maximum output of 384K tokens.
| Feature | deepseek-v4-flash | deepseek-v4-pro |
|---|---|---|
| Model version | DeepSeek-V4-Flash | DeepSeek-V4-Pro |
| OpenAI-compatible base URL | https://api.deepseek.com | https://api.deepseek.com |
| Anthropic-compatible base URL | https://api.deepseek.com/anthropic | https://api.deepseek.com/anthropic |
| Context length | 1M tokens | 1M tokens |
| Maximum output | 384K tokens | 384K tokens |
| Thinking mode | Supported | Supported |
| Non-thinking mode | Supported | Supported |
| JSON Output | Supported | Supported |
| Tool Calls | Supported | Supported |
| Chat Prefix Completion | Supported, beta | Supported, beta |
| FIM Completion | Supported in non-thinking mode only, beta | Supported in non-thinking mode only, beta |
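Putting the table together, a minimal request to either model is an HTTP POST to the OpenAI-compatible chat completions path under the listed base URL. The sketch below only builds the JSON request body; the base URLs and model IDs come from this page, while the exact path and any thinking-mode parameter should be confirmed against the official DeepSeek API docs before use.

```python
import json

# Base URLs from the feature table above.
OPENAI_COMPATIBLE_BASE = "https://api.deepseek.com"
ANTHROPIC_COMPATIBLE_BASE = "https://api.deepseek.com/anthropic"

def build_chat_request(model: str, user_message: str) -> str:
    """Return a JSON body for POST {base_url}/chat/completions.

    Only the fields shown here are taken from this page; consult the
    official docs for thinking-mode and other optional parameters.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return json.dumps(payload)

body = build_chat_request("deepseek-v4-flash", "Summarize this article.")
```

The same body works against either base URL's OpenAI-compatible endpoint; the Anthropic-compatible endpoint uses Anthropic's request format instead.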
How DeepSeek API Billing Works
DeepSeek API billing is usage-based. The practical request-level formula is:
Total cost = (cache-hit input tokens / 1,000,000 × cache-hit rate) + (cache-miss input tokens / 1,000,000 × cache-miss rate) + (output tokens / 1,000,000 × output rate)
Because all listed rates are per 1 million tokens, divide each token count by 1,000,000 before multiplying by the matching rate.
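The formula above can be sketched as a small helper. The rates below are the per-1M-token prices listed in this page's table (promo rates for deepseek-v4-pro); verify current rates on the official DeepSeek pricing page before using them for budgeting.

```python
# Rates in USD per 1M tokens, as listed on this page.
FLASH_RATES = {"hit": 0.0028, "miss": 0.14, "output": 0.28}
PRO_PROMO_RATES = {"hit": 0.003625, "miss": 0.435, "output": 0.87}

def request_cost(cache_hit_tokens: int, cache_miss_tokens: int,
                 output_tokens: int, rates: dict) -> float:
    """Apply the request-level billing formula; rates are USD per 1M tokens."""
    per_million = 1_000_000
    return (cache_hit_tokens / per_million * rates["hit"]
            + cache_miss_tokens / per_million * rates["miss"]
            + output_tokens / per_million * rates["output"])
```

For example, `request_cost(0, 2_000, 1_000, FLASH_RATES)` returns roughly $0.00056, matching the small-chat example on this page.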
| Billing line | What it means |
|---|---|
| Input tokens, cache hit | Input tokens served from DeepSeek context caching at the lower cache-hit price. |
| Input tokens, cache miss | Input tokens that require fresh processing and are billed at the higher cache-miss input price. |
| Output tokens | Tokens generated by the model in the response and billed at the selected model’s output rate. |
DeepSeek states that expenses are deducted from your topped-up balance or granted balance, with granted balance used first when both are available. For production planning, review the rates on this page, then use the DeepSeek API guide for integration details, model selection, thinking-mode usage, and examples.
Cache Hit vs Cache Miss Pricing
DeepSeek context caching is enabled by default. A cache hit happens when part of a later request can reuse a persisted matching prefix from earlier requests. A cache miss happens when the input requires fresh processing.
Do not assume that every repeated request will receive cache-hit pricing. The official DeepSeek context caching documentation says cache hits depend on matching persisted prefixes and that the cache works on a best-effort basis. For accurate billing analysis, track prompt_cache_hit_tokens and prompt_cache_miss_tokens in the API response.
| Usage field | Meaning |
|---|---|
| prompt_cache_hit_tokens | The number of input tokens that received cache-hit pricing. |
| prompt_cache_miss_tokens | The number of input tokens that received cache-miss pricing. |
| completion_tokens | The number of generated output tokens billed at the output-token rate. |
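Using those usage fields, the billed cost of a response can be split line by line. This is a sketch under the field names listed above and the deepseek-v4-flash rates from this page (USD per 1M tokens); swap in other rates as needed.

```python
def cost_from_usage(usage: dict,
                    hit_rate: float = 0.0028,
                    miss_rate: float = 0.14,
                    output_rate: float = 0.28) -> dict:
    """Return a per-billing-line cost breakdown from an API usage object."""
    per_million = 1_000_000
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    out = usage.get("completion_tokens", 0)
    breakdown = {
        "cache_hit": hit / per_million * hit_rate,
        "cache_miss": miss / per_million * miss_rate,
        "output": out / per_million * output_rate,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown
```

Feeding in the real usage object from each response, rather than assuming every repeated prompt hits the cache, gives an accurate picture of the hit/miss split.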
deepseek-v4-flash vs deepseek-v4-pro
Choose deepseek-v4-flash when speed and cost efficiency matter most. It is the better starting point for everyday AI chat, lightweight coding assistance, extraction, classification, summarization, and high-volume API workloads.
Choose deepseek-v4-pro when the task needs stronger reasoning or higher-quality output. It is better suited for complex coding, multi-step analysis, agentic workflows, difficult technical explanations, and long-context work where quality matters more than the lowest possible token price.
While the current deepseek-v4-pro promotion reduces its API price significantly, deepseek-v4-flash remains the lower-cost model across cache-hit input, cache-miss input, and output tokens.
Legacy Model Names: deepseek-chat and deepseek-reasoner
The older model names deepseek-chat and deepseek-reasoner are now legacy compatibility aliases. For compatibility, deepseek-chat currently routes to the non-thinking mode of deepseek-v4-flash, while deepseek-reasoner currently routes to the thinking mode of deepseek-v4-flash.
For new API integrations, use deepseek-v4-flash or deepseek-v4-pro directly. DeepSeek states that deepseek-chat and deepseek-reasoner are scheduled for deprecation on 2026/07/24.
| Legacy name | Current compatibility behavior | Recommended replacement |
|---|---|---|
| deepseek-chat | Routes to deepseek-v4-flash non-thinking mode. | deepseek-v4-flash with thinking disabled. |
| deepseek-reasoner | Routes to deepseek-v4-flash thinking mode. | deepseek-v4-flash with thinking enabled for compatibility, or deepseek-v4-pro with thinking enabled for harder reasoning tasks. |
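The migration table above can be expressed as a small lookup, useful when an existing codebase still passes the legacy names. The model IDs and routing behavior are taken from this page; confirm them against the official deprecation notes before relying on this mapping.

```python
# Legacy alias -> (replacement model ID, thinking mode on?)
LEGACY_ALIASES = {
    "deepseek-chat": ("deepseek-v4-flash", False),      # non-thinking mode
    "deepseek-reasoner": ("deepseek-v4-flash", True),   # thinking mode
}

def resolve_model(name: str) -> tuple:
    """Return (model_id, thinking) for a requested model name.

    thinking is None for non-legacy names, meaning the caller keeps
    whatever mode it already configures.
    """
    if name in LEGACY_ALIASES:
        return LEGACY_ALIASES[name]
    return (name, None)
```

Harder reasoning workloads may instead map deepseek-reasoner to deepseek-v4-pro with thinking enabled, as the table suggests.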
DeepSeek API Pricing Examples
These examples are simplified estimates that assume no taxes, no extra infrastructure costs, no retries, and the current official API rates listed above. The deepseek-v4-pro examples use the active promotional price verified on April 26, 2026.
| Example request | Model | Token usage | Estimated API cost |
|---|---|---|---|
| Small everyday chat response | deepseek-v4-flash | 2,000 cache-miss input tokens + 1,000 output tokens | About $0.00056 |
| Longer low-cost summarization | deepseek-v4-flash | 50,000 cache-miss input tokens + 5,000 output tokens | About $0.0084 |
| Complex reasoning or coding task during the current promotion | deepseek-v4-pro | 20,000 cache-miss input tokens + 5,000 output tokens | About $0.01305 at the current promotional rate; about $0.0522 at the regular listed rate |
| Large cached prompt reuse | deepseek-v4-flash | 100,000 cache-hit input tokens + 5,000 output tokens | About $0.00168 |
| Large cached high-quality analysis during the current promotion | deepseek-v4-pro | 100,000 cache-hit input tokens + 5,000 output tokens | About $0.0047125 at the current promotional rate; about $0.01885 at the regular listed rate |
Actual cost depends on the model selected, the number of cache-hit input tokens, the number of cache-miss input tokens, and the number of output tokens generated. For precise billing analysis, use the token usage fields returned by the API instead of estimating cache hits manually.
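Each estimate in the table above follows directly from the per-1M-token rates listed on this page (promo rates for deepseek-v4-pro), and can be reproduced with a few lines of arithmetic:

```python
# (cache-hit rate, cache-miss rate, output rate) in USD per 1M tokens,
# taken from this page's pricing table.
RATES = {
    "deepseek-v4-flash": (0.0028, 0.14, 0.28),
    "deepseek-v4-pro-promo": (0.003625, 0.435, 0.87),
}

def estimate(model: str, hit: int, miss: int, out: int) -> float:
    """Estimated USD cost for one request from its token counts."""
    hit_rate, miss_rate, out_rate = RATES[model]
    return (hit * hit_rate + miss * miss_rate + out * out_rate) / 1_000_000

# (model, cache-hit tokens, cache-miss tokens, output tokens, table estimate)
examples = [
    ("deepseek-v4-flash", 0, 2_000, 1_000, 0.00056),
    ("deepseek-v4-flash", 0, 50_000, 5_000, 0.0084),
    ("deepseek-v4-pro-promo", 0, 20_000, 5_000, 0.01305),
    ("deepseek-v4-flash", 100_000, 0, 5_000, 0.00168),
    ("deepseek-v4-pro-promo", 100_000, 0, 5_000, 0.0047125),
]
for model, hit, miss, out, expected in examples:
    assert abs(estimate(model, hit, miss, out) - expected) < 1e-9
```

If DeepSeek changes its listed rates, only the RATES table needs updating.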
Estimate Your DeepSeek API Costs
Need a budget instead of a rate table? Use the DeepSeek API cost calculator to estimate per-request, daily, monthly, and yearly spend from your expected token volumes.
For the most accurate estimate, separate your input volume into cache-hit and cache-miss tokens, choose either deepseek-v4-flash or deepseek-v4-pro, then add your expected output token volume. Do not estimate new integrations using old V3.2 pricing or by treating deepseek-chat and deepseek-reasoner as separate current pricing models.
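As a rough sketch of that workflow, a monthly budget follows from average per-request token counts and daily request volume. The default rates here are this page's deepseek-v4-flash prices in USD per 1M tokens; substitute deepseek-v4-pro rates, or the latest official rates, as appropriate.

```python
def monthly_budget(requests_per_day: int,
                   hit_tokens: int, miss_tokens: int, out_tokens: int,
                   hit_rate: float = 0.0028, miss_rate: float = 0.14,
                   output_rate: float = 0.28, days: int = 30) -> float:
    """Turn average per-request token counts into an estimated monthly
    USD spend. Rates are per 1M tokens."""
    per_request = (hit_tokens * hit_rate
                   + miss_tokens * miss_rate
                   + out_tokens * output_rate) / 1_000_000
    return per_request * requests_per_day * days
```

For example, 10,000 daily requests averaging 2,000 cache-miss input tokens and 1,000 output tokens on deepseek-v4-flash come to roughly $168 per month at the listed rates.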
API Pricing vs Chat-Deep.ai Browser Chat
This page covers official DeepSeek API billing for developers. It does not describe a Chat-Deep.ai subscription plan.
Chat-Deep.ai offers a browser-based DeepSeek AI chat experience for normal use without requiring a Chat-Deep.ai account. That browser chat experience is different from official DeepSeek API usage, where developers pay based on model, input tokens, cache hits, cache misses, and output tokens.
| Topic | Use Chat-Deep.ai | Use official DeepSeek API |
|---|---|---|
| Quick browser chat | Use DeepSeek Chat online for a simple no-sign-up browser workflow. | Not required unless you are building an app or integration. |
| API keys and billing | Chat-Deep.ai does not sell official DeepSeek API keys, credits, or billing plans. | Use the official DeepSeek Platform for API keys, balance, and billing. |
| Production applications | Use our API guide to understand the workflow and model choices. | Use official DeepSeek API docs and official pricing for production decisions. |
DeepSeek API Pricing FAQ
Why does deepseek-v4-pro have two prices?
DeepSeek shows both discounted and regular prices for deepseek-v4-pro. The discounted prices are currently active under a limited-time 75% promotion, while the regular listed rates are shown alongside for comparison. Pricing also varies by token type: cache-hit input, cache-miss input, and output tokens. According to DeepSeek, the 75% discount runs until 2026/05/31 15:59 UTC, and prices may change, so always verify costs on the official DeepSeek pricing page before estimating usage.
What is the cheapest current DeepSeek API model?
deepseek-v4-flash is the lower-cost current DeepSeek V4 API model. It is usually the best starting point for cost-sensitive applications and high-volume everyday workloads.
What is the difference between cache-hit and cache-miss pricing?
Cache-hit pricing applies when input tokens are served from DeepSeek context caching. Cache-miss pricing applies when input tokens require fresh processing. Output tokens are billed separately at the selected model’s output rate.
Are repeated prompts always billed at cache-hit pricing?
No. DeepSeek context caching works on a best-effort basis and depends on persisted matching prefixes. Track prompt_cache_hit_tokens and prompt_cache_miss_tokens in the API response to understand the real split for each request.
What do deepseek-chat and deepseek-reasoner refer to now?
They are legacy compatibility aliases. deepseek-chat currently routes to deepseek-v4-flash non-thinking mode, and deepseek-reasoner currently routes to deepseek-v4-flash thinking mode. New integrations should use deepseek-v4-flash or deepseek-v4-pro directly.
When will deepseek-chat and deepseek-reasoner be deprecated?
DeepSeek states that deepseek-chat and deepseek-reasoner are scheduled for deprecation on 2026/07/24. Developers should migrate to deepseek-v4-flash or deepseek-v4-pro before that date.
Does this page cover DeepSeek web or app pricing?
No. This page focuses on official DeepSeek API billing for developers. The official DeepSeek web and app experience is a separate product experience, so do not use API token prices to infer any web or app subscription details.
Does this page describe Chat-Deep.ai pricing?
No. This page explains official DeepSeek API token pricing. Chat-Deep.ai offers browser chat access for normal use without requiring a Chat-Deep.ai account, while official DeepSeek API billing is handled through the official DeepSeek Platform.
Where can I estimate DeepSeek API costs?
Use the DeepSeek API cost calculator to turn expected token volumes into a practical estimate. For the most accurate result, choose deepseek-v4-flash or deepseek-v4-pro, separate cache-hit and cache-miss input tokens, and verify the latest official rates before relying on an estimate for production budgeting.
Where should I verify the latest official API pricing?
Use the official DeepSeek Models & Pricing page. That page is the source of truth for current public API rates.
Recommended Next Pages
- DeepSeek API Cost Calculator — estimate per-request, daily, monthly, and yearly costs.
- DeepSeek API Guide — learn setup, API calls, model selection, and examples.
- DeepSeek Context Caching — understand cache hits, cache misses, and prompt reuse.
- DeepSeek Models — compare V4 Flash, V4 Pro, R1, V3.2, Coder, OCR, and related model families.
- DeepSeek Chat Online — use Chat-Deep.ai’s no-sign-up browser chat experience.
