Last verified: April 27, 2026.
Quick answer:
You can run DeepSeek locally on Windows, macOS, or Linux with Ollama. For most users, the simplest setup is: install Ollama, pull deepseek-r1:8b, and run it from the terminal. If you want a browser-based chat interface, add Open WebUI on top. If you prefer a desktop app and GGUF model browsing, use our DeepSeek in LM Studio guide.
Important naming note: this article is about running local open-weight DeepSeek models through Ollama. It is not the same as calling DeepSeek’s hosted API. The current official DeepSeek API model IDs are deepseek-v4-flash and deepseek-v4-pro. The older API names deepseek-chat and deepseek-reasoner are legacy aliases and are scheduled to be fully retired and inaccessible after July 24, 2026, 15:59 UTC. For local Ollama usage, you normally use tags such as deepseek-r1:8b, not the hosted API model IDs.
This guide is part of the broader Chat-Deep.ai DeepSeek AI guide hub. Chat-Deep.ai is an independent resource and is not the official DeepSeek website. The goal of this page is to help you install and run DeepSeek-style local models safely while keeping the model naming consistent with the rest of our DeepSeek documentation.
What this guide covers
This walkthrough is for users who want local DeepSeek inference for privacy, offline use after download, local development, testing, or self-hosted experimentation. It covers:
- How to understand the difference between DeepSeek API models and local Ollama tags
- How to choose a realistic DeepSeek-R1 model size for your hardware
- How to install Ollama on Windows, macOS, and Linux
- How to download and run DeepSeek locally with Ollama
- How to add Open WebUI for a browser chat interface
- How to call your local DeepSeek model through the Ollama API
- How to troubleshoot common installation, performance, and privacy issues
Before you start: API names and local model names are different
A common mistake is mixing hosted DeepSeek API model IDs with local model tags. They are related to the broader DeepSeek ecosystem, but they are not interchangeable.
| Where you are using DeepSeek | Use this kind of name | Example |
|---|---|---|
| Official hosted DeepSeek API | Official API model ID | deepseek-v4-flash or deepseek-v4-pro |
| Ollama local runtime | Ollama model tag | deepseek-r1:8b, deepseek-r1:14b, or another Ollama tag |
| LM Studio local runtime | LM Studio local model identifier | The exact model ID shown in LM Studio after download/import |
| Hugging Face model page | Repository/checkpoint name | deepseek-ai/DeepSeek-R1-Distill-Llama-8B or another official checkpoint |
If you put deepseek-v4-pro into an Ollama command, Ollama will not treat it as a local DeepSeek-R1 tag unless that exact tag exists in your Ollama environment. If you put deepseek-r1:8b into the official DeepSeek API, the API will not treat it as a valid hosted V4 model ID. Keep these paths separate.
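As a quick illustration of the two separate paths, the sketch below contrasts a hosted API call with a local Ollama call. The hosted request follows DeepSeek's OpenAI-compatible request style and should be checked against the current official API documentation; the DEEPSEEK_API_KEY environment variable is an assumption used only for illustration.
# Hosted DeepSeek API: official API model ID plus an API key (illustrative request shape)
curl https://api.deepseek.com/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-v4-flash", "messages": [{"role": "user", "content": "Hello"}]}'
# Local Ollama runtime: local model tag, no API key needed
ollama run deepseek-r1:8b "Hello"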
For a broader comparison of hosted, local, reasoning, coding, and legacy DeepSeek models, use the DeepSeek Models Hub, the DeepSeek V4 guide, and the DeepSeek local vs API guide.
When to use Ollama, Open WebUI, LM Studio, or the official API
| Option | Best for | Why choose it |
|---|---|---|
| Ollama | Fast local setup, terminal use, scripts, and a simple local API | It is the most direct way to pull and run a local model with a few commands. |
| Open WebUI | Browser chat on top of Ollama | It gives you a ChatGPT-style local interface while connecting to Ollama or OpenAI-compatible providers. |
| LM Studio | Desktop GUI, model browsing, GGUF workflows, and local server mode | It is a good fit if you prefer an all-in-one app instead of terminal commands. |
| Official DeepSeek API | Hosted V4 models, high-quality API workflows, and large-context production usage | Use deepseek-v4-flash or deepseek-v4-pro when you want the hosted DeepSeek service rather than local inference. |
This guide uses Ollama as the base runtime because it is simple, scriptable, and easy to pair with Open WebUI. If you want the desktop-first path instead, read DeepSeek in LM Studio.
Prerequisites
- A supported computer:
  - Windows 10 22H2 or newer
  - macOS 14 Sonoma or newer
  - A modern Linux system for the standard installer
- Enough free disk space for the model you plan to download. DeepSeek local models can range from about 1 GB to hundreds of GB depending on the tag.
- Enough RAM or GPU VRAM for the model you plan to run. Bigger tags require more memory and can run slowly on CPU-only systems.
- An internet connection for the initial Ollama and model downloads.
- Optional: Docker, if you want the easiest Open WebUI setup.
If you are unsure what your machine can handle, start smaller. The safest first test is usually deepseek-r1:8b on a modern machine, or deepseek-r1:7b / deepseek-r1:1.5b on limited hardware. You can scale up after confirming the install works.
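If you want a quick sense of how much memory a machine has before picking a tag, the checks below are one simple way to look; the exact tag you can run still depends on quantization, context length, and what else is using memory.
# Total system RAM on Linux
free -h
# Total system RAM on macOS (reported in bytes)
sysctl hw.memsize
# Total system RAM on Windows (PowerShell)
Get-CimInstance Win32_ComputerSystem | Select-Object TotalPhysicalMemory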
Choose the right DeepSeek model size
Ollama lists several DeepSeek-R1 tags, including 1.5B, 7B, 8B, 14B, 32B, 70B, and 671B variants. DeepSeek’s official R1 repository also describes the full 671B / 37B-active DeepSeek-R1 model and the smaller distilled models based on Qwen and Llama families. For normal desktops and laptops, the distilled tags are the practical starting point.
The table below uses Ollama’s listed download sizes as practical orientation points. These are not hard guarantees: real performance depends on quantization, context length, runtime settings, CPU/GPU offload, drivers, temperature, output length, and how many other applications are running.
| Ollama tag | Listed download size | Practical starting hardware | Best use |
|---|---|---|---|
| deepseek-r1:1.5b | About 1.1 GB | CPU-only testing, older laptops, or 8 GB+ system RAM | Install tests, lightweight prompts, very limited hardware |
| deepseek-r1:7b | About 4.7 GB | 16 GB+ system RAM or entry-level GPU setups | Entry local chat and coding experiments |
| deepseek-r1:8b | About 5.2 GB | 16 GB+ system RAM, Apple Silicon unified memory, or 8 GB VRAM class GPUs | Best default starting point for most users |
| deepseek-r1:14b | About 9.0 GB | 24–32 GB+ system RAM or stronger GPU setups | Better quality if your machine has memory headroom |
| deepseek-r1:32b | About 20 GB | 48–64 GB+ system RAM or high-VRAM workstation setups | Power users who want stronger local quality |
| deepseek-r1:70b | About 43 GB | High-memory workstation or multi-GPU/offload setup | Advanced local experimentation |
| deepseek-r1:671b | About 404 GB | Server-class or distributed infrastructure | Research/infrastructure teams, not typical local installs |
Simple model recommendation
- Use deepseek-r1:8b for the first install on a modern laptop or desktop.
- Use deepseek-r1:7b or deepseek-r1:1.5b if the machine is older or memory-constrained.
- Use deepseek-r1:14b or deepseek-r1:32b only after confirming that smaller tags run smoothly.
- Treat deepseek-r1:70b and deepseek-r1:671b as specialist options, not beginner installs.
Important note about long context
Some DeepSeek and Ollama tags advertise very large context windows. That does not mean you should max out context length on a laptop. Large context consumes memory and slows inference. For normal local chat, coding, and testing, a smaller practical context is usually faster and more stable. Raise context only when you actually need to process long documents.
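If you do need a larger context, one way to control it with Ollama is the num_ctx parameter, either inside an interactive session or per API request. The values below are illustrative, not recommendations, and should be sized to your available memory.
# Inside an interactive session (type the /set line after the session starts)
ollama run deepseek-r1:8b
/set parameter num_ctx 8192
# Or set it per request through the local API
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:8b",
    "messages": [{"role": "user", "content": "Summarize this long document."}],
    "options": { "num_ctx": 8192 },
    "stream": false
  }'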
Step 1: Install Ollama
Windows
Install Ollama with the official Windows installer from Ollama. The Windows build is designed to install in your user account and does not normally require Administrator rights. After installing, open PowerShell or Windows Terminal and verify:
ollama --version
If you use an NVIDIA GPU, make sure your driver is current enough for Ollama’s Windows requirements. If you use an AMD Radeon GPU, install the current AMD driver package. If the command is not found, restart the terminal or relaunch Ollama from the Start menu.
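If you are unsure whether your NVIDIA driver is recent enough, a quick check from PowerShell is shown below; compare the reported driver version against the minimum listed in Ollama's Windows documentation.
# Show the installed NVIDIA driver version and GPU details (requires the NVIDIA driver to be installed)
nvidia-smi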
macOS
Download the official macOS app from Ollama. Ollama’s macOS documentation lists macOS 14 Sonoma or newer as the minimum supported version. After installing the app, verify that the CLI is available:
ollama --version
On Apple Silicon Macs, Ollama can use Apple’s local acceleration path. Unified memory helps, but it is still important to choose a model size that leaves enough memory for macOS and other apps.
Linux
Install Ollama with the official Linux installer script:
curl -fsSL https://ollama.com/install.sh | sh
Then verify the installation and check the service:
ollama --version
sudo systemctl status ollama
If the service is installed but not running, start it:
sudo systemctl start ollama
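If you want Ollama to start automatically after a reboot, the standard systemd pattern applies, assuming the installer registered the ollama service unit as described above:
# Enable the service at boot and start it now
sudo systemctl enable --now ollama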
Step 2: Download a DeepSeek model
For most first-time users, start with the 8B tag:
ollama pull deepseek-r1:8b
Other useful starting points:
# Very small test setup
ollama pull deepseek-r1:1.5b
# Entry-level local setup
ollama pull deepseek-r1:7b
# Recommended first install for many users
ollama pull deepseek-r1:8b
# Better quality if you have more memory
ollama pull deepseek-r1:14b
# Strong workstation option
ollama pull deepseek-r1:32b
Do not start with the largest tag just because it appears in the library. A smaller model that runs smoothly is usually more useful than a huge model that barely loads or responds too slowly to use.
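Before pulling a larger tag, it can also help to confirm you have enough free disk space for the download plus some headroom. The model directory shown below is a common default for macOS and user-level installs; Linux service installs may store models under a different path.
# Free disk space in your home directory (Linux/macOS)
df -h ~
# Size of already-downloaded models at a common default location
du -sh ~/.ollama/models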
Step 3: Run DeepSeek locally
Once the model finishes downloading, launch it:
ollama run deepseek-r1:8b
You can also send a one-shot prompt directly from the command line:
ollama run deepseek-r1:8b "Write a Python function that checks whether a number is prime."
Useful Ollama commands:
| Command | What it does |
|---|---|
| ollama pull deepseek-r1:8b | Downloads the model tag |
| ollama run deepseek-r1:8b | Starts an interactive local chat |
| ollama ls | Lists downloaded models |
| ollama ps | Shows currently loaded models |
| ollama stop deepseek-r1:8b | Stops a running model |
| ollama rm deepseek-r1:8b | Deletes the downloaded model |
| ollama serve | Starts the Ollama server manually if needed |
Step 4: Manage DeepSeek reasoning output
DeepSeek-R1 is a reasoning-capable model family. Depending on the Ollama version, tag, and settings, you may see a separated reasoning trace, a visible thinking block, or only the final answer. Do not build production code that depends only on manually stripping <think>...</think> text. Modern Ollama thinking support can separate reasoning from the final response through fields such as message.thinking and message.content.
In the CLI, thinking behavior can be toggled for supported models:
# Enable thinking for a prompt
ollama run deepseek-r1:8b --think "Solve 19 * 27 step by step"
# Disable thinking for a prompt, if supported by the model/runtime
ollama run deepseek-r1:8b --think=false "Solve 19 * 27 step by step"
# Keep the benefits of a thinking-capable model while hiding visible thinking in CLI output
ollama run deepseek-r1:8b --hidethinking "Solve 19 * 27 step by step"
Inside an interactive Ollama session, you can also use:
/set think
/set nothink
For apps, prefer API-level handling. Use the think field where supported, display the final answer to users, and store or hide the reasoning trace according to your product and privacy requirements.
Step 5: Add a browser UI with Open WebUI
Ollama alone is enough if you are comfortable with the terminal. If you want a browser chat interface, install Open WebUI. Open WebUI’s documentation describes Docker as the recommended path for most users, while Python installs are also available for manual setups.
Option A: Docker, recommended for most users
If Ollama is running on your host machine and Open WebUI runs in Docker, use the host gateway mapping so the container can reach the Ollama server:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Then open:
http://localhost:3000
The volume open-webui:/app/backend/data preserves your Open WebUI data between container restarts. Do not remove that volume unless you intentionally want to delete local Open WebUI data.
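If you want a copy of that data before upgrading or recreating the container, one common Docker pattern is to archive the named volume from a throwaway container. The backup file name here is only an example.
# Back up the open-webui volume to a tar.gz in the current directory (example file name)
docker run --rm \
  -v open-webui:/data \
  -v "$PWD":/backup \
  alpine tar czf /backup/open-webui-backup.tar.gz -C /data .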
Option B: Python install
If Docker is not your preference, Open WebUI also documents Python-based installation:
pip install open-webui
open-webui serve
Then open:
http://localhost:8080
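If you go the pip route, installing Open WebUI into a dedicated virtual environment is a reasonable way to avoid dependency conflicts; check Open WebUI's documentation for the currently supported Python version. A minimal sketch:
# Create and activate a virtual environment, then install and start Open WebUI
python3 -m venv openwebui-env
source openwebui-env/bin/activate
pip install open-webui
open-webui serve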
If you mainly want a graphical local workflow and model browser rather than an Ollama-centered setup, LM Studio is a valid alternative. See our DeepSeek in LM Studio guide.
Step 6: Use DeepSeek through the local Ollama API
When Ollama is running, the local API is available at:
http://localhost:11434
Simple chat request
curl http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1:8b",
"messages": [
{
"role": "user",
"content": "Explain the difference between TCP and UDP in plain English."
}
],
"think": true,
"stream": false
}'
For supported thinking models, the response can separate thinking output from the final content. That is cleaner than relying only on visible <think> tags.
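If you call the endpoint from a script, one way to pull out just the final answer (and, where present, the separated thinking trace) from the non-streaming JSON response is jq. The field names below match the response shape described above, but verify them against your Ollama version.
# Save the response, then extract the answer and the optional thinking trace
response=$(curl -s http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1:8b", "messages": [{"role": "user", "content": "Explain TCP vs UDP briefly."}], "think": true, "stream": false}')
# Final answer
echo "$response" | jq -r '.message.content'
# Thinking trace, printed only if the response includes one
echo "$response" | jq -r '.message.thinking // empty'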
Python example with the Ollama SDK
from ollama import chat
response = chat(
model="deepseek-r1:8b",
messages=[
{
"role": "user",
"content": "Explain the difference between TCP and UDP in plain English."
}
],
think=True,
stream=False,
)
# Final answer
print(response.message.content)
# Optional reasoning trace, if returned by your model/runtime
if getattr(response.message, "thinking", None):
print("\n--- thinking trace ---")
print(response.message.thinking)
OpenAI-compatible local endpoint
Ollama also supports OpenAI-compatible endpoints under /v1. This can be useful when an existing tool already expects an OpenAI-style API shape.
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama",
)
completion = client.chat.completions.create(
model="deepseek-r1:8b",
messages=[
{
"role": "user",
"content": "Give me a concise local DeepSeek setup checklist."
}
],
)
print(completion.choices[0].message.content)
For native DeepSeek hosted API calls, use our DeepSeek API guide. For local Ollama calls, use the Ollama model tag you pulled locally.
Privacy and security notes
Running DeepSeek locally can improve privacy because prompts and outputs are processed on your machine after the model has been downloaded. However, “local” does not automatically mean “risk-free.” Your privacy depends on your operating system, logs, chat UI, connected tools, network settings, and whether you expose local services to other devices.
- Keep local servers local. Do not expose Ollama’s 11434 port or Open WebUI’s 3000/8080 port to the public internet (a quick port check is shown after this list).
- Be careful with LAN access. If you intentionally allow access from other devices, secure your network and use authentication where available.
- Protect chat history. Open WebUI and other local interfaces may store chats, settings, uploads, or logs on disk.
- Download from trusted sources. Prefer official Ollama tags, the official deepseek-ai organization on Hugging Face, and known tooling sources.
- Do not confuse local inference with official DeepSeek services. Local Ollama prompts are not sent to DeepSeek’s API unless you connect an external provider or tool that does so.
For broader site policy context, see our Security page and Privacy Policy.
Common issues and fixes
1) “ollama: command not found”
Close and reopen your terminal, then run ollama --version again. On macOS, make sure the Ollama app finished linking the CLI. On Windows, relaunch Ollama from the Start menu or reinstall with the official installer. On Linux, check that the installer completed successfully and that your shell PATH is updated.
2) The model runs, but it is painfully slow
You are probably running CPU-only inference, using a model that is too large, or using too much context. Drop to deepseek-r1:8b, deepseek-r1:7b, or deepseek-r1:1.5b. Keep prompts shorter while testing. If your goal is maximum answer quality rather than privacy/offline use, the hosted DeepSeek API or Chat-Deep.ai browser workflows may be a better fit.
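One quick way to see whether the model is actually running on the GPU is to load it and check Ollama's process list; recent Ollama versions report a processor split for each loaded model.
# While a model is loaded, check where it is running
ollama ps
# The PROCESSOR column typically shows values such as "100% GPU", "100% CPU", or a mixed split.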
3) You hit out-of-memory errors
Use a smaller tag, close memory-heavy apps, reduce context length, and avoid loading multiple local models at the same time. Large tags such as 70b and 671b are not realistic first installs for normal laptops.
4) GPU acceleration is not working
Check your drivers and platform support first. Windows users with NVIDIA GPUs should verify the NVIDIA driver. AMD users should install the correct Radeon/ROCm-compatible driver path for their platform. Apple Silicon users should still watch memory pressure even though Apple’s local acceleration path is available.
5) Open WebUI loads, but it cannot see Ollama
Confirm Ollama is running by opening a terminal and running:
ollama ls
If Open WebUI is inside Docker and Ollama is running on the host, the container may need to reach Ollama through host.docker.internal:11434. The Docker command above includes --add-host=host.docker.internal:host-gateway for that reason. In Open WebUI settings, check the Ollama connection URL if models are not appearing.
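You can also confirm that Ollama's HTTP API is reachable and lists your pulled models. From the host, the tags endpoint is the simplest check; from inside the container, the same request goes to host.docker.internal (shown here as a hedged example, since not every image ships curl).
# From the host: list models over the local API
curl http://localhost:11434/api/tags
# From inside the Open WebUI container, if curl is available in the image
docker exec open-webui curl -s http://host.docker.internal:11434/api/tags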
6) The API returns “model not found”
Run:
ollama ls
Then use the exact local tag shown in the list. If you pulled deepseek-r1:8b, use deepseek-r1:8b in API calls. Do not replace it with deepseek-v4-pro unless you are using a hosted API or a separate model source that explicitly supports that name.
7) The output shows thinking text and you only want the answer
For CLI usage, try --hidethinking or --think=false if supported by your model/runtime. For app development, handle separated fields such as message.thinking and message.content instead of relying only on regular expressions over visible <think> text.
8) You installed the biggest model and the machine freezes
This is usually a hardware mismatch, not a DeepSeek bug. Start with deepseek-r1:8b or smaller. Confirm basic operation, then move up only if your machine has enough memory and acceptable speed.
For more help, see our DeepSeek troubleshooting guide.
Practical setup recommendations
Best first setup for most people
Install Ollama, pull deepseek-r1:8b, test it in the terminal, then add Open WebUI if you want a browser interface.
Best setup for older laptops
Start with deepseek-r1:1.5b or deepseek-r1:7b. Accept that CPU-only inference may be slow, and avoid very large context windows.
Best setup for strong local hardware
Try deepseek-r1:14b first, then deepseek-r1:32b if you have enough memory. For many individual users, 32B is already a serious local setup.
Best setup if you need a desktop app
Use LM Studio instead of combining terminal commands with a separate browser UI.
Best setup if you need the current hosted DeepSeek model family
Use the official DeepSeek API with deepseek-v4-flash or deepseek-v4-pro. Local Ollama R1 tags are useful, but they are not the same product path as the hosted V4 API.
FAQ
Can I run DeepSeek locally without a GPU?
Yes, but keep expectations realistic. Start with deepseek-r1:1.5b or deepseek-r1:7b. CPU-only inference is usually much slower than GPU-backed inference, especially for longer answers and reasoning-heavy prompts.
Which DeepSeek model should I install first?
For most users, deepseek-r1:8b is the best first install. It is large enough to be useful but still much more realistic than 14B, 32B, 70B, or 671B on normal hardware.
Is local Ollama DeepSeek the same as the official DeepSeek API?
No. Ollama local usage normally runs local open-weight model tags such as deepseek-r1:8b. The current official hosted DeepSeek API uses deepseek-v4-flash and deepseek-v4-pro. These names and product paths are not interchangeable.
Do I need a DeepSeek API key for this local setup?
No. You do not need a DeepSeek API key to run a local Ollama model after it has been downloaded. You only need an API key if you are calling DeepSeek’s hosted API or another external provider.
Can I use a GUI instead of the terminal?
Yes. Use Open WebUI on top of Ollama for a browser UI, or use LM Studio if you want a desktop-first workflow with model browsing and a local server interface.
Is local DeepSeek private?
Local inference can keep prompts and outputs on your machine, which is a privacy advantage. However, privacy still depends on your chat UI, local logs, connected tools, browser settings, network exposure, and operating system security. Do not expose local services to the internet.
Should I try the 671B model at home?
Usually no. The listed download size alone makes it impractical for normal laptops and desktops. Use a smaller distilled tag unless you already operate server-class or distributed hardware.
Can I install DeepSeek V4 locally with Ollama?
This guide focuses on practical local Ollama usage with DeepSeek-R1-style tags. DeepSeek V4 is the current hosted API focus on Chat-Deep.ai, but V4-class open-weight models are much larger infrastructure targets and should not be treated as normal beginner local installs. If you need current V4 behavior, use the hosted API path with deepseek-v4-flash or deepseek-v4-pro.
What happened to deepseek-chat and deepseek-reasoner?
Those are legacy hosted API aliases, not local Ollama tags. According to DeepSeek’s current API documentation, they route to V4 Flash modes during the transition period and will be fully retired and inaccessible after July 24, 2026, 15:59 UTC.
Final recommendation
If your goal is to get DeepSeek running locally with the least friction, install Ollama, pull deepseek-r1:8b, and test it in the terminal. Add Open WebUI only after the model runs correctly. If your hardware struggles, move down to deepseek-r1:7b or deepseek-r1:1.5b. If you need the current official DeepSeek hosted API experience, use deepseek-v4-flash or deepseek-v4-pro instead of trying to force a local R1 tag into an API workflow.
Next useful pages: DeepSeek Models Hub, DeepSeek V4, DeepSeek R1, DeepSeek local vs API, DeepSeek in LM Studio, and DeepSeek API guide.