You can run DeepSeek locally on Windows, Mac, or Linux with Ollama. For most people, the easiest starting point is to install Ollama, pull deepseek-r1:8b, and run it from the terminal. If you want a browser-based interface, add Open WebUI on top. If you prefer an all-in-one desktop app instead of the command line, see our DeepSeek in LM Studio guide.
This guide is part of the broader Chat-Deep.ai DeepSeek guide hub, where we separate official DeepSeek products from independent local-use walkthroughs. Here, we focus specifically on local deployment with Ollama.
Last verified: April 13, 2026.
What this guide covers
This walkthrough is for people who want to run DeepSeek on their own machine for privacy, offline use after download, local development, or self-hosted experimentation. It covers:
- How to choose a realistic DeepSeek model size for your hardware
- How to install Ollama on Windows, macOS, and Linux
- How to download and run DeepSeek locally
- How to add a browser UI with Open WebUI
- How to call your local DeepSeek model through Ollama’s API
- How to troubleshoot the most common local setup problems
Before you start: understand the naming
A lot of DeepSeek content online mixes together official API model names and local open-weight checkpoints. They are not the same thing.
In the official DeepSeek API, deepseek-chat and deepseek-reasoner are platform aliases. On the local side, most single-machine Ollama users will actually run open-weight DeepSeek-R1 family checkpoints such as deepseek-r1:1.5b, 7b, 8b, 14b, 32b, or 70b.
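To make the distinction concrete, here is a minimal sketch of the two paths. The hosted call assumes you already have a DeepSeek API key in a DEEPSEEK_API_KEY environment variable; the local command only assumes Ollama is installed (covered in Step 1).
# Hosted path: official DeepSeek API, addressed by a platform alias
curl https://api.deepseek.com/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-chat", "messages": [{"role": "user", "content": "Hello"}]}'
# Local path: open-weight DeepSeek-R1 checkpoint pulled through Ollama
ollama pull deepseek-r1:8b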
If you want a broader comparison of current DeepSeek naming, modes, and model families, start with the DeepSeek Models Hub.
When to use Ollama, Open WebUI, or LM Studio
| Tool | Best for | Why you would choose it |
|---|---|---|
| Ollama | Fast local setup, terminal use, scripts, and a simple local API | It is the fastest path to getting a model running with copy-and-paste commands. |
| Open WebUI | Browser chat on top of Ollama | It gives you a ChatGPT-style interface while keeping the model local. |
| LM Studio | Users who want a desktop GUI and easier GGUF workflows | It is a good alternative if you would rather browse, download, and run models from an app. |
For this guide, we use Ollama as the base runtime because it is simple, scriptable, and easy to pair with Open WebUI. If you want the desktop-first path instead, use our LM Studio setup guide.
Prerequisites
- A supported computer:
- Windows 10 or later
- macOS 14 Sonoma or later
- A modern Linux system for the standard installer
- Enough free disk space for the model you plan to download
- Enough RAM or GPU VRAM for the model you plan to run
- An internet connection for the initial downloads
If you are unsure what your hardware can realistically handle, use the DeepSeek Hardware Chooser before you download a large model.
Choose the right DeepSeek model size
These are practical starting points for local inference, not guarantees. Real fit depends on quantization, context length, runtime overhead, concurrency, and whether you run on a discrete GPU, unified-memory Mac, or CPU-only system.
| Ollama tag | Official download size | Practical starting hardware | Who it is for |
|---|---|---|---|
| deepseek-r1:1.5b | 1.1 GB | CPU-only or 4 GB VRAM, 8 GB+ system RAM | Testing, lightweight prompts, older laptops |
| deepseek-r1:7b | 4.7 GB | 6 to 8 GB VRAM, or 16 GB+ system RAM | Entry-level local chat and coding |
| deepseek-r1:8b | 5.2 GB | 8 GB VRAM, or 16 GB+ system RAM | Best default starting point for most users |
| deepseek-r1:14b | 9.0 GB | 12 to 16 GB VRAM, or 24 to 32 GB+ system RAM | Better quality if you have midrange hardware |
| deepseek-r1:32b | 20 GB | 24 GB VRAM, or 48 to 64 GB+ system RAM | Power users who want stronger local quality |
| deepseek-r1:70b | 43 GB | 48 to 80 GB VRAM or a heavy CPU/offload setup | Workstations and serious local setups |
| deepseek-r1:671b | 404 GB | Server-class multi-GPU or distributed setup only | Research and infrastructure teams, not typical desktops |
Simple recommendation
- Start with 8b if you have a modern laptop or desktop and want the easiest good-enough setup.
- Start with 7b or 1.5b if your hardware is limited.
- Use 14b or 32b only if you already know your machine has the memory headroom.
- Treat 70b and especially 671b as specialist options, not normal first installs.
Important note about 671B
The existence of a deepseek-r1:671b tag in Ollama does not mean it is a realistic local target for a normal single PC. In practice, most people looking for “run DeepSeek locally” should stay in the 7B to 32B range, or consider quantized GGUF workflows instead. If that is your situation, our DeepSeek quantization guide is the next page to read.
Step 1: Install Ollama
Windows
You can install Ollama on Windows with the official installer or PowerShell.
irm https://ollama.com/install.ps1 | iex
After installation, open PowerShell and verify:
ollama --version
macOS
Download the official macOS installer from Ollama, then verify in Terminal:
ollama --version
On Apple Silicon, Ollama uses Metal automatically. Unified memory helps, but it is still smart to choose a model that leaves memory headroom for the OS and other apps.
Linux
Install Ollama with the standard script:
curl -fsSL https://ollama.com/install.sh | sh
Then verify that the service is installed and running:
ollama --version
sudo systemctl start ollama
sudo systemctl status ollama
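As an extra sanity check, you can confirm the local API answers on its default port. This assumes the default localhost:11434 binding:
# The server replies with a short status message when it is up
curl http://localhost:11434
# Or ask for the installed version over the API
curl http://localhost:11434/api/version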
If you use NVIDIA, AMD, or Apple Silicon
- NVIDIA: Ollama can use supported NVIDIA GPU acceleration when drivers are installed correctly.
- AMD: Ollama documents AMD support through ROCm and additional Vulkan-based support, depending on platform.
- Apple Silicon: Metal support is built into Ollama on Apple Silicon Macs.
If you are on AMD ROCm or Apple Metal and want a deeper local GPU-specific walkthrough, read our ROCm and Mac Metal guide.
Step 2: Download a DeepSeek model
For most first-time users, this is the right starting command:
ollama pull deepseek-r1:8b
Other common starting points:
# Very small / test setup
ollama pull deepseek-r1:1.5b
# Good entry-level local setup
ollama pull deepseek-r1:7b
# Default recommendation for many users
ollama pull deepseek-r1:8b
# Better quality if you have more memory
ollama pull deepseek-r1:14b
# Strong single-workstation option
ollama pull deepseek-r1:32b
If you are not sure whether to choose 8B, 14B, or 32B, the safest answer is to start smaller, confirm that everything works, and only then move up.
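Whichever tag you choose, it is worth confirming the download finished and seeing what it actually occupies on disk before moving on:
# List downloaded models with their tags and sizes
ollama ls
# Show details (architecture, parameters, quantization) for one model
ollama show deepseek-r1:8b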
Step 3: Run DeepSeek locally
Once the model finishes downloading, launch it:
ollama run deepseek-r1:8b
You can also send a one-shot prompt directly from the command line:
ollama run deepseek-r1:8b "Write a Python function that checks whether a number is prime."
What about visible reasoning or “thinking” output?
Depending on your Ollama version, model, and settings, you may see a visible reasoning trace or only the final answer. Do not hard-code your app around older <think>...</think> formatting assumptions. Newer Ollama releases separate thinking more cleanly and also let you hide it when needed.
Examples:
# Enable thinking for a prompt
ollama run deepseek-r1:8b --think "Solve 19 * 27 step by step"
# Disable thinking and return only the answer directly
ollama run deepseek-r1:8b --think=false "Solve 19 * 27 step by step"
# Hide visible thinking while still using a thinking-capable model
ollama run deepseek-r1:8b --hidethinking "Solve 19 * 27 step by step"
In an interactive session, you can also toggle thinking behavior with session commands such as /set think and /set nothink.
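If you call the model through the local API instead of the CLI, recent Ollama releases also accept a think field in the request body. Treat the exact behavior as version-dependent and check the API docs for your installed version:
# Ask for the final answer only; on recent versions the request-level
# "think" field controls whether a reasoning trace is produced
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "think": false,
  "messages": [{"role": "user", "content": "Solve 19 * 27 step by step"}]
}'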
Step 4: Add a browser UI with Open WebUI
Ollama alone is enough if you are comfortable with the terminal. If you want a clean browser interface, install Open WebUI on top of it.
Option A: Docker
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000.
Option B: pip
pip install open-webui
open-webui serve
Then open http://localhost:8080.
If you mainly want a graphical local workflow and model browser, LM Studio is still a valid alternative. Use whichever path fits the way you actually work.
Step 5: Use DeepSeek through the local Ollama API
Once Ollama is running, your local API is available at http://localhost:11434.
Simple chat request
curl http://localhost:11434/api/chat -d '{
"model": "deepseek-r1:8b",
"messages": [
{
"role": "user",
"content": "Explain the difference between TCP and UDP in plain English."
}
]
}'
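By default this endpoint streams the reply as a sequence of JSON lines. If you would rather get one complete JSON object back, which is easier to handle in quick scripts, disable streaming:
# "stream": false returns a single JSON response instead of streamed chunks
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "stream": false,
  "messages": [
    { "role": "user", "content": "Summarize what a local LLM runtime does in two sentences." }
  ]
}'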
OpenAI-compatible local endpoint
If you already use tools built around OpenAI-style chat calls, Ollama also supports OpenAI-compatible endpoints on /v1. That makes it a convenient bridge for local development and testing. If your next step is turning a local model into a more structured service, read our DeepSeek API serving guide.
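As a minimal sketch, the same local model can be called in OpenAI request format. Ollama does not check the API key value, but some client libraries insist that one is set:
# OpenAI-style request against the local Ollama server
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1:8b",
    "messages": [{"role": "user", "content": "Explain the difference between TCP and UDP in plain English."}]
  }'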
Useful Ollama commands
| Command | What it does |
|---|---|
| ollama pull deepseek-r1:8b | Downloads a model |
| ollama run deepseek-r1:8b | Starts a chat session with the model |
| ollama ls | Lists downloaded models |
| ollama ps | Shows models currently loaded in memory |
| ollama stop deepseek-r1:8b | Stops a running model |
| ollama rm deepseek-r1:8b | Deletes a downloaded model |
| ollama serve | Starts the Ollama server |
Common issues and fixes
1) “ollama: command not found”
Your installation may not be on your PATH yet, or your shell needs to be reopened. Close the terminal, open a new one, and run ollama --version again. On macOS, make sure the app finished linking the CLI. On Windows, rerun the installer or use the official PowerShell installer.
2) The model runs, but it is painfully slow
You are probably on CPU-only inference or running a model that is too large for your machine. Move down to 8b, 7b, or 1.5b. If your priority is output quality rather than privacy or offline use, the official API or browser workflows may be a better fit than forcing an oversized local model.
3) GPU acceleration is not kicking in
First, confirm your drivers and platform support. If you are on NVIDIA, verify your GPU is visible to the system. If you are on AMD, review the ROCm or Vulkan support path. If you are on Apple Silicon, Metal is automatic but memory pressure can still force you into a slower experience with oversized models.
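A quick way to see where inference actually runs is to compare what the system reports with what Ollama reports while a model is loaded:
# NVIDIA only: confirm the GPU and driver are visible to the OS
nvidia-smi
# Load the model briefly, then check how Ollama placed it;
# on recent versions the PROCESSOR column shows GPU, CPU, or a mix
ollama run deepseek-r1:8b "hello"
ollama ps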
4) You hit out-of-memory errors
Drop to a smaller tag, reduce the number of simultaneously running apps, and keep your prompts shorter while testing. For lower-memory systems, a quantized GGUF route may be more practical than a larger Ollama checkpoint. See our quantization guide or our llama.cpp CPU and Mac guide.
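Context length is one of the larger memory levers. As a hedged example, both the interactive session and the API let you request a smaller context window, which can make a model fit that otherwise will not:
# Inside an interactive session: shrink the context window
/set parameter num_ctx 2048
# Through the API: the same option per request
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "options": {"num_ctx": 2048},
  "messages": [{"role": "user", "content": "Short test prompt"}]
}'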
5) Open WebUI loads, but you cannot see your model
Make sure Ollama is actually running on the same machine and that your model has already been pulled locally. Confirm that the Ollama service is reachable before debugging the UI layer.
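Two quick checks, assuming the default port, usually settle whether the problem is on the Ollama side or the UI side:
# Is the Ollama server reachable at all?
curl http://localhost:11434/api/version
# Does it already have the model you expect?
curl http://localhost:11434/api/tags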
6) You want to access Ollama from another device on your network
By default, Ollama binds to localhost. If you plan to expose it beyond your machine, configure that intentionally and treat it as a security decision, not a convenience toggle.
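If you do decide to expose it, the usual mechanism is the OLLAMA_HOST environment variable, which changes the bind address. Put anything beyond localhost behind a firewall, VPN, or reverse proxy you control:
# Bind to all interfaces for this run (reachable from other devices on the LAN)
OLLAMA_HOST=0.0.0.0:11434 ollama serve
On a Linux systemd install, the equivalent is setting that variable in a service override rather than on the command line.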
Practical setup recommendations
Best first setup for most people
Install Ollama, pull deepseek-r1:8b, test in the terminal, then add Open WebUI only if you want a browser chat interface.
Best setup for older laptops or CPU-heavy systems
Start with 1.5b or 7b. If speed still feels poor, consider a smaller quantized GGUF workflow rather than forcing a larger model.
Best setup for a strong single-GPU workstation
Try 14b first, then 32b if you have the memory headroom. For a lot of individual users, 32B is the point where local quality becomes compelling without crossing into server-class complexity.
Best setup if you need an app-like desktop experience
Use LM Studio instead of stitching together terminal plus browser UI. It is a better fit for people who want to browse models visually and stay inside a desktop app.
FAQ
Can I run DeepSeek locally without a GPU?
Yes, but you should keep expectations realistic. Start with 1.5b or 7b. CPU-only inference is usually much slower than GPU-backed inference.
Which DeepSeek model should I start with?
For most people, deepseek-r1:8b is the best first install. It is a safer starting point than jumping directly to 14B, 32B, or larger.
Is the local Ollama model the same as the official DeepSeek API?
No. This guide is about local open-weight checkpoints. Official API aliases and local checkpoints are related to the same ecosystem, but they are not the same product path.
Can I use a GUI instead of the terminal?
Yes. Use Open WebUI on top of Ollama for a browser UI, or use LM Studio if you want a desktop-first workflow.
Is local DeepSeek private?
Local inference can keep prompts and outputs on your own machine, which is a major privacy advantage over hosted services. But local privacy still depends on your OS, browser, logs, network exposure, connected tools, and how you configured the system.
Should I try the 671B model at home?
Usually no. Its listed download size alone is enough to rule it out for most normal local setups. If you are not already running server-class hardware, that is not the right starting point.
Final recommendation
If your goal is to get DeepSeek running locally with the least friction, use Ollama plus deepseek-r1:8b, confirm performance on your own machine, and only then scale up. That path is faster, safer, and more realistic than chasing the biggest model name first.