Quick answer: The easiest way to run DeepSeek locally is to install Ollama, open a terminal, and run:
ollama run deepseek-r1:8b
For a low-end laptop, use deepseek-r1:1.5b. For a stronger machine, try deepseek-r1:14b or deepseek-r1:32b. Ollama provides DeepSeek-R1 variants including 1.5B, 7B, 8B, 14B, 32B, 70B, and 671B, and the first run downloads the model before starting local chat.
Quick Answer: How to Run DeepSeek Locally
To run DeepSeek locally, install Ollama, then run one of these commands:
ollama run deepseek-r1:8b
For smaller machines:
ollama run deepseek-r1:1.5b
For better reasoning on machines with more RAM or VRAM:
ollama run deepseek-r1:14b
The first run downloads the model. After that, you can run DeepSeek offline as long as you do not need to download another model or update. You do not need a DeepSeek API key to run a local Ollama model, and Ollama’s local API does not require authentication when accessed through localhost:11434.
For most readers, the best starting point is:
ollama run deepseek-r1:8b
The 8b option is a practical balance between speed, memory use, and answer quality. Ollama’s DeepSeek-R1 page currently maps the 8B command to DeepSeek-R1-0528-Qwen3-8B, while also listing smaller and larger distilled variants.
What Does “Running DeepSeek Locally” Mean?
Running DeepSeek locally means the model runs on your own computer instead of sending every prompt to DeepSeek’s hosted chat or API service.
There are two very different ways to use DeepSeek:
| Option | Where inference happens | Needs internet during use? | Needs DeepSeek API key? | Best for |
|---|---|---|---|---|
| DeepSeek hosted chat/API | DeepSeek servers | Yes | Usually yes for API | Convenience, strongest hosted models |
| Local DeepSeek model | Your computer | No, after download | No for local Ollama/LM Studio models | Privacy, offline use, experimentation |
Local inference gives you more control. Your prompts can stay on your device, you avoid per-token API costs, and you can build a local DeepSeek API for your own tools. Ollama states that locally run models do not send prompts and answers back to Ollama’s servers, while cloud-hosted models are handled differently.
The tradeoff is hardware. A small DeepSeek R1 distilled model can run on a normal laptop. The full 671B DeepSeek-R1 or R1-0528 model is not a normal laptop setup. It requires advanced quantization, large storage, and serious RAM/VRAM planning.
Which DeepSeek Model Should You Run?
DeepSeek-R1 is the most practical DeepSeek family for local use today. The original DeepSeek-R1 and DeepSeek-R1-Zero are 671B-parameter mixture-of-experts models with 37B activated parameters and a 128K context length. DeepSeek also released six smaller distilled models based on Qwen and Llama, in 1.5B, 7B, 8B, 14B, 32B, and 70B sizes.
DeepSeek-R1-0528 is the important newer R1 update. The official Hugging Face model card describes it as a minor version upgrade with improved reasoning depth and inference capabilities, and it also notes that system prompts are now supported and users no longer need to force <think>\n at the beginning of the response.
| Model / variant | Best for | Approximate hardware class | Pros | Limitations |
|---|---|---|---|---|
| DeepSeek-R1 1.5B | Very low-end laptops, quick tests | 8GB RAM or better | Fastest, smallest, easiest to run | Weakest reasoning and coding quality |
| DeepSeek-R1 7B / 8B | Most beginners | 16GB RAM or Apple Silicon with enough unified memory | Best default balance; 8B is available as R1-0528-Qwen3-8B in Ollama | Still not equal to the full R1 model |
| DeepSeek-R1 14B | Better reasoning on consumer hardware | 24–32GB RAM or decent GPU | Better quality than 7B/8B | Slower; may need quantization |
| DeepSeek-R1 32B | Coding, math, stronger local reasoning | 32–64GB RAM or 12–24GB VRAM depending on quant | Noticeably stronger responses | Heavy for normal laptops |
| DeepSeek-R1 70B | Advanced local experiments | 64GB+ RAM or high-VRAM workstation | Much stronger than small distilled models | Slow and memory-hungry |
| Full DeepSeek-R1 / R1-0528 671B quantized | Researchers, workstation users, enthusiasts | Large RAM/VRAM, advanced GGUF/Ollama/llama.cpp workflows | Closest to full DeepSeek local experience | Not beginner-friendly; huge storage and memory needs |
| DeepSeek V4 Flash / Pro | Freshness, tracking newest open DeepSeek models | Advanced only; not a default consumer setup | Newer architecture, 1M context in official services | V4-Pro and V4-Flash are far larger than common R1 distilled models |
For most people searching “how to run DeepSeek locally,” the right choice is DeepSeek-R1 8B. Use 1.5b only if your computer is weak. Use 14b or 32b if you have enough memory and want better output.
Hardware Requirements for Running DeepSeek Locally
Exact DeepSeek hardware requirements depend on model size, quantization, context length, runtime, GPU acceleration, and whether you use CPU, NVIDIA CUDA, Apple Metal, or another backend.
Use this practical guide:
| Hardware | Recommended DeepSeek model | What to expect |
|---|---|---|
| 8GB RAM | DeepSeek-R1 1.5B | Works for basic tests; CPU inference may be slow |
| 16GB RAM | DeepSeek-R1 7B or 8B | Practical starting point for most users |
| 24–32GB RAM | DeepSeek-R1 8B or 14B | Better speed and stability |
| 32–64GB RAM or 12–24GB VRAM | DeepSeek-R1 14B or 32B | Stronger local reasoning if quantized properly |
| 64GB+ RAM or high VRAM | DeepSeek-R1 32B or 70B | Advanced local use; expect tuning |
| 128GB+ RAM / workstation | Large GGUF quantized models | Useful for experiments, not casual use |
| 180GB+ unified memory or combined RAM+VRAM | Advanced R1-0528 quant workflows | Needed for better performance with huge quantized R1-0528 setups |
Unsloth’s DeepSeek-R1-0528 local guide says the full model needs major preparation: the full model is listed as roughly 715GB in size, while a 1.66-bit dynamic quant is listed at 162GB. It also recommends at least 64GB RAM for that quant, and around 180GB unified memory or combined RAM+VRAM for better performance.
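For a rough sanity check before downloading, a common rule of thumb (an approximation, not an official figure) is that quantized weights need about parameters × bits per weight ÷ 8 bytes, plus extra memory for the KV cache and runtime overhead. A minimal shell sketch of that arithmetic:
# Back-of-the-envelope: weight memory in GB ≈ parameters (billions) × bits per weight ÷ 8
echo "8B model at 4-bit quant:  $(( 8 * 4 / 8 )) GB of weights (plus KV cache and runtime overhead)"
echo "32B model at 4-bit quant: $(( 32 * 4 / 8 )) GB of weights"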
For beginners, ignore the full 671B model at first. Start with 8B.
Method 1 — Run DeepSeek Locally with Ollama
Ollama is the easiest way to install DeepSeek locally because it handles model download, local execution, and the local API server.
Ollama is available for macOS, Windows, and Linux. The official download page provides a terminal install script for Linux and a desktop app for macOS, while the Windows page provides a standard installer and says Windows 10 or later is required.
Step 1: Install Ollama
Linux
curl -fsSL https://ollama.com/install.sh | sh
macOS
Download the official macOS app from the Ollama download page, or, if you use Homebrew, install the command-line tool from the terminal:
brew install ollama
Windows
Download and run OllamaSetup.exe from the official Ollama download page. Windows 10 or later is required.
Then confirm the installation:
ollama --version
Step 2: Run DeepSeek-R1 8B
ollama run deepseek-r1:8b
The first run downloads the model. After the download finishes, Ollama opens an interactive chat in your terminal.
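If you only want a quick one-off answer instead of an interactive session, you can also pass a prompt directly on the command line (the prompt text here is just an example):
ollama run deepseek-r1:8b "Explain in two sentences what a distilled language model is."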
Step 3: Choose the right model size
Use these commands based on your hardware:
ollama run deepseek-r1:1.5b
ollama run deepseek-r1:7b
ollama run deepseek-r1:8b
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
ollama run deepseek-r1:70b
The official Ollama DeepSeek-R1 page lists commands for the 8B R1-0528-Qwen3 model, the full 671B model, and distilled models including 1.5B, 7B, 14B, 32B, and 70B.
Step 4: List installed models
ollama list
Step 5: Stop a model
ollama stop deepseek-r1:8b
Ollama’s FAQ also notes that models are kept in memory for a period after use and can be unloaded with ollama stop.
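To check which models are currently loaded in memory before unloading one, Ollama also provides a listing command:
ollama ps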
Step 6: Update the model
If you installed an older DeepSeek-R1 build, update it:
ollama pull deepseek-r1
Ollama’s DeepSeek-R1 page specifically notes that older versions can be updated with ollama pull deepseek-r1.
Method 2 — Run DeepSeek Locally with LM Studio
LM Studio is the best option if you want to run DeepSeek locally with a graphical desktop app instead of the terminal.
It works well for users who want to:
- Search and download local models from a desktop interface.
- Chat with DeepSeek without memorizing commands.
- Load GGUF or MLX-style local model files.
- Start a local server for apps and scripts.
LM Studio’s official local server documentation says you can serve local LLMs from the Developer tab on localhost or on your network, and it supports REST APIs, client libraries, OpenAI-compatible endpoints, and Anthropic-compatible endpoints.
Basic LM Studio setup
- Install LM Studio for your operating system.
- Open the app.
- Search for a DeepSeek model, such as a DeepSeek-R1 distilled GGUF model.
- Choose a quantized model that fits your RAM or VRAM.
- Download the model.
- Open the Chat tab and load it.
- Start chatting locally.
Optional: enable the local server
In LM Studio:
- Open the Developer tab.
- Select your downloaded DeepSeek model.
- Toggle Start server.
- Use the provided local endpoint from your own apps.
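Once the server is running, other apps can reach it over HTTP. The sketch below assumes LM Studio's default port 1234 and a placeholder model identifier; check the Developer tab for the actual port and the exact name of your downloaded DeepSeek model:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-qwen-8b",
    "messages": [
      { "role": "user", "content": "Explain what a local LLM server is in one sentence." }
    ]
  }'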
LM Studio is ideal for non-terminal users. Ollama is still the simpler recommendation for this guide, but LM Studio is often the better user experience for people who prefer a desktop GUI.
Method 3 — Add a ChatGPT-like UI with Open WebUI
Ollama’s terminal interface is useful, but many people want a browser-based interface that feels closer to ChatGPT. That is where Open WebUI helps.
Open WebUI describes itself as a self-hosted AI platform designed to operate offline, with support for Ollama and OpenAI-compatible APIs.
Basic flow
- Install and run Ollama.
- Run a DeepSeek model:
ollama run deepseek-r1:8b
- Install Open WebUI.
- Connect Open WebUI to your Ollama server.
- Chat with DeepSeek from the browser.
Docker example
A common Docker setup looks like this:
docker run -d \
-p 3000:8080 \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Then open the Open WebUI interface in your browser (with the port mapping above, it is served at http://localhost:3000) and configure the Ollama connection.
Common connection issue
If Open WebUI runs in a container, localhost inside the container is not always the same as localhost on your host machine. Open WebUI’s quick start documentation says that, in the relevant container setup, the Ollama API connection can be set to:
http://host.containers.internal:11434
Depending on Docker, Podman, Windows, macOS, or Linux networking, you may need host.docker.internal, host.containers.internal, or a host IP address.
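A quick way to find the right address is to request Ollama's model list from the same environment Open WebUI runs in; whichever of these candidate URLs returns JSON is the one to configure (not all of them exist on every setup):
curl http://localhost:11434/api/tags
curl http://host.docker.internal:11434/api/tags
curl http://host.containers.internal:11434/api/tags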
Method 4 — Run DeepSeek with llama.cpp and GGUF
Use llama.cpp if you want more manual control over GGUF models, quantization, CPU/GPU tuning, and advanced large-model experiments.
This method is best for:
- Advanced users.
- GGUF model files from Hugging Face.
- Manual quantized model downloads.
- Large DeepSeek-R1 or R1-0528 experiments.
- Local OpenAI-compatible endpoints through llama-server.
The official llama.cpp README states that llama.cpp requires models to be stored in GGUF format, and that models in other formats can be converted using conversion scripts.
High-level llama.cpp workflow
- Install build tools.
- Clone llama.cpp.
- Build llama-cli or llama-server.
- Download a compatible DeepSeek GGUF model.
- Run the model locally.
- Optionally expose it through an OpenAI-compatible local endpoint.
Example build outline:
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
Example local run:
./build/bin/llama-cli \
-m /path/to/deepseek-model.gguf \
-p "Explain local LLM inference in simple terms."
Example server run:
./build/bin/llama-server \
-m /path/to/deepseek-model.gguf \
--host 127.0.0.1 \
--port 8080
The llama.cpp HTTP server supports GPU and CPU inference for F16 and quantized models, plus OpenAI-compatible chat completions, responses, and embeddings routes.
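For example, with the llama-server command above running, you can send a chat request to its OpenAI-compatible route (the prompt is only an example; the server answers with whichever GGUF model it loaded):
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "user", "content": "Explain GGUF quantization in one sentence." }
    ]
  }'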
For most beginners, do not start here. Use Ollama first, then come back to llama.cpp when you need GGUF-level control.
How to Use DeepSeek Locally from an API
Ollama automatically exposes a local API after installation. Its official API documentation says the default base URL is:
http://localhost:11434/api
Start Ollama server
In many desktop installations, Ollama already runs in the background. If needed, start it manually:
ollama serve
Then make sure your model is available:
ollama run deepseek-r1:8b
Call DeepSeek locally with curl
curl http://localhost:11434/api/chat -d '{
"model": "deepseek-r1:8b",
"messages": [
{ "role": "user", "content": "Explain how local LLM inference works." }
],
"stream": false
}'
Ollama’s chat API endpoint is /api/chat, and the documentation shows the same message-based request structure.
Use DeepSeek locally from Python
Install the Ollama Python package:
pip install ollama
Then run:
import ollama
response = ollama.chat(
model="deepseek-r1:8b",
messages=[
{"role": "user", "content": "Write a short Python function to reverse a string."}
],
)
print(response["message"]["content"])
This gives you a local DeepSeek API workflow without using DeepSeek’s hosted API.
Can You Run DeepSeek V4 Locally?
DeepSeek V4 matters for freshness, but it is not the default recommendation for most local users.
DeepSeek announced DeepSeek-V4 Preview on April 24, 2026. The official release says DeepSeek-V4-Pro has 1.6T total parameters with 49B active parameters, while DeepSeek-V4-Flash has 284B total parameters with 13B active parameters. It also says both official V4 services support 1M context and Thinking / Non-Thinking modes.
That is exciting, but it does not mean V4 is easy to run on a normal laptop.
For most people who want to install DeepSeek locally, these remain the practical choices:
- deepseek-r1:1.5b for weak hardware.
- deepseek-r1:8b for most users.
- deepseek-r1:14b or deepseek-r1:32b for stronger machines.
- GGUF/llama.cpp workflows for advanced users.
- Full R1/R1-0528 quantized workflows only for large-memory setups.
Advanced users can track Hugging Face, GGUF conversions, vLLM, SGLang, llama.cpp, Ollama, and LM Studio support for DeepSeek V4. Beginners should not assume that DeepSeek V4 Pro or Flash will behave like an 8B model on a consumer laptop.
Recommended Settings for Better Output
DeepSeek reasoning models can be sensitive to sampling settings.
For DeepSeek-R1-0528, the official Hugging Face model card says benchmark sampling used temperature 0.6 and top-p 0.95. It also says system prompts are supported in R1-0528 and users no longer need to force the model into thinking mode by starting the output with <think>\n.
Unsloth’s R1-0528 guide repeats the practical recommendation: temperature 0.6, top_p 0.95, and multiple tests for reliable evaluation.
Use these baseline settings:
| Setting | Suggested value | Why |
|---|---|---|
| temperature | 0.6 | Good default for reasoning-style R1 models |
| top_p | 0.95 | Keeps output varied without becoming too loose |
| context length | Lower if memory is tight | Longer context uses more memory |
| model size | Smaller if slow | 1.5B/8B are faster than 14B/32B |
| GPU acceleration | Enable if available | Improves speed significantly |
| system prompt | Supported in R1-0528 | Older R1 guidance differed |
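With Ollama's local API, these settings can be passed per request through the options field. A minimal sketch against the local endpoint (the model tag, prompt, and context value are just examples):
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "messages": [
    { "role": "user", "content": "Solve the problem step by step, then give the final answer: 17 * 24." }
  ],
  "options": { "temperature": 0.6, "top_p": 0.95, "num_ctx": 8192 },
  "stream": false
}'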
For coding and math, ask for structured reasoning and a final answer. For example:
Solve the problem step by step. Then give the final answer in a short summary.
For privacy-sensitive work, use local inference and avoid sending sensitive files to hosted models.
Troubleshooting DeepSeek Local Setup
“ollama: command not found”
Ollama is not installed correctly, or your terminal cannot find it.
Fix:
ollama --version
If that fails, reinstall Ollama and reopen your terminal.
Model downloads are slow
DeepSeek models can be large. Try a smaller model first:
ollama run deepseek-r1:1.5b
On WSL2, Ollama’s FAQ mentions that Windows 10 networking settings can affect installation and model downloads.
Out of memory
Use a smaller model:
ollama run deepseek-r1:1.5b
or:
ollama run deepseek-r1:8b
Also reduce context length if your runtime exposes that option.
DeepSeek runs too slowly
Use a smaller model, enable GPU acceleration where available, close other memory-heavy apps, or switch from CPU inference to a GPU-enabled runtime.
For Docker users with NVIDIA GPUs, Ollama’s Docker documentation shows a GPU-enabled container command using --gpus=all.
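For reference, Ollama's Docker documentation uses a command along these lines for NVIDIA GPUs, assuming the NVIDIA Container Toolkit is already installed:
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama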
Open WebUI cannot connect to Ollama
Check whether Open WebUI is running on the host or inside a container. Try one of these connection URLs:
http://localhost:11434
http://host.docker.internal:11434
http://host.containers.internal:11434
Open WebUI’s docs specifically mention host.containers.internal:11434 for the relevant container connection configuration.
GPU is not being used
Update GPU drivers, install CUDA or the correct backend, and check whether your runtime supports your GPU. On Docker, verify NVIDIA Container Toolkit or ROCm support depending on your GPU.
Windows firewall or network issue
If an app cannot reach Ollama, allow the app through Windows Firewall or keep all requests on localhost.
Model gives poor answers
Try:
- A larger model.
- Temperature around 0.6.
- Clearer prompts.
- R1-0528 if available.
- Multiple attempts for complex reasoning.
Confusion between distilled model and full DeepSeek-R1
deepseek-r1:8b is not the full 671B DeepSeek-R1 model. It is a smaller distilled model designed to be practical on consumer hardware. DeepSeek’s GitHub explains that the smaller 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints are distilled from reasoning data generated by DeepSeek-R1 and based on Qwen/Llama models.
Need to update Ollama, LM Studio, or llama.cpp
Local AI tools change quickly. Update your runtime if commands fail, models do not load, or templates behave incorrectly.
Ollama vs LM Studio vs Open WebUI vs llama.cpp
| Tool | Best for | Difficulty | GUI? | API support? | Recommended user |
|---|---|---|---|---|---|
| Ollama | Fastest local DeepSeek setup | Easy | Minimal terminal UI | Yes, local API | Most beginners and developers |
| LM Studio | Desktop model browsing and chat | Easy | Yes | Yes | GUI-first users |
| Open WebUI | Browser-based ChatGPT-like interface | Medium | Yes, web UI | Connects to Ollama/OpenAI-compatible APIs | Users who want a polished local chat UI |
| llama.cpp | GGUF, quantization, manual tuning | Advanced | Basic server UI | Yes through llama-server | Power users and researchers |
The simplest stack is Ollama only. The best local chat experience is Ollama + Open WebUI. The best desktop GUI is LM Studio. The best advanced GGUF workflow is llama.cpp.
Best Setup for Most Users
Use this decision guide:
| User type | Recommended setup |
|---|---|
| Beginner | Ollama + deepseek-r1:8b |
| Low-end laptop | Ollama + deepseek-r1:1.5b |
| GUI user | LM Studio + a DeepSeek GGUF model |
| Browser UI user | Ollama + Open WebUI |
| Developer | Ollama + local API on localhost:11434 |
| Advanced GGUF user | llama.cpp + quantized DeepSeek GGUF |
| Large-model experimenter | Full R1/R1-0528 quantized GGUF with serious RAM/VRAM |
| Production/high throughput | vLLM or SGLang only if compatible with the chosen model and hardware |
For most readers, the best answer to “how to run DeepSeek locally” is:
ollama run deepseek-r1:8b
Then add Open WebUI if you want a browser interface, or LM Studio if you want a desktop GUI.
FAQ
Can I run DeepSeek locally for free?
Yes. You can run open DeepSeek models locally for free after downloading them. You still need your own hardware, storage, electricity, and time.
Do I need a DeepSeek API key to run it locally?
No. You do not need a DeepSeek API key to run local Ollama or LM Studio models. Ollama’s local API on localhost:11434 does not require authentication.
What is the easiest way to run DeepSeek locally?
Install Ollama and run: ollama run deepseek-r1:8b
Can I run DeepSeek locally on Windows?
Yes. Install Ollama on Windows, open PowerShell or CMD, and run a DeepSeek command. Ollama’s Windows download page provides a standard installer and says Windows 10 or later is required.
Can I run DeepSeek locally on a Mac?
Yes. Apple Silicon Macs are good for local LLMs because they use unified memory. Start with deepseek-r1:8b or deepseek-r1:14b, depending on your memory.
Can I run DeepSeek locally without a GPU?
Yes. CPU inference works, especially for smaller models, but it is slower. Start with deepseek-r1:1.5b or deepseek-r1:8b.
Which DeepSeek model should I choose?
Most users should choose deepseek-r1:8b. Use 1.5b for weak hardware, 14b for stronger machines, and 32b or larger only if you understand memory requirements.
Is DeepSeek-R1 8B the same as the full DeepSeek-R1?
No. The 8B model is a smaller distilled model. The full DeepSeek-R1 is a 671B-parameter model with 37B activated parameters.
Can I run DeepSeek V4 locally?
Possibly for advanced users, but it is not the default beginner path. DeepSeek-V4-Pro and V4-Flash are much larger than common R1 distilled models, so most users should start with DeepSeek-R1 8B or R1-0528-Qwen3-8B.
Is local DeepSeek private?
Local inference is more private than hosted inference because prompts can stay on your device. Ollama says it does not see prompts or data when models are run locally.
How much RAM do I need?
Use 8GB RAM for 1.5B, 16GB for 7B/8B, 24–32GB for 8B/14B, and 32–64GB or more for 32B-class models. Very large quantized R1/R1-0528 setups need much more.
Can I use DeepSeek locally through Python?
Yes. Install the Ollama Python package and call ollama.chat() with model="deepseek-r1:8b".
How do I make DeepSeek run faster?
Use a smaller model, enable GPU acceleration, reduce context length, close other apps, and keep the model loaded when making repeated API calls.
Can I run DeepSeek offline?
Yes, after the model is downloaded. You need internet for the first download, updates, and new model pulls.
Conclusion
The best way to run DeepSeek locally in 2026 is to start with Ollama:
ollama run deepseek-r1:8b
That gives you a practical local DeepSeek setup with no DeepSeek API key, a terminal chat interface, and a local API endpoint. Use deepseek-r1:1.5b for weak laptops, deepseek-r1:14b or 32b for stronger machines, LM Studio for a desktop GUI, Open WebUI for a browser-based chat interface, and llama.cpp only when you need advanced GGUF control.
For almost everyone, the winning setup is:
Ollama + deepseek-r1:8b
Add Open WebUI when you want a ChatGPT-like local interface.
