How to Run DeepSeek Locally: Complete 2026 Guide

Quick Answer: How to Run DeepSeek Locally

To run DeepSeek locally, install Ollama, then run one of these commands:

ollama run deepseek-r1:8b

For smaller machines:

ollama run deepseek-r1:1.5b

For better reasoning on machines with more RAM or VRAM:

ollama run deepseek-r1:14b

The first run downloads the model. After that, you can run DeepSeek offline as long as you do not need to download another model or update. You do not need a DeepSeek API key to run a local Ollama model, and Ollama’s local API does not require authentication when accessed through localhost:11434.
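
To confirm the local server is reachable without any key, you can hit an informational endpoint (assuming the default port):

```shell
# Lists locally installed models; a JSON reply means the server is up.
curl http://localhost:11434/api/tags
```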

For most readers, the best starting point is:

ollama run deepseek-r1:8b

The 8b option is a practical balance between speed, memory use, and answer quality. Ollama’s DeepSeek-R1 page currently maps the 8B command to DeepSeek-R1-0528-Qwen3-8B, while also listing smaller and larger distilled variants.

What Does “Running DeepSeek Locally” Mean?

Running DeepSeek locally means the model runs on your own computer instead of sending every prompt to DeepSeek’s hosted chat or API service.

There are two very different ways to use DeepSeek:

| Option | Where inference happens | Needs internet during use? | Needs DeepSeek API key? | Best for |
|---|---|---|---|---|
| DeepSeek hosted chat/API | DeepSeek servers | Yes | Usually yes for API | Convenience, strongest hosted models |
| Local DeepSeek model | Your computer | No, after download | No for local Ollama/LM Studio models | Privacy, offline use, experimentation |

Local inference gives you more control. Your prompts can stay on your device, you avoid per-token API costs, and you can build a local DeepSeek API for your own tools. Ollama states that locally run models do not send prompts and answers back to Ollama’s servers, while cloud-hosted models are handled differently.

The tradeoff is hardware. A small DeepSeek R1 distilled model can run on a normal laptop. The full 671B DeepSeek-R1 or R1-0528 model is not a normal laptop setup. It requires advanced quantization, large storage, and serious RAM/VRAM planning.

Which DeepSeek Model Should You Run?

DeepSeek-R1 is the most practical DeepSeek family for local use today. The original DeepSeek-R1 and DeepSeek-R1-Zero are 671B-parameter mixture-of-experts models with 37B activated parameters and a 128K context length. DeepSeek also released six smaller distilled models based on Qwen and Llama, in 1.5B, 7B, 8B, 14B, 32B, and 70B sizes.

DeepSeek-R1-0528 is the important newer R1 update. The official Hugging Face model card describes it as a minor version upgrade with improved reasoning depth and inference capabilities, and it also notes that system prompts are now supported and users no longer need to force <think>\n at the beginning of the response.

| Model / variant | Best for | Approximate hardware class | Pros | Limitations |
|---|---|---|---|---|
| DeepSeek-R1 1.5B | Very low-end laptops, quick tests | 8GB RAM or better | Fastest, smallest, easiest to run | Weakest reasoning and coding quality |
| DeepSeek-R1 7B / 8B | Most beginners | 16GB RAM or Apple Silicon with enough unified memory | Best default balance; 8B is available as R1-0528-Qwen3-8B in Ollama | Still not equal to the full R1 model |
| DeepSeek-R1 14B | Better reasoning on consumer hardware | 24–32GB RAM or decent GPU | Better quality than 7B/8B | Slower; may need quantization |
| DeepSeek-R1 32B | Coding, math, stronger local reasoning | 32–64GB RAM or 12–24GB VRAM depending on quant | Noticeably stronger responses | Heavy for normal laptops |
| DeepSeek-R1 70B | Advanced local experiments | 64GB+ RAM or high-VRAM workstation | Much stronger than small distilled models | Slow and memory-hungry |
| Full DeepSeek-R1 / R1-0528 671B quantized | Researchers, workstation users, enthusiasts | Large RAM/VRAM, advanced GGUF/Ollama/llama.cpp workflows | Closest to full DeepSeek local experience | Not beginner-friendly; huge storage and memory needs |
| DeepSeek V4 Flash / Pro | Freshness, tracking newest open DeepSeek models | Advanced only; not a default consumer setup | Newer architecture, 1M context in official services | V4-Pro and V4-Flash are far larger than common R1 distilled models |

For most people searching “how to run DeepSeek locally,” the right choice is DeepSeek-R1 8B. Use 1.5b only if your computer is weak. Use 14b or 32b if you have enough memory and want better output.

Hardware Requirements for Running DeepSeek Locally

Exact DeepSeek hardware requirements depend on model size, quantization, context length, runtime, GPU acceleration, and whether you use CPU, NVIDIA CUDA, Apple Metal, or another backend.

Use this practical guide:

| Hardware | Recommended DeepSeek model | What to expect |
|---|---|---|
| 8GB RAM | DeepSeek-R1 1.5B | Works for basic tests; CPU inference may be slow |
| 16GB RAM | DeepSeek-R1 7B or 8B | Practical starting point for most users |
| 24–32GB RAM | DeepSeek-R1 8B or 14B | Better speed and stability |
| 32–64GB RAM or 12–24GB VRAM | DeepSeek-R1 14B or 32B | Stronger local reasoning if quantized properly |
| 64GB+ RAM or high VRAM | DeepSeek-R1 32B or 70B | Advanced local use; expect tuning |
| 128GB+ RAM / workstation | Large GGUF quantized models | Useful for experiments, not casual use |
| 180GB+ unified memory or combined RAM+VRAM | Advanced R1-0528 quant workflows | Needed for better performance with huge quantized R1-0528 setups |

Unsloth’s DeepSeek-R1-0528 local guide says the full model needs major preparation: it lists the full model at roughly 715GB, and a 1.66-bit dynamic quant at 162GB. The guide also recommends at least 64GB RAM for that quant, and around 180GB of unified memory or combined RAM+VRAM for better performance.

For beginners, ignore the full 671B model at first. Start with 8B.

Method 1 — Run DeepSeek Locally with Ollama

Ollama is the easiest way to install DeepSeek locally because it handles model download, local execution, and the local API server.

Ollama is available for macOS, Windows, and Linux. The official download page lists the Linux/macOS terminal installer, while the Windows page provides a PowerShell installer and says Windows 10 or later is required.

Step 1: Install Ollama

Linux

curl -fsSL https://ollama.com/install.sh | sh

macOS

Use the official macOS download, or install from the terminal if supported on your setup:

curl -fsSL https://ollama.com/install.sh | sh

Windows

Open PowerShell and run:

irm https://ollama.com/install.ps1 | iex

Then confirm the installation:

ollama --version

Step 2: Run DeepSeek-R1 8B

ollama run deepseek-r1:8b

The first run downloads the model. After the download finishes, Ollama opens an interactive chat in your terminal.

Step 3: Choose the right model size

Use these commands based on your hardware:

ollama run deepseek-r1:1.5b
ollama run deepseek-r1:7b
ollama run deepseek-r1:8b
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
ollama run deepseek-r1:70b

The official Ollama DeepSeek-R1 page lists commands for the 8B R1-0528-Qwen3 model, the full 671B model, and distilled models including 1.5B, 7B, 14B, 32B, and 70B.

Step 4: List installed models

ollama list

Step 5: Stop a model

ollama stop deepseek-r1:8b

Ollama’s FAQ also notes that models are kept in memory for a period after use and can be unloaded with ollama stop.
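
You can check what is loaded and control unloading from the API as well. A sketch using documented commands and the `keep_alive` request parameter (setting it to 0 unloads immediately):

```shell
# See which models are currently loaded in memory:
ollama ps

# Unload a model right away instead of waiting for the idle timeout:
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "keep_alive": 0
}'
```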

Step 6: Update the model

If you installed an older DeepSeek-R1 build, update it:

ollama pull deepseek-r1

Ollama’s DeepSeek-R1 page specifically notes that older versions can be updated with ollama pull deepseek-r1.

Method 2 — Run DeepSeek Locally with LM Studio

LM Studio is the best option if you want to run DeepSeek locally with a graphical desktop app instead of the terminal.

It works well for users who want to:

  • Search and download local models from a desktop interface.
  • Chat with DeepSeek without memorizing commands.
  • Load GGUF or MLX-style local model files.
  • Start a local server for apps and scripts.

LM Studio’s official local server documentation says you can serve local LLMs from the Developer tab on localhost or on your network, and it supports REST APIs, client libraries, OpenAI-compatible endpoints, and Anthropic-compatible endpoints.

Basic LM Studio setup

  1. Install LM Studio for your operating system.
  2. Open the app.
  3. Search for a DeepSeek model, such as a DeepSeek-R1 distilled GGUF model.
  4. Choose a quantized model that fits your RAM or VRAM.
  5. Download the model.
  6. Open the Chat tab and load it.
  7. Start chatting locally.

Optional: enable the local server

In LM Studio:

  1. Open the Developer tab.
  2. Select your downloaded DeepSeek model.
  3. Toggle Start server.
  4. Use the provided local endpoint from your own apps.
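
The endpoint from step 4 speaks the OpenAI-style chat completions format. A sketch of a request body, assuming LM Studio's default port (1234) and a placeholder model id — check the Developer tab for your actual port and model name:

```python
import json

# BASE_URL and the model id are assumptions; adjust to what LM Studio shows.
BASE_URL = "http://localhost:1234/v1"

payload = {
    "model": "deepseek-r1-distill-qwen-8b",  # placeholder id from LM Studio's model list
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "temperature": 0.6,
}

body = json.dumps(payload)
print(body)

# To actually send it once the server is running:
#   import urllib.request
#   req = urllib.request.Request(
#       f"{BASE_URL}/chat/completions",
#       body.encode(),
#       {"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())
```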

LM Studio is ideal for non-terminal users. Ollama is still the simpler recommendation for this guide, but LM Studio is often the better user experience for people who prefer a desktop GUI.

Method 3 — Add a ChatGPT-like UI with Open WebUI

Ollama’s terminal interface is useful, but many people want a browser-based interface that feels closer to ChatGPT. That is where Open WebUI helps.

Open WebUI describes itself as a self-hosted AI platform designed to operate offline, with support for Ollama and OpenAI-compatible APIs.

Basic flow

  1. Install and run Ollama.
  2. Run a DeepSeek model:
ollama run deepseek-r1:8b
  3. Install Open WebUI.
  4. Connect Open WebUI to your Ollama server.
  5. Chat with DeepSeek from the browser.

Docker example

A common Docker setup looks like this:

docker run -d \
-p 3000:8080 \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main

Then open the Open WebUI interface in your browser and configure the Ollama connection.

Common connection issue

If Open WebUI runs in a container, localhost inside the container is not always the same as localhost on your host machine. Open WebUI’s quick start documentation says that, in the relevant container setup, the Ollama API connection can be set to:

http://host.containers.internal:11434

Depending on Docker, Podman, Windows, macOS, or Linux networking, you may need host.docker.internal, host.containers.internal, or a host IP address.
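
On Linux with plain Docker, one common fix is to map host.docker.internal to the host gateway when starting the container. A sketch built on the earlier docker run example:

```shell
# --add-host makes host.docker.internal resolve to the host machine,
# so the container can reach Ollama on the host's port 11434.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```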

Method 4 — Run DeepSeek with llama.cpp and GGUF

Use llama.cpp if you want more manual control over GGUF models, quantization, CPU/GPU tuning, and advanced large-model experiments.

This method is best for:

  • Advanced users.
  • GGUF model files from Hugging Face.
  • Manual quantized model downloads.
  • Large DeepSeek-R1 or R1-0528 experiments.
  • Local OpenAI-compatible endpoints through llama-server.

The official llama.cpp README states that llama.cpp requires models to be stored in GGUF format, and that models in other formats can be converted using conversion scripts.

High-level llama.cpp workflow

  1. Install build tools.
  2. Clone llama.cpp.
  3. Build llama-cli or llama-server.
  4. Download a compatible DeepSeek GGUF model.
  5. Run the model locally.
  6. Optionally expose it through an OpenAI-compatible local endpoint.

Example build outline:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

Example local run:

./build/bin/llama-cli \
-m /path/to/deepseek-model.gguf \
-p "Explain local LLM inference in simple terms."

Example server run:

./build/bin/llama-server \
-m /path/to/deepseek-model.gguf \
--host 127.0.0.1 \
--port 8080

The llama.cpp HTTP server supports GPU and CPU inference for F16 and quantized models, plus OpenAI-compatible chat completions, responses, and embeddings routes.
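
With llama-server running as above, the OpenAI-compatible chat route can be exercised with curl; the model is implied by the GGUF file the server loaded:

```shell
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Summarize what GGUF is in one sentence."}
    ]
  }'
```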

For most beginners, do not start here. Use Ollama first, then come back to llama.cpp when you need GGUF-level control.

How to Use DeepSeek Locally from an API

Ollama automatically exposes a local API after installation. Its official API documentation says the default base URL is:

http://localhost:11434/api

Start Ollama server

In many desktop installations, Ollama already runs in the background. If needed, start it manually:

ollama serve

Then make sure your model is available:

ollama run deepseek-r1:8b

Call DeepSeek locally with curl

curl http://localhost:11434/api/chat -d '{
"model": "deepseek-r1:8b",
"messages": [
{ "role": "user", "content": "Explain how local LLM inference works." }
],
"stream": false
}'

Ollama’s chat API endpoint is /api/chat, and the documentation shows the same message-based request structure.

Use DeepSeek locally from Python

Install the Ollama Python package:

pip install ollama

Then run:

import ollama

response = ollama.chat(
model="deepseek-r1:8b",
messages=[
{"role": "user", "content": "Write a short Python function to reverse a string."}
],
)

print(response["message"]["content"])

This gives you a local DeepSeek API workflow without using DeepSeek’s hosted API.

Can You Run DeepSeek V4 Locally?

DeepSeek V4 matters for freshness, but it is not the default recommendation for most local users.

DeepSeek announced DeepSeek-V4 Preview on April 24, 2026. The official release says DeepSeek-V4-Pro has 1.6T total parameters with 49B active parameters, while DeepSeek-V4-Flash has 284B total parameters with 13B active parameters. It also says both official V4 services support 1M context and Thinking / Non-Thinking modes.

That is exciting, but it does not mean V4 is easy to run on a normal laptop.

For most people who want to install DeepSeek locally, these remain the practical choices:

  • deepseek-r1:1.5b for weak hardware.
  • deepseek-r1:8b for most users.
  • deepseek-r1:14b or deepseek-r1:32b for stronger machines.
  • GGUF/llama.cpp workflows for advanced users.
  • Full R1/R1-0528 quantized workflows only for large-memory setups.

Advanced users can track Hugging Face, GGUF conversions, vLLM, SGLang, llama.cpp, Ollama, and LM Studio support for DeepSeek V4. Beginners should not assume that DeepSeek V4 Pro or Flash will behave like an 8B model on a consumer laptop.

Best Settings for Running DeepSeek Locally

DeepSeek reasoning models can be sensitive to sampling settings.

For DeepSeek-R1-0528, the official Hugging Face model card says benchmark sampling used temperature 0.6 and top-p 0.95. It also says system prompts are supported in R1-0528 and users no longer need to force the model into thinking mode by starting the output with <think>\n.

Unsloth’s R1-0528 guide repeats the practical recommendation: temperature 0.6, top_p 0.95, and multiple tests for reliable evaluation.

Use these baseline settings:

| Setting | Suggested value | Why |
|---|---|---|
| temperature | 0.6 | Good default for reasoning-style R1 models |
| top_p | 0.95 | Keeps output varied without becoming too loose |
| context length | Lower if memory is tight | Longer context uses more memory |
| model size | Smaller if slow | 1.5B/8B are faster than 14B/32B |
| GPU acceleration | Enable if available | Improves speed significantly |
| system prompt | Supported in R1-0528 | Older R1 guidance differed |
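
These baseline settings map directly onto the "options" object Ollama's /api/chat endpoint accepts. A sketch of a full request body — the num_ctx value here is an illustrative choice, not an official recommendation:

```python
import json

# Baseline sampling settings for R1-style reasoning models.
R1_OPTIONS = {
    "temperature": 0.6,  # recommended default for R1-style models
    "top_p": 0.95,       # varied output without becoming too loose
    "num_ctx": 8192,     # context window; lower this if memory is tight
}

request_body = {
    "model": "deepseek-r1:8b",
    "messages": [{"role": "user", "content": "Solve 12 * 17 step by step."}],
    "options": R1_OPTIONS,
    "stream": False,
}

print(json.dumps(request_body, indent=2))
```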

For coding and math, ask for structured reasoning and a final answer. For example:

Solve the problem step by step. Then give the final answer in a short summary.

For privacy-sensitive work, use local inference and avoid sending sensitive files to hosted models.

Troubleshooting DeepSeek Local Setup

“ollama: command not found”

Ollama is not installed correctly, or your terminal cannot find it.

Fix:

ollama --version

If that fails, reinstall Ollama and reopen your terminal.

Model downloads are slow

DeepSeek models can be large. Try a smaller model first:

ollama run deepseek-r1:1.5b

On WSL2, Ollama’s FAQ mentions that Windows 10 networking settings can affect installation and model downloads.

Out of memory

Use a smaller model:

ollama run deepseek-r1:1.5b

or:

ollama run deepseek-r1:8b

Also reduce context length if your runtime exposes that option.
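
In Ollama's interactive chat, for example, the context window can be lowered with a session parameter (4096 here is an illustrative value):

```shell
ollama run deepseek-r1:8b
>>> /set parameter num_ctx 4096
```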

DeepSeek runs too slowly

Use a smaller model, enable GPU acceleration where available, close other memory-heavy apps, or switch from CPU inference to a GPU-enabled runtime.

For Docker users with NVIDIA GPUs, Ollama’s Docker documentation shows a GPU-enabled container command using --gpus=all.
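
That documented GPU-enabled container invocation looks like this (it requires the NVIDIA Container Toolkit on the host):

```shell
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```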

Open WebUI cannot connect to Ollama

Check whether Open WebUI is running on the host or inside a container. Try one of these connection URLs:

http://localhost:11434
http://host.docker.internal:11434
http://host.containers.internal:11434

Open WebUI’s docs specifically mention host.containers.internal:11434 for the relevant container connection configuration.

GPU is not being used

Update GPU drivers, install CUDA or the correct backend, and check whether your runtime supports your GPU. On Docker, verify NVIDIA Container Toolkit or ROCm support depending on your GPU.

Windows firewall or network issue

If an app cannot reach Ollama, allow the app through Windows Firewall or keep all requests on localhost.

Model gives poor answers

Try:

  • A larger model.
  • Temperature around 0.6.
  • Clearer prompts.
  • R1-0528 if available.
  • Multiple attempts for complex reasoning.

Confusion between distilled model and full DeepSeek-R1

deepseek-r1:8b is not the full 671B DeepSeek-R1 model. It is a smaller distilled model designed to be practical on consumer hardware. DeepSeek’s GitHub explains that the smaller 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints are distilled from reasoning data generated by DeepSeek-R1 and based on Qwen/Llama models.

Need to update Ollama, LM Studio, or llama.cpp

Local AI tools change quickly. Update your runtime if commands fail, models do not load, or templates behave incorrectly.

Ollama vs LM Studio vs Open WebUI vs llama.cpp

| Tool | Best for | Difficulty | GUI? | API support? | Recommended user |
|---|---|---|---|---|---|
| Ollama | Fastest local DeepSeek setup | Easy | Minimal terminal UI | Yes, local API | Most beginners and developers |
| LM Studio | Desktop model browsing and chat | Easy | Yes | Yes | GUI-first users |
| Open WebUI | Browser-based ChatGPT-like interface | Medium | Yes, web UI | Connects to Ollama/OpenAI-compatible APIs | Users who want a polished local chat UI |
| llama.cpp | GGUF, quantization, manual tuning | Advanced | Basic server UI | Yes, through llama-server | Power users and researchers |

The simplest stack is Ollama only. The best local chat experience is Ollama + Open WebUI. The best desktop GUI is LM Studio. The best advanced GGUF workflow is llama.cpp.

Best Setup for Most Users

Use this decision guide:

| User type | Recommended setup |
|---|---|
| Beginner | Ollama + deepseek-r1:8b |
| Low-end laptop | Ollama + deepseek-r1:1.5b |
| GUI user | LM Studio + a DeepSeek GGUF model |
| Browser UI user | Ollama + Open WebUI |
| Developer | Ollama + local API on localhost:11434 |
| Advanced GGUF user | llama.cpp + quantized DeepSeek GGUF |
| Large-model experimenter | Full R1/R1-0528 quantized GGUF with serious RAM/VRAM |
| Production/high throughput | vLLM or SGLang, only if compatible with the chosen model and hardware |

For most readers, the best answer to “how to run DeepSeek locally” is:

ollama run deepseek-r1:8b

Then add Open WebUI if you want a browser interface, or LM Studio if you want a desktop GUI.

FAQ

Can I run DeepSeek locally for free?

Yes. You can run open DeepSeek models locally for free after downloading them. You still need your own hardware, storage, electricity, and time.

Do I need a DeepSeek API key to run it locally?

No. You do not need a DeepSeek API key to run local Ollama or LM Studio models. Ollama’s local API on localhost:11434 does not require authentication.

What is the easiest way to run DeepSeek locally?

Install Ollama and run:
ollama run deepseek-r1:8b

Can I run DeepSeek locally on Windows?

Yes. Install Ollama on Windows, open PowerShell or CMD, and run a DeepSeek command. Ollama’s Windows download page lists a PowerShell installer and says Windows 10 or later is required.

Can I run DeepSeek locally on a Mac?

Yes. Apple Silicon Macs are good for local LLMs because they use unified memory. Start with deepseek-r1:8b or deepseek-r1:14b, depending on your memory.

Can I run DeepSeek locally without a GPU?

Yes. CPU inference works, especially for smaller models, but it is slower. Start with deepseek-r1:1.5b or deepseek-r1:8b.

Which DeepSeek model should I choose?

Most users should choose deepseek-r1:8b. Use 1.5b for weak hardware, 14b for stronger machines, and 32b or larger only if you understand memory requirements.

Is DeepSeek-R1 8B the same as the full DeepSeek-R1?

No. The 8B model is a smaller distilled model. The full DeepSeek-R1 is a 671B-parameter model with 37B activated parameters.

Can I run DeepSeek V4 locally?

Possibly for advanced users, but it is not the default beginner path. DeepSeek-V4-Pro and V4-Flash are much larger than common R1 distilled models, so most users should start with DeepSeek-R1 8B or R1-0528-Qwen3-8B.

Is local DeepSeek private?

Local inference is more private than hosted inference because prompts can stay on your device. Ollama says it does not see prompts or data when models are run locally.

How much RAM do I need?

Use 8GB RAM for 1.5B, 16GB for 7B/8B, 24–32GB for 8B/14B, and 32–64GB or more for 32B-class models. Very large quantized R1/R1-0528 setups need much more.

Can I use DeepSeek locally through Python?

Yes. Install the Ollama Python package and call ollama.chat() with model="deepseek-r1:8b".

How do I make DeepSeek run faster?

Use a smaller model, enable GPU acceleration, reduce context length, close other apps, and keep the model loaded when making repeated API calls.

Can I run DeepSeek offline?

Yes, after the model is downloaded. You need internet for the first download, updates, and new model pulls.

Conclusion

The best way to run DeepSeek locally in 2026 is to start with Ollama:

ollama run deepseek-r1:8b

That gives you a practical local DeepSeek setup with no DeepSeek API key, a terminal chat interface, and a local API endpoint. Use deepseek-r1:1.5b for weak laptops, deepseek-r1:14b or 32b for stronger machines, LM Studio for a desktop GUI, Open WebUI for a browser-based chat interface, and llama.cpp only when you need advanced GGUF control.

For almost everyone, the winning setup is:

Ollama + deepseek-r1:8b

Add Open WebUI when you want a ChatGPT-like local interface.