How to Run DeepSeek Locally: Complete 2026 Guide

Quick Answer: How to Run DeepSeek Locally

To run DeepSeek locally, install Ollama, then run one of these commands:

ollama run deepseek-r1:8b

For smaller machines:

ollama run deepseek-r1:1.5b

For better reasoning on machines with more RAM or VRAM:

ollama run deepseek-r1:14b

The first run downloads the model. After that, you can run DeepSeek offline as long as you do not need to download another model or update. You do not need a DeepSeek API key to run a local Ollama model, and Ollama’s local API does not require authentication when accessed through localhost:11434.
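
To confirm the local server is reachable without any key, you can hit an informational endpoint (assuming the default port):

```shell
# Lists locally installed models; a JSON reply means the server is up.
curl http://localhost:11434/api/tags
```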

For most readers, the best starting point is:

ollama run deepseek-r1:8b

The 8b option is a practical balance between speed, memory use, and answer quality. Ollama’s DeepSeek-R1 page currently maps the 8B command to DeepSeek-R1-0528-Qwen3-8B, while also listing smaller and larger distilled variants.

What Does “Running DeepSeek Locally” Mean?

Running DeepSeek locally means the model runs on your own computer instead of sending every prompt to DeepSeek’s hosted chat or API service.

There are two very different ways to use DeepSeek:

| Option | Where inference happens | Needs internet during use? | Needs DeepSeek API key? | Best for |
|---|---|---|---|---|
| DeepSeek hosted chat/API | DeepSeek servers | Yes | Usually yes for API | Convenience, strongest hosted models |
| Local DeepSeek model | Your computer | No, after download | No for local Ollama/LM Studio models | Privacy, offline use, experimentation |

Local inference gives you more control. Your prompts can stay on your device, you avoid per-token API costs, and you can build a local DeepSeek API for your own tools. Ollama states that locally run models do not send prompts and answers back to Ollama’s servers, while cloud-hosted models are handled differently.

The tradeoff is hardware. A small DeepSeek R1 distilled model can run on a normal laptop. The full 671B DeepSeek-R1 or R1-0528 model is not a normal laptop setup. It requires advanced quantization, large storage, and serious RAM/VRAM planning.

Which DeepSeek Model Should You Run?

DeepSeek-R1 is the most practical DeepSeek family for local use today. The original DeepSeek-R1 and DeepSeek-R1-Zero are 671B-parameter mixture-of-experts models with 37B activated parameters and a 128K context length. DeepSeek also released six smaller distilled models based on Qwen and Llama, in 1.5B, 7B, 8B, 14B, 32B, and 70B sizes.

DeepSeek-R1-0528 is the important newer R1 update. The official Hugging Face model card describes it as a minor version upgrade with improved reasoning depth and inference capabilities, and it also notes that system prompts are now supported and users no longer need to force <think>\n at the beginning of the response.

| Model / variant | Best for | Approximate hardware class | Pros | Limitations |
|---|---|---|---|---|
| DeepSeek-R1 1.5B | Very low-end laptops, quick tests | 8GB RAM or better | Fastest, smallest, easiest to run | Weakest reasoning and coding quality |
| DeepSeek-R1 7B / 8B | Most beginners | 16GB RAM or Apple Silicon with enough unified memory | Best default balance; 8B is available as R1-0528-Qwen3-8B in Ollama | Still not equal to the full R1 model |
| DeepSeek-R1 14B | Better reasoning on consumer hardware | 24–32GB RAM or decent GPU | Better quality than 7B/8B | Slower; may need quantization |
| DeepSeek-R1 32B | Coding, math, stronger local reasoning | 32–64GB RAM or 12–24GB VRAM depending on quant | Noticeably stronger responses | Heavy for normal laptops |
| DeepSeek-R1 70B | Advanced local experiments | 64GB+ RAM or high-VRAM workstation | Much stronger than small distilled models | Slow and memory-hungry |
| Full DeepSeek-R1 / R1-0528 671B quantized | Researchers, workstation users, enthusiasts | Large RAM/VRAM, advanced GGUF/Ollama/llama.cpp workflows | Closest to full DeepSeek local experience | Not beginner-friendly; huge storage and memory needs |
| DeepSeek V4 Flash / Pro | Freshness, tracking newest open DeepSeek models | Advanced only; not a default consumer setup | Newer architecture, 1M context in official services | V4-Pro and V4-Flash are far larger than common R1 distilled models |

For most people searching “how to run DeepSeek locally,” the right choice is DeepSeek-R1 8B. Use 1.5b only if your computer is weak. Use 14b or 32b if you have enough memory and want better output.

Hardware Requirements for Running DeepSeek Locally

Exact DeepSeek hardware requirements depend on model size, quantization, context length, runtime, GPU acceleration, and whether you use CPU, NVIDIA CUDA, Apple Metal, or another backend.

Use this practical guide:

| Hardware | Recommended DeepSeek model | What to expect |
|---|---|---|
| 8GB RAM | DeepSeek-R1 1.5B | Works for basic tests; CPU inference may be slow |
| 16GB RAM | DeepSeek-R1 7B or 8B | Practical starting point for most users |
| 24–32GB RAM | DeepSeek-R1 8B or 14B | Better speed and stability |
| 32–64GB RAM or 12–24GB VRAM | DeepSeek-R1 14B or 32B | Stronger local reasoning if quantized properly |
| 64GB+ RAM or high VRAM | DeepSeek-R1 32B or 70B | Advanced local use; expect tuning |
| 128GB+ RAM / workstation | Large GGUF quantized models | Useful for experiments, not casual use |
| 180GB+ unified memory or combined RAM+VRAM | Advanced R1-0528 quant workflows | Needed for better performance with huge quantized R1-0528 setups |

Unsloth’s DeepSeek-R1-0528 local guide says the full model needs major preparation: it lists the full model at roughly 715GB, and a 1.66-bit dynamic quant at 162GB. The guide also recommends at least 64GB RAM for that quant, and around 180GB of unified memory or combined RAM+VRAM for better performance.

For beginners, ignore the full 671B model at first. Start with 8B.

Method 1 — Run DeepSeek Locally with Ollama

Ollama is the easiest way to install DeepSeek locally because it handles model download, local execution, and the local API server.

Ollama is available for macOS, Windows, and Linux. The official download page lists the Linux/macOS terminal installer, while the Windows page provides a PowerShell installer and says Windows 10 or later is required.

Step 1: Install Ollama

Linux

curl -fsSL https://ollama.com/install.sh | sh

macOS

Use the official macOS download, or install from the terminal if supported on your setup:

curl -fsSL https://ollama.com/install.sh | sh

Windows

Open PowerShell and run:

irm https://ollama.com/install.ps1 | iex

Then confirm the installation:

ollama --version

Step 2: Run DeepSeek-R1 8B

ollama run deepseek-r1:8b

The first run downloads the model. After the download finishes, Ollama opens an interactive chat in your terminal.

Step 3: Choose the right model size

Use these commands based on your hardware:

ollama run deepseek-r1:1.5b
ollama run deepseek-r1:7b
ollama run deepseek-r1:8b
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
ollama run deepseek-r1:70b

The official Ollama DeepSeek-R1 page lists commands for the 8B R1-0528-Qwen3 model, the full 671B model, and distilled models including 1.5B, 7B, 14B, 32B, and 70B.

Step 4: List installed models

ollama list

Step 5: Stop a model

ollama stop deepseek-r1:8b

Ollama’s FAQ also notes that models are kept in memory for a period after use and can be unloaded with ollama stop.
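
You can check what is loaded and control unloading from the API as well. A sketch using documented commands and the `keep_alive` request parameter (setting it to 0 unloads immediately):

```shell
# See which models are currently loaded in memory:
ollama ps

# Unload a model right away instead of waiting for the idle timeout:
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "keep_alive": 0
}'
```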

Step 6: Update the model

If you installed an older DeepSeek-R1 build, update it:

ollama pull deepseek-r1

Ollama’s DeepSeek-R1 page specifically notes that older versions can be updated with ollama pull deepseek-r1.

Method 2 — Run DeepSeek Locally with LM Studio

LM Studio is the best option if you want to run DeepSeek locally with a graphical desktop app instead of the terminal.

It works well for users who want to:

  • Search and download local models from a desktop interface.
  • Chat with DeepSeek without memorizing commands.
  • Load GGUF or MLX-style local model files.
  • Start a local server for apps and scripts.

LM Studio’s official local server documentation says you can serve local LLMs from the Developer tab on localhost or on your network, and it supports REST APIs, client libraries, OpenAI-compatible endpoints, and Anthropic-compatible endpoints.

Basic LM Studio setup

  1. Install LM Studio for your operating system.
  2. Open the app.
  3. Search for a DeepSeek model, such as a DeepSeek-R1 distilled GGUF model.
  4. Choose a quantized model that fits your RAM or VRAM.
  5. Download the model.
  6. Open the Chat tab and load it.
  7. Start chatting locally.

Optional: enable the local server

In LM Studio:

  1. Open the Developer tab.
  2. Select your downloaded DeepSeek model.
  3. Toggle Start server.
  4. Use the provided local endpoint from your own apps.
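
The endpoint from step 4 speaks the OpenAI-style chat completions format. A sketch of a request body, assuming LM Studio's default port (1234) and a placeholder model id — check the Developer tab for your actual port and model name:

```python
import json

# BASE_URL and the model id are assumptions; adjust to what LM Studio shows.
BASE_URL = "http://localhost:1234/v1"

payload = {
    "model": "deepseek-r1-distill-qwen-8b",  # placeholder id from LM Studio's model list
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "temperature": 0.6,
}

body = json.dumps(payload)
print(body)

# To actually send it once the server is running:
#   import urllib.request
#   req = urllib.request.Request(
#       f"{BASE_URL}/chat/completions",
#       body.encode(),
#       {"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())
```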

LM Studio is ideal for non-terminal users. Ollama is still the simpler recommendation for this guide, but LM Studio is often the better user experience for people who prefer a desktop GUI.

Method 3 — Add a ChatGPT-like UI with Open WebUI

Ollama’s terminal interface is useful, but many people want a browser-based interface that feels closer to ChatGPT. That is where Open WebUI helps.

Open WebUI describes itself as a self-hosted AI platform designed to operate offline, with support for Ollama and OpenAI-compatible APIs.

Basic flow

  1. Install and run Ollama.
  2. Run a DeepSeek model:
ollama run deepseek-r1:8b
  3. Install Open WebUI.
  4. Connect Open WebUI to your Ollama server.
  5. Chat with DeepSeek from the browser.

Docker example

A common Docker setup looks like this:

docker run -d \
-p 3000:8080 \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main

Then open the Open WebUI interface in your browser and configure the Ollama connection.

Common connection issue

If Open WebUI runs in a container, localhost inside the container is not always the same as localhost on your host machine. Open WebUI’s quick start documentation says that, in the relevant container setup, the Ollama API connection can be set to:

http://host.containers.internal:11434

Depending on Docker, Podman, Windows, macOS, or Linux networking, you may need host.docker.internal, host.containers.internal, or a host IP address.
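
On Linux with plain Docker, one common fix is to map host.docker.internal to the host gateway when starting the container. A sketch built on the earlier docker run example:

```shell
# --add-host makes host.docker.internal resolve to the host machine,
# so the container can reach Ollama on the host's port 11434.
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```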

Method 4 — Run DeepSeek with llama.cpp and GGUF

Use llama.cpp if you want more manual control over GGUF models, quantization, CPU/GPU tuning, and advanced large-model experiments.

This method is best for:

  • Advanced users.
  • GGUF model files from Hugging Face.
  • Manual quantized model downloads.
  • Large DeepSeek-R1 or R1-0528 experiments.
  • Local OpenAI-compatible endpoints through llama-server.

The official llama.cpp README states that llama.cpp requires models to be stored in GGUF format, and that models in other formats can be converted using conversion scripts.

High-level llama.cpp workflow

  1. Install build tools.
  2. Clone llama.cpp.
  3. Build llama-cli or llama-server.
  4. Download a compatible DeepSeek GGUF model.
  5. Run the model locally.
  6. Optionally expose it through an OpenAI-compatible local endpoint.

Example build outline:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

Example local run:

./build/bin/llama-cli \
-m /path/to/deepseek-model.gguf \
-p "Explain local LLM inference in simple terms."

Example server run:

./build/bin/llama-server \
-m /path/to/deepseek-model.gguf \
--host 127.0.0.1 \
--port 8080

The llama.cpp HTTP server supports GPU and CPU inference for F16 and quantized models, plus OpenAI-compatible chat completions, responses, and embeddings routes.
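
With llama-server running as above, the OpenAI-compatible chat route can be exercised with curl; the model is implied by the GGUF file the server loaded:

```shell
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Summarize what GGUF is in one sentence."}
    ]
  }'
```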

For most beginners, do not start here. Use Ollama first, then come back to llama.cpp when you need GGUF-level control.

How to Use DeepSeek Locally from an API

Ollama automatically exposes a local API after installation. Its official API documentation says the default base URL is:

http://localhost:11434/api

Start Ollama server

In many desktop installations, Ollama already runs in the background. If needed, start it manually:

ollama serve

Then make sure your model is available:

ollama run deepseek-r1:8b

Call DeepSeek locally with curl

curl http://localhost:11434/api/chat -d '{
"model": "deepseek-r1:8b",
"messages": [
{ "role": "user", "content": "Explain how local LLM inference works." }
],
"stream": false
}'

Ollama’s chat API endpoint is /api/chat, and the documentation shows the same message-based request structure.

Use DeepSeek locally from Python

Install the Ollama Python package:

pip install ollama

Then run:

import ollama

response = ollama.chat(
model="deepseek-r1:8b",
messages=[
{"role": "user", "content": "Write a short Python function to reverse a string."}
],
)

print(response["message"]["content"])

This gives you a local DeepSeek API workflow without using DeepSeek’s hosted API.

Can You Run DeepSeek V4 Locally?

DeepSeek V4 matters for freshness, but it is not the default recommendation for most local users.

DeepSeek announced DeepSeek-V4 Preview on April 24, 2026. The official release says DeepSeek-V4-Pro has 1.6T total parameters with 49B active parameters, while DeepSeek-V4-Flash has 284B total parameters with 13B active parameters. It also says both official V4 services support 1M context and Thinking / Non-Thinking modes.

That is exciting, but it does not mean V4 is easy to run on a normal laptop.

For most people who want to install DeepSeek locally, these remain the practical choices:

  • deepseek-r1:1.5b for weak hardware.
  • deepseek-r1:8b for most users.
  • deepseek-r1:14b or deepseek-r1:32b for stronger machines.
  • GGUF/llama.cpp workflows for advanced users.
  • Full R1/R1-0528 quantized workflows only for large-memory setups.

Advanced users can track Hugging Face, GGUF conversions, vLLM, SGLang, llama.cpp, Ollama, and LM Studio support for DeepSeek V4. Beginners should not assume that DeepSeek V4 Pro or Flash will behave like an 8B model on a consumer laptop.

Best Settings for Running DeepSeek Locally

DeepSeek reasoning models can be sensitive to sampling settings.

For DeepSeek-R1-0528, the official Hugging Face model card says benchmark sampling used temperature 0.6 and top-p 0.95. It also says system prompts are supported in R1-0528 and users no longer need to force the model into thinking mode by starting the output with <think>\n.

Unsloth’s R1-0528 guide repeats the practical recommendation: temperature 0.6, top_p 0.95, and multiple tests for reliable evaluation.

Use these baseline settings:

| Setting | Suggested value | Why |
|---|---|---|
| temperature | 0.6 | Good default for reasoning-style R1 models |
| top_p | 0.95 | Keeps output varied without becoming too loose |
| context length | Lower if memory is tight | Longer context uses more memory |
| model size | Smaller if slow | 1.5B/8B are faster than 14B/32B |
| GPU acceleration | Enable if available | Improves speed significantly |
| system prompt | Supported in R1-0528 | Older R1 guidance differed |
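
These baseline settings map directly onto the "options" object Ollama's /api/chat endpoint accepts. A sketch of a full request body — the num_ctx value here is an illustrative choice, not an official recommendation:

```python
import json

# Baseline sampling settings for R1-style reasoning models.
R1_OPTIONS = {
    "temperature": 0.6,  # recommended default for R1-style models
    "top_p": 0.95,       # varied output without becoming too loose
    "num_ctx": 8192,     # context window; lower this if memory is tight
}

request_body = {
    "model": "deepseek-r1:8b",
    "messages": [{"role": "user", "content": "Solve 12 * 17 step by step."}],
    "options": R1_OPTIONS,
    "stream": False,
}

print(json.dumps(request_body, indent=2))
```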

For coding and math, ask for structured reasoning and a final answer. For example:

Solve the problem step by step. Then give the final answer in a short summary.

For privacy-sensitive work, use local inference and avoid sending sensitive files to hosted models.

Troubleshooting DeepSeek Local Setup

“ollama: command not found”

Ollama is not installed correctly, or your terminal cannot find it.

Fix:

ollama --version

If that fails, reinstall Ollama and reopen your terminal.

Model downloads are slow

DeepSeek models can be large. Try a smaller model first:

ollama run deepseek-r1:1.5b

On WSL2, Ollama’s FAQ mentions that Windows 10 networking settings can affect installation and model downloads.

Out of memory

Use a smaller model:

ollama run deepseek-r1:1.5b

or:

ollama run deepseek-r1:8b

Also reduce context length if your runtime exposes that option.
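
In Ollama's interactive chat, for example, the context window can be lowered with a session parameter (4096 here is an illustrative value):

```shell
ollama run deepseek-r1:8b
>>> /set parameter num_ctx 4096
```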

DeepSeek runs too slowly

Use a smaller model, enable GPU acceleration where available, close other memory-heavy apps, or switch from CPU inference to a GPU-enabled runtime.

For Docker users with NVIDIA GPUs, Ollama’s Docker documentation shows a GPU-enabled container command using --gpus=all.
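
That documented GPU-enabled container invocation looks like this (it requires the NVIDIA Container Toolkit on the host):

```shell
docker run -d --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```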

Open WebUI cannot connect to Ollama

Check whether Open WebUI is running on the host or inside a container. Try one of these connection URLs:

http://localhost:11434
http://host.docker.internal:11434
http://host.containers.internal:11434

Open WebUI’s docs specifically mention host.containers.internal:11434 for the relevant container connection configuration.

GPU is not being used

Update GPU drivers, install CUDA or the correct backend, and check whether your runtime supports your GPU. On Docker, verify NVIDIA Container Toolkit or ROCm support depending on your GPU.

Windows firewall or network issue

If an app cannot reach Ollama, allow the app through Windows Firewall or keep all requests on localhost.

Model gives poor answers

Try:

  • A larger model.
  • Temperature around 0.6.
  • Clearer prompts.
  • R1-0528 if available.
  • Multiple attempts for complex reasoning.

Confusion between distilled model and full DeepSeek-R1

deepseek-r1:8b is not the full 671B DeepSeek-R1 model. It is a smaller distilled model designed to be practical on consumer hardware. DeepSeek’s GitHub explains that the smaller 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints are distilled from reasoning data generated by DeepSeek-R1 and based on Qwen/Llama models.

Need to update Ollama, LM Studio, or llama.cpp

Local AI tools change quickly. Update your runtime if commands fail, models do not load, or templates behave incorrectly.

Ollama vs LM Studio vs Open WebUI vs llama.cpp

| Tool | Best for | Difficulty | GUI? | API support? | Recommended user |
|---|---|---|---|---|---|
| Ollama | Fastest local DeepSeek setup | Easy | Minimal terminal UI | Yes, local API | Most beginners and developers |
| LM Studio | Desktop model browsing and chat | Easy | Yes | Yes | GUI-first users |
| Open WebUI | Browser-based ChatGPT-like interface | Medium | Yes, web UI | Connects to Ollama/OpenAI-compatible APIs | Users who want a polished local chat UI |
| llama.cpp | GGUF, quantization, manual tuning | Advanced | Basic server UI | Yes, through llama-server | Power users and researchers |

The simplest stack is Ollama only. The best local chat experience is Ollama + Open WebUI. The best desktop GUI is LM Studio. The best advanced GGUF workflow is llama.cpp.

Best Setup for Most Users

Use this decision guide:

| User type | Recommended setup |
|---|---|
| Beginner | Ollama + deepseek-r1:8b |
| Low-end laptop | Ollama + deepseek-r1:1.5b |
| GUI user | LM Studio + a DeepSeek GGUF model |
| Browser UI user | Ollama + Open WebUI |
| Developer | Ollama + local API on localhost:11434 |
| Advanced GGUF user | llama.cpp + quantized DeepSeek GGUF |
| Large-model experimenter | Full R1/R1-0528 quantized GGUF with serious RAM/VRAM |
| Production/high throughput | vLLM or SGLang, only if compatible with the chosen model and hardware |

For most readers, the best answer to “how to run DeepSeek locally” is:

ollama run deepseek-r1:8b

Then add Open WebUI if you want a browser interface, or LM Studio if you want a desktop GUI.

FAQ

Can I run DeepSeek locally for free?

Yes. You can run open DeepSeek models locally for free after downloading them. You still need your own hardware, storage, electricity, and time.

Do I need a DeepSeek API key to run it locally?

No. You do not need a DeepSeek API key to run local Ollama or LM Studio models. Ollama’s local API on localhost:11434 does not require authentication.

What is the easiest way to run DeepSeek locally?

Install Ollama and run:
ollama run deepseek-r1:8b

Can I run DeepSeek locally on Windows?

Yes. Install Ollama on Windows, open PowerShell or CMD, and run a DeepSeek command. Ollama’s Windows download page lists a PowerShell installer and says Windows 10 or later is required.

Can I run DeepSeek locally on a Mac?

Yes. Apple Silicon Macs are good for local LLMs because they use unified memory. Start with deepseek-r1:8b or deepseek-r1:14b, depending on your memory.

Can I run DeepSeek locally without a GPU?

Yes. CPU inference works, especially for smaller models, but it is slower. Start with deepseek-r1:1.5b or deepseek-r1:8b.

Which DeepSeek model should I choose?

Most users should choose deepseek-r1:8b. Use 1.5b for weak hardware, 14b for stronger machines, and 32b or larger only if you understand memory requirements.

Is DeepSeek-R1 8B the same as the full DeepSeek-R1?

No. The 8B model is a smaller distilled model. The full DeepSeek-R1 is a 671B-parameter model with 37B activated parameters.

Can I run DeepSeek V4 locally?

Possibly for advanced users, but it is not the default beginner path. DeepSeek-V4-Pro and V4-Flash are much larger than common R1 distilled models, so most users should start with DeepSeek-R1 8B or R1-0528-Qwen3-8B.

Is local DeepSeek private?

Local inference is more private than hosted inference because prompts can stay on your device. Ollama says it does not see prompts or data when models are run locally.

How much RAM do I need?

Use 8GB RAM for 1.5B, 16GB for 7B/8B, 24–32GB for 8B/14B, and 32–64GB or more for 32B-class models. Very large quantized R1/R1-0528 setups need much more.

Can I use DeepSeek locally through Python?

Yes. Install the Ollama Python package and call ollama.chat() with model="deepseek-r1:8b".

How do I make DeepSeek run faster?

Use a smaller model, enable GPU acceleration, reduce context length, close other apps, and keep the model loaded when making repeated API calls.

Can I run DeepSeek offline?

Yes, after the model is downloaded. You need internet for the first download, updates, and new model pulls.

Conclusion

The best way to run DeepSeek locally in 2026 is to start with Ollama:

ollama run deepseek-r1:8b

That gives you a practical local DeepSeek setup with no DeepSeek API key, a terminal chat interface, and a local API endpoint. Use deepseek-r1:1.5b for weak laptops, deepseek-r1:14b or 32b for stronger machines, LM Studio for a desktop GUI, Open WebUI for a browser-based chat interface, and llama.cpp only when you need advanced GGUF control.

For almost everyone, the winning setup is:

Ollama + deepseek-r1:8b

Add Open WebUI when you want a ChatGPT-like local interface.