DeepSeek Quantization Guide: GGUF vs AWQ vs GPTQ for Local Deployment

DeepSeek models are open-weight large language models, meaning their model weights are publicly available for developers to download and run locally. The DeepSeek team has open-sourced models in various sizes for the research community. This openness enables anyone to experiment with DeepSeek on personal hardware, but it also introduces a challenge: these models are massive. For instance, DeepSeek-V3 is a Mixture-of-Experts (MoE) model with 671B total parameters and 37B activated per token, per the DeepSeek-V3 technical report and the official model card – typically impractical to run in full precision on most single-GPU consumer hardware. Even smaller open-weight variants can still have high memory requirements. This is where quantization becomes critical. Quantization refers to compressing a model’s weights to lower precision (such as 4-bit integers) to drastically reduce memory use and speed up inference, with minimal impact on model quality.

In local deployment scenarios, quantization is often the only practical way to run advanced models like DeepSeek. It allows a balance between model size and resource limits: developers can fit models into GPU VRAM or even system RAM that would otherwise be impossible to load. In fact, the quest for optimizing LLM performance under constrained compute has driven the development of multiple quantization approaches. This guide will focus on three popular quantization formats – GGUF, AWQ, and GPTQ – and how they apply to DeepSeek models specifically. We’ll compare these formats and help you decide which to use for your DeepSeek local deployment needs, keeping DeepSeek’s model family and your hardware in mind.

What to expect: This guide compares GGUF, AWQ, and GPTQ for local DeepSeek deployments, with practical selection tips based on your hardware and runtime.


Why Quantization Matters for DeepSeek

Running large DeepSeek models locally would be infeasible without quantization due to several factors:

  • Memory Constraints: DeepSeek models have billions of parameters, so full-precision weights carry very large memory footprints. Exact needs depend on dtype, quantization method, KV cache, and runtime overhead, but as a rule of thumb FP16 weights cost ~2 bytes per parameter while 4-bit weights cost ~0.5 bytes per parameter, plus KV-cache and runtime buffers that can be significant at long context lengths. Quantization is what makes many local deployments practical on constrained hardware.
  • CPU vs GPU Inference: If you don’t have a high-end GPU, quantization is even more crucial. On CPU-only systems, running DeepSeek demands using 4-bit or 5-bit weights (via formats like GGUF) to fit into system RAM and get any reasonable speed. With a GPU, quantization allows larger model variants (or multiple models) to fit into limited VRAM. In short, quantization extends DeepSeek’s accessibility from multi-GPU servers to single GPUs or even CPUs.
  • Speed and Efficiency: Lower precision means fewer data bits to process per weight, often translating to faster inference. Developers trade a small amount of model accuracy for significant speedups and reduced memory bandwidth usage. Modern quantization methods are designed to keep the accuracy loss minimal. For DeepSeek, this trade-off is typically worthwhile – you gain practical usability (e.g., acceptable response latency) while largely preserving the model’s performance on tasks.
  • Developer Trade-offs: Every quantization method involves a balance between fidelity and efficiency. A 4-bit quantized DeepSeek model might generate outputs almost as good as the 16-bit original, but perhaps with subtle differences in edge cases. Some formats emphasize ease of use or broader compatibility, while others aim for maximum compression or throughput. As a developer, you must consider your hardware limitations, desired model accuracy, and serving needs. For instance, if maximum reasoning accuracy is critical, you might choose a slightly less aggressive quantization or a method known for higher fidelity. On the other hand, for prototyping on a laptop, you’d accept more compression to simply get the model running. Research and community practice suggest that 3–4-bit weight-only quantization can preserve useful quality for many large models, though results vary by model and workload.
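The rule of thumb above can be turned into a back-of-the-envelope calculator. This is a rough sketch that counts weight memory only (KV cache, activations, and runtime buffers come on top); the 4.5 bits/weight figure reflects that real 4-bit quants also store scales and zero-points alongside the weights:

```python
def estimate_weight_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate in GiB.

    Ignores KV cache, activations, and runtime buffers, which add on top.
    """
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# A hypothetical 33B-parameter dense model:
print(f"FP16 : {estimate_weight_gib(33, 16):.1f} GiB")   # weights alone
print(f"4-bit: {estimate_weight_gib(33, 4.5):.1f} GiB")  # ~4.5 bits/weight incl. scales
```

Plugging in a 7B model instead gives roughly 13 GiB at FP16 versus under 4 GiB at 4-bit, which is why a quantized 7B fits comfortably in 8 GB of VRAM while the FP16 version does not.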

In summary, quantization is what bridges DeepSeek’s ambitious scale with local deployment reality. Next, let’s explore the specific formats and how each caters to running DeepSeek efficiently.

GGUF for DeepSeek

What is GGUF? GGUF is a binary model file format designed for fast loading and inference with GGML-based executors (including llama.cpp). It was developed by the llama.cpp author, and is commonly used for local inference packaging. Essentially, GGUF is a file format (and associated quantization scheme) optimized for running LLMs with the llama.cpp ecosystem. It packages the model weights in a way that’s friendly for CPU inference and hybrid CPU+GPU offloading. Unlike GPTQ or AWQ (which are methods of quantizing weights), GGUF is more of a container format combined with specific quantization algorithms used by llama.cpp (like Q4_K, Q5_0, etc.). The DeepSeek open models (particularly those architecturally similar to LLaMA) can be converted to GGUF, allowing them to run in llama.cpp and related libraries. Community conversions commonly provide multiple quant levels (e.g., 4-bit, 5-bit, 8-bit), and the naming depends on the tool/runtime.

When to use GGUF: Choose GGUF for DeepSeek when you need CPU-focused deployment or maximum portability. This format shines if you want to run DeepSeek on a machine with no powerful GPU (or on an Apple Silicon Mac). llama.cpp with GGUF enables running LLMs on just about any platform (Windows, Linux, Mac, even Raspberry Pi, etc.), since it’s a lightweight C++ inference engine. GGUF was specifically designed to let users run models on CPUs. So if your environment is a CPU server or a personal computer without an expensive GPU, GGUF is likely your go-to. For example, if you want to experiment with DeepSeek-7B on a laptop CPU, you would quantize it to a GGUF 4-bit model and load it in llama.cpp.

llama.cpp Compatibility: DeepSeek models packaged in GGUF can be run with llama.cpp and its Python bindings. In practice, you can integrate a DeepSeek GGUF model using tools such as llama-cpp-python (and other GGUF-compatible wrappers), or run it directly via the llama.cpp CLI. llama.cpp supports GGUF loading with fast startup and optional GPU layer offloading (when supported by your platform). For MoE-based DeepSeek variants, support details and CLI flags may change across releases—so use a recent llama.cpp build and verify the current MoE/offloading options in the official documentation. A step-by-step example is available in our Run DeepSeek with GGUF guide, including how to load a GGUF model and tune key runtime settings for your hardware.
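As a minimal sketch of the Python-bindings route (the model path and sampling settings are illustrative; a real run needs `llama-cpp-python` installed and a DeepSeek GGUF file downloaded locally):

```python
from llama_cpp import Llama

# Path is illustrative: point this at a DeepSeek GGUF you have downloaded.
llm = Llama(
    model_path="./deepseek-llm-7b-chat.Q4_K_M.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=20,   # offload some layers to a GPU if available (0 = CPU only)
)

out = llm("Explain quantization in one sentence.", max_tokens=128, temperature=0.7)
print(out["choices"][0]["text"])
```

Raising `n_gpu_layers` shifts more of the model onto the GPU; start small and increase until VRAM is nearly full.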

Pros of GGUF for DeepSeek:

  • Broad Hardware Support: GGUF allows DeepSeek to run on CPUs and even low-power devices. It’s not limited to NVIDIA GPUs. If you need DeepSeek on an edge device or in an environment where only a CPU is available, GGUF is ideal.
  • Simplicity: The format is self-contained. Often you just download a .gguf file (or a few shard files) and point llama.cpp to it – no complex installation of GPU libraries needed. GGUF is widely used for local CPU-friendly inference packaging in the llama.cpp ecosystem.
  • Flexible Offloading: While CPU-centric, GGUF/llama.cpp can still utilize GPUs to accelerate certain layers. You can configure how many layers to offload to a GPU (if available) to get a speed boost. This flexibility is useful if you have a modest GPU that can’t fit the whole model but can handle part of it.
  • Quantization Variety: GGUF files are commonly published at multiple quantization levels (often ranging from very low-bit to 8-bit), depending on the converter and runtime support. For DeepSeek, you might choose a higher-bit GGUF (e.g. 5-bit or 6-bit) if you want a balance of accuracy and CPU speed, or go down to 3-bit for maximum compression if you’re very memory-limited. This range of options lets you tune memory usage precisely.

Limitations of GGUF:

  • Inference Speed: CPU inference is slower than GPU. Even with quantization, a large DeepSeek model on a CPU will be much slower than on a modern GPU. GGUF is best for smaller models or scenarios where slow responses are acceptable.
  • Memory Overhead: Quantization in llama.cpp (GGUF) might not be as memory-efficient as specialized GPU quantizations. For example, a 4-bit GGUF model still requires additional memory for lookup tables, and running it may need a large CPU RAM allocation. Ensure your system RAM is sufficient for the quantized model size plus overhead.
  • Feature Support: GGUF/llama.cpp primarily supports models based on the LLaMA architecture. DeepSeek’s base models are compatible, but some very novel architectural features might not be fully supported. For instance, the Mixture-of-Experts structure in DeepSeek-V3 and R1 required updates to llama.cpp (community contributors added MoE support). Until such support was added, converting those models to GGUF was non-trivial. Thankfully, the community has largely addressed this, but be aware that the newest DeepSeek variants may need the latest llama.cpp version.
  • Integration: If your application is heavily built on the Hugging Face Transformers ecosystem or needs advanced serving features, using GGUF might be limiting. llama.cpp does not natively provide the same kind of HTTP serving or batched inference out-of-the-box (though some forks and wrappers exist). Essentially, GGUF is fantastic for local, single-user use or simple setups, but for a scalable server with many concurrent requests, you might prefer other solutions (we’ll discuss those in AWQ/GPTQ sections).
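For the command-line route, a typical invocation looks like the following. The model filename is illustrative, and flags occasionally change between llama.cpp releases, so check `--help` on your build:

```shell
# Run a DeepSeek GGUF with llama.cpp's CLI, offloading 20 layers to the GPU.
# -m: model file, -p: prompt, -n: tokens to generate, -ngl: GPU layers (0 = CPU only)
./llama-cli \
  -m ./deepseek-llm-7b-chat.Q4_K_M.gguf \
  -p "Explain quantization in one sentence." \
  -n 128 -ngl 20
```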

AWQ for DeepSeek

What is AWQ? AWQ stands for Activation-aware Weight Quantization. It’s a post-training quantization method that preserves model accuracy by paying attention to which weights are most “salient” during actual model activations. In simpler terms, AWQ isn’t just blindly minimizing weight error; it uses activation statistics to decide which weight channels to protect when quantizing, and accuracy differences vs GPTQ depend on the model, calibration data, and runtime. AWQ is currently geared towards 4-bit quantization (at least in its initial implementations) and has been demonstrated to work especially well for instruction-tuned and multi-modal LLMs.

When to use AWQ: Choose AWQ for DeepSeek when GPU inference efficiency and output quality are top priorities, especially in a server or production environment. AWQ shines in scenarios where you have one or more GPUs and want to serve responses quickly without sacrificing much accuracy. For example, if you are deploying a DeepSeek-based chatbot for multiple users and need high throughput, AWQ is a great candidate. It tends to produce fast inference with support for modern GPU acceleration in frameworks like PyTorch and TensorRT. In particular, vLLM (an advanced high-throughput inference engine) and Hugging Face Text Generation Inference (TGI) both support AWQ format models. This means you can easily integrate a DeepSeek-AWQ model into those systems to handle production workloads. AWQ is also well-suited if you want to maximize the quality at 4-bit quantization – for instance, if DeepSeek is being used for something accuracy-critical (like code generation or reasoning for decision-support), AWQ’s extra attention to preserving important weights can be beneficial.

GPU Efficiency and Serving: One of AWQ’s key advantages is that it enables highly optimized GPU inference in supported runtimes. Unlike some quantization methods that might rely on custom CUDA kernels or have compatibility issues, AWQ has been integrated into mainstream frameworks. Hugging Face Transformers documents AWQ integration (commonly via AutoAWQ/llm-awq tooling). Check the Transformers docs for the currently supported flow and versions. For DeepSeek, this means you could quantize a model to AWQ and then serve it using standard tools. For example, you could launch a vLLM server with a DeepSeek AWQ model by simply specifying --quantization awq. This reduces weight memory footprint compared to FP16, and may improve throughput depending on the runtime, kernels, batch size, and context length. AWQ models for DeepSeek are readily available thanks to community quantizers (e.g. TheBloke provides DeepSeek 7B, 33B, 67B in AWQ format on Hugging Face). Community maintainers often report strong throughput and quality retention with AWQ 4-bit models, but results depend on model architecture, calibration, and runtime configuration.
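As a sketch, launching an OpenAI-compatible endpoint from an AWQ checkpoint can be as simple as the following. The model ID is illustrative (substitute a real AWQ repo from the Hub), and flag names can shift between vLLM versions, so verify against the docs for the version you install:

```shell
pip install vllm

# Serve an AWQ-quantized DeepSeek model; clients then talk to the
# OpenAI-compatible API at http://localhost:8000/v1.
vllm serve TheBloke/deepseek-llm-7B-chat-AWQ \
  --quantization awq \
  --max-model-len 4096
```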

Accuracy Preservation: AWQ is designed to preserve model quality under low-bit weight-only quantization, especially for instruction-tuned LMs. This is relevant because DeepSeek’s chat models and the R1 reasoning model have been fine-tuned (via RLHF or other techniques) to follow instructions and reason step-by-step. Those kinds of models can be more sensitive to quantization errors in certain layers (for example, small mistakes might derail a chain-of-thought). AWQ’s strategy of protecting important weights means it often retains the model’s core capabilities better. While concrete benchmark numbers for DeepSeek-AWQ vs DeepSeek-GPTQ aren’t publicly available (and we avoid speculative claims), the general understanding is that AWQ’s 4-bit outputs stay very close to the full-precision baseline. It “observes activations” during quantization to minimize noticeable degradation. If you compare a DeepSeek AWQ model’s answers against the FP16 model’s, they will usually be close on practical tasks – staying close to the original is exactly AWQ’s design goal – though you should still validate accuracy on your own workload.
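If you want to produce your own AWQ checkpoint rather than download one, the AutoAWQ library exposes a short quantization flow. Below is a hedged sketch: the base-model ID and output directory are illustrative, and the `quant_config` keys follow AutoAWQ's commonly documented 4-bit recipe, so verify them against the version you install:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-chat"   # illustrative base model
out_dir = "deepseek-llm-7b-chat-awq"

model = AutoAWQForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# 4-bit weights with group size 128 is the common AWQ recipe.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(out_dir)
tokenizer.save_pretrained(out_dir)
```

Quantization runs a calibration pass over sample text, so expect it to take a while and to need a GPU with enough memory for the FP16 model.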

Pros of AWQ for DeepSeek:

  • High Throughput on GPU: AWQ models can leverage highly optimized GPU execution paths. In multi-user or streaming contexts, AWQ will make the most of your GPU hardware. For example, continuous batching servers like vLLM can serve many queries in parallel from an AWQ-quantized DeepSeek model, achieving high token output rates.
  • Minimal Quality Loss: As noted, AWQ tends to preserve model accuracy very well. This is a big plus if you are concerned about quantization impacting DeepSeek’s reasoning or fluency. It’s a “safe” choice when you cannot afford significant drops in performance.
  • Integration with Modern Stack: AWQ is supported in the Hugging Face ecosystem (Transformers and TGI) and by other optimization stacks (e.g., NVIDIA’s TensorRT-LLM and LMDeploy). For a developer, this means adopting AWQ doesn’t require exotic custom code – you use familiar tools and just load an AWQ model.
  • Balanced Memory and Speed: At 4-bit, AWQ reduces weight memory to roughly one quarter of FP16, though total VRAM usage also includes KV cache and runtime buffers, similar to GPTQ. This enables, say, a 30B+ DeepSeek model to fit on a single 24 GB GPU. Meanwhile, it often yields faster token generation than GPTQ in Transformers-based pipelines, partly because its inference kernels are well optimized and avoid some runtime overhead.
  • Turn-key Edge Deployment: Interestingly, AWQ’s accuracy and efficiency make it a candidate for deploying on resource-constrained devices (e.g., certain edge GPUs or accelerators). If one were to run a smaller DeepSeek (say 7B) in an embedded context, AWQ could provide a “turn-key solution” for that scenario with low memory footprint and decent speed.

Cons / Limitations of AWQ:

  • Limited Quantization Bits: As of now, AWQ is primarily a 4-bit method. If you wanted to quantize DeepSeek to 3-bit or 8-bit, AWQ (in its current form) isn’t the typical choice. GPTQ covers a wider range of bit widths. In practice, 4-bit is usually the sweet spot, but it’s worth noting.
  • Tooling Maturity: AWQ is newer than GPTQ, so some community tools and UIs added support only recently. For example, older versions of text-generation-webui didn’t support AWQ, but now you can use it via the AutoAWQ loader. Ensure you have up-to-date software to avoid compatibility issues. If a tool you use doesn’t support AWQ yet, you might have to fall back to GPTQ or another format.
  • Focus on LLaMA/Mistral architectures: Initially, AWQ integration (e.g., in vLLM 0.2) supported LLaMA and Mistral models. DeepSeek’s architecture is LLaMA-like, so this is fine. But if DeepSeek introduces new architectural elements in future versions (like MoE routing logic), those might need special handling – the quantization layout for MoE is one such aspect. In one community report, loading an AWQ quantized DeepSeek model required certain patches for MoE support in vLLM. This is a niche concern unless you’re working with the very largest DeepSeek (R1 or V3) models.
  • Less Customization: Unlike GPTQ, where you can choose parameters like group size, act-order, etc., AWQ is a bit more of a fixed recipe (protect weights via activation outlier suppression, quantize to 4-bit). There’s less to tweak for advanced users. This is not a big disadvantage for most, but if you like to fine-tune quantization hyperparameters, GPTQ might offer more avenues to experiment.

GPTQ for DeepSeek

What is GPTQ? GPTQ is a one-shot post-training quantization method for LLMs, introduced in the late-2022 paper “GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.” GPTQ works by iteratively quantizing weights in each layer while compensating for the introduced error, using an approach inspired by second-order optimization (it uses approximations of the Hessian to guide which weights to quantize first). In practical terms, GPTQ can compress a large model to 4-bit (or even lower) with very little accuracy loss, all without needing to retrain the model. It has become a popular baseline in the LLM community because of its strong balance of fidelity and efficiency. For example, GPTQ has been shown to quantize models as large as 175B parameters down to 3–4 bit in a few hours, enabling those models to run on a single GPU with negligible increase in perplexity. This success has made GPTQ a go-to for many open models, and indeed many DeepSeek models have GPTQ versions available (usually 4-bit with various parameter choices).

Why GPTQ for DeepSeek: If you have an NVIDIA GPU (or multiple) and want a proven, widely-supported quantization format, GPTQ is an excellent choice. GPTQ was initially focused on GPU inference and has since been integrated into a variety of tools and libraries. For local deployment of DeepSeek, GPTQ is often the most straightforward path when using popular interfaces like Hugging Face Transformers or Oobabooga’s text-generation-webui. For instance, you can download a GPTQ-quantized DeepSeek model (TheBloke and others provide 4-bit GPTQ safetensors) and load it in text-generation-webui with the GPTQ loader or in Transformers via GPTQModel.

Many hobbyist users report success using DeepSeek GPTQ models with the exllama backend, which is an optimized CUDA kernel for 4-bit LLaMA-based models. This means that if you have a single GPU and you want maximum single-threaded performance, GPTQ (with exllama or exllama v2) will likely give you the highest token/s for DeepSeek. Additionally, GPTQ models are supported in multi-GPU serving stacks as well: both vLLM and TGI can load GPTQ quantized models. In short, GPTQ is the most versatile quantization format in terms of tool ecosystem – it “just works” in many environments.

Hugging Face Ecosystem Integration: One big advantage of GPTQ is its tight integration with the Hugging Face and PyTorch ecosystem. The community developed the AutoGPTQ library (now succeeded by GPTQModel), and Transformers itself can directly load GPTQ models. This means you can use the same code that you would for a normal model, just pointing to a GPTQ-weight file. The GPTQ format (usually a .safetensors with a specific naming convention) has become a de facto standard for many quantized model repositories.

For DeepSeek, you might find multiple GPTQ variations: e.g., 4-bit with group size 128, with or without “act-order” (activation order true/false). These variations allow fine-tuning the trade-off between memory and accuracy. Group size refers to how many weights share one quantization scale; a smaller group (or per-column scales) can improve accuracy at the cost of slightly more memory. Act-order (desc_act, where the algorithm quantizes weight columns in order of decreasing activation importance rather than left to right) can improve accuracy but historically had compatibility quirks (mostly resolved now). As a developer, you usually don’t need to run GPTQ quantization yourself for DeepSeek – you can pick from existing quantized files. But it’s useful to know that if you see filenames like DeepSeek-7B-GPTQ-4bit-128g.actorder.safetensors, those suffixes indicate the parameters used. In any case, Transformers and libraries will handle the details; you just need to ensure you use a compatible loader setting (e.g., bits = 4, group_size = 128, desc_act = True to match the file). The Hugging Face Text Generation Inference server also accepts a --model-id pointing to a GPTQ model repo, making deployment on a server straightforward if GPTQ is your format of choice.
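To make the Transformers path concrete, here is a hedged sketch of loading a community GPTQ checkpoint. The repo ID is illustrative, and you also need a GPTQ backend (such as GPTQModel or AutoGPTQ) plus `optimum` installed for Transformers to handle the quantized weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TheBloke/deepseek-llm-7B-chat-GPTQ"  # illustrative community quant

# device_map="auto" places the quantized layers on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo)

inputs = tokenizer("Write a haiku about GPUs.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that Transformers reads the quantization parameters (bits, group size, desc_act) from the repo's config, so mismatched loader settings are mostly a concern in standalone UIs.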

NVIDIA-Focused Performance: GPTQ is most commonly used in CUDA-based stacks and is typically best supported on NVIDIA GPUs, where optimized 4-bit kernels are widely available. On recent NVIDIA hardware (especially GPUs with Tensor Cores), 4-bit inference can accelerate matrix operations and reduce memory bandwidth pressure, which may improve throughput compared to higher-precision runs—depending on the runtime, kernel implementation, batch size, and context length. Backends such as ExLlama/ExLlamaV2 are widely used for GPTQ models and focus on efficient CUDA execution and memory layouts. That said, real-world speed and quality retention vary by model, quantization settings (e.g., group size, act-order), and workload, so you should validate both performance and output quality on your own prompts and serving configuration. GPTQ can also be used outside NVIDIA in some setups (e.g., experimental or less common paths), but the broadest ecosystem support and most mature optimizations are typically found on CUDA/NVIDIA environments—making GPTQ a practical default when your deployment target is NVIDIA GPUs.

Pros of GPTQ for DeepSeek:

  • Widespread Adoption: GPTQ has been around longer than AWQ and thus has a broad support base. Almost every major LLM toolkit has some GPTQ capability. You’ll find lots of community discussion, guides, and troubleshooting for GPTQ models. This maturity can make your life easier when deploying DeepSeek – less chance of hitting an unknown bug.
  • Flexibility in Quantization Levels: GPTQ isn’t tied to 4-bit. It’s possible to quantize to 3-bit or even 2-bit with GPTQ (though with diminishing returns on quality). If extreme compression is needed for a research experiment, GPTQ gives that flexibility. Conversely, you can do 8-bit GPTQ for a gentler compression. The method is generic.
  • Balanced Accuracy: In practice, GPTQ with the right settings (e.g., 4-bit, group size 128, act-order=True) delivers very strong accuracy retention on DeepSeek models. It’s a trusted method – many open LLMs (Llama 2, Mistral, etc.) have been GPTQ-quantized and benchmarked, showing only small drops on benchmarks relative to FP16. DeepSeek should be no exception, as long as it’s quantized properly (with a representative calibration set). The algorithm’s design (error compensation per layer) means it strives to keep each layer’s outputs close to original.
  • Large Model Handling: GPTQ’s quantization process is memory-efficient in that it can load model layers one by one on GPU for quantization. This allowed even the largest models (like the 175B example) to be quantized relatively quickly. So if DeepSeek releases an even bigger model in the future, GPTQ would be a viable path to compress it without needing insane hardware to do so.
  • Community Proven for DeepSeek: Already, the DeepSeek community (and quantizers like TheBloke, QuantTrio, etc.) have produced GPTQ versions of various DeepSeek models. For example, GPTQ quantizations of 7B and 67B DeepSeek chat models, as well as R1 distills, are available on Hugging Face. If you use those, you can be confident that many others have tested them in real applications (from math problem solving to chat responses). This collective usage means known issues (if any) are documented. In short, GPTQ is a reliable choice that many developers will be familiar with.

Cons / Limitations of GPTQ:

  • Slightly Higher Memory than AWQ: In some cases, AWQ 4-bit can have a smaller memory footprint or faster throughput than an equivalent GPTQ model, due to differences in how the weights are stored or grouped. GPTQ with certain configurations (like no group size) can use a bit more VRAM. That said, using group-size 128 and act-order typically mitigates this and brings memory usage very close to minimal.
  • Complexity of Choices: The flip side of GPTQ’s flexibility is complexity. If you are quantizing a model yourself, you have to choose bits, group size, whether to use act-order, etc. For newcomers, this can be confusing (e.g., “Should I do 128g or 64g? What does it mean if act-order is true?”). Misconfigured GPTQ can lead to either quality loss or loading failures in certain UIs. By contrast, AWQ has fewer knobs (mostly 4-bit fixed). However, if you stick to community-provided GPTQ files and their recommended settings, this isn’t much of an issue.
  • Compatibility Quirks: Historically, some inference libraries had issues with specific GPTQ variations – for instance, earlier versions of some UIs couldn’t handle act-order=True models (a known bug that is mostly resolved). Another example: some ExLlama builds have not supported unusual group sizes (historically, configurations other than group-size 128 caused problems for some LLaMA-family models), so if someone quantized DeepSeek with an uncommon group size, ExLlama might not load it. These are edge cases, but worth noting that GPTQ isn’t completely uniform; you must ensure your runtime supports the particular GPTQ variant of your model.
  • Primarily Weight Quantization: GPTQ (and AWQ and GGUF, for that matter) focuses on weight quantization. Activations are still in FP16/FP32 during inference. Some cutting-edge approaches quantize activations too (to lower precision) for even more speed, but GPTQ as standard does not. This only matters for very specific deployment goals (like trying to reduce memory bandwidth on very large batch inference). If DeepSeek’s usage involves extremely long contexts or batches, activation memory could be a bottleneck – and GPTQ doesn’t directly address that (though it pairs well with techniques like FlashAttention that target attention-memory bottlenecks).
  • GPU Requirement: While GPTQ models can technically be loaded on CPU (there are CPU implementations of the algorithm for inference), they’re really intended for GPU. If you quantize to GPTQ and then try to run on CPU, you won’t see much benefit; better to use GGUF in that case. So GPTQ’s advantages are tied to having a CUDA-capable GPU. This is an obvious point, but if your plan is to deploy on CPU or non-NVIDIA accelerators, GPTQ might not be the right format.

Comparison of GGUF vs AWQ vs GPTQ for DeepSeek

To summarize the characteristics and best-use scenarios of these three formats, here is a side-by-side decision matrix: what each format is best for, typical DeepSeek usage cases, hardware targets, and the compatible deployment stack.

GGUF
  • Best for: CPU and hybrid CPU+GPU inference; maximum portability.
  • DeepSeek use case: Running DeepSeek on CPU-only systems, low-power devices, or testing on a laptop. Ideal when no high-end GPU is available.
  • Hardware target: Primarily CPU (x86/ARM); optional GPU offload (e.g. Apple M-series, modest CUDA).
  • Deployment stack: llama.cpp and its libraries (C++ or Python bindings). Many local runtimes can consume GGUF packaging (directly or indirectly); always verify the runtime’s supported architectures and GGUF variants.

AWQ
  • Best for: High-efficiency 4-bit GPU serving; accuracy-critical apps.
  • DeepSeek use case: Deploying DeepSeek for multiple users or in production, where speed and response quality matter (e.g. a chat server or API). Good for instruction-following and reasoning tasks with minimal quality loss.
  • Hardware target: NVIDIA GPUs (Ampere/Hopper or better for best performance); also works in multi-GPU setups.
  • Deployment stack: PyTorch/Transformers with AWQ support; vLLM for optimized serving; HF TGI server for production; newer web UIs (text-generation-webui with AutoAWQ).

GPTQ
  • Best for: General GPU-based use; strong tooling support.
  • DeepSeek use case: Local DeepSeek deployments on a single GPU or a few GPUs. Great for research, personal assistants, etc., where you want ease of use and fast single-stream generation. Also used in many community forks and UIs.
  • Hardware target: NVIDIA GPUs (most quantized models assume CUDA); multi-GPU possible for larger models (with sharded loading). Some CPU/AMD support in specific libraries, but less common.
  • Deployment stack: text-generation-webui (via ExLlama for max speed); Hugging Face Transformers (supports GPTQ/AWQ in general, though DeepSeek-V3/R1 may require recommended runtimes such as vLLM, SGLang, or LMDeploy per official documentation); HF TGI and vLLM also support GPTQ; various community UIs and libraries.

Key notes: GGUF is uniquely suited for CPU environments – it’s the format to run DeepSeek on a machine without a powerful GPU. AWQ and GPTQ both target 4-bit GPU inference, with AWQ geared a bit more toward serving scenarios and GPTQ toward flexible and widespread use. Many users find GPTQ simpler when working in a notebook or desktop setting, whereas AWQ might shine in a dedicated server context. However, there is overlap, and both AWQ and GPTQ can be used beyond their “typical” niches.

Which Quantization Should You Use for DeepSeek?

Choosing the right quantization for DeepSeek depends on your hardware and goals. Let’s go through a few common scenarios and recommended choices:

  • Running on CPU only (no GPU): Use GGUF. In CPU-only scenarios it is the most practical option, since GGUF quantizations are optimized for CPU execution. For example, if you have a desktop with a decent CPU (say 16 cores) but no GPU, you might quantize DeepSeek-7B to 4-bit GGUF and run it in llama.cpp. Expect slower responses, but it will work. You could also try 5-bit or 6-bit GGUF if you have extra RAM and want slightly better answers. Some users pair a CPU with GGUF while offloading a few layers to an integrated or Apple Silicon GPU for a boost, which GGUF supports. But bottom line: for CPU-bound deployment, GGUF is the recommended format to get DeepSeek running locally at all.
  • 1 x 8GB GPU (e.g. a single mid-range card): You’ll likely be limited to the smaller DeepSeek models, but quantization is still crucial. With 8 GB VRAM, you can comfortably run a 7B model or possibly a 13B model at 4-bit. GPTQ is a popular choice here for ease of use – for instance, you can grab a DeepSeek 13B GPTQ (4-bit) and load it with ExLlama for maximum speed on that single GPU. This setup will give you reasonably fast inference for personal use. If you prefer, you could use AWQ 4-bit as well; the difference in quality between AWQ and GPTQ at 4-bit on a 7B/13B is likely negligible. However, the tooling might sway you: many community UIs are already configured for GPTQ. In short, with a single modest GPU, quantize to 4-bit (GPTQ or AWQ) to fit the model. If you encounter compatibility issues with one, try the other. For example, some 13B GPTQ models might push just over 8GB with certain settings, in which case an AWQ 4-bit might fit a tad easier, or vice versa – check memory usage. Always monitor VRAM and if needed, use a slightly higher quant (like 5-bit) for smaller models rather than a larger model that doesn’t fit well.
  • 1 x 24GB GPU (high-end single GPU, e.g. RTX 3090/4090): With 24 GB, you have more freedom. You can run DeepSeek’s 33B-class model at 4-bit comfortably, while the 67B model at 4-bit may exceed a single 24 GB GPU depending on runtime overhead and configuration; for most users, the ~33B model is the sweet spot here. The choice between AWQ and GPTQ then comes down to how you’re using the model. If you are running one instance locally (single user), GPTQ with ExLlama will give you excellent speed. If you are hosting an API for a small team or want to experiment with faster batching, AWQ with vLLM is attractive – vLLM can use that 24 GB to serve multiple requests very efficiently. Quality-wise, both AWQ and GPTQ should keep the 33B DeepSeek performing well. Another consideration: if you plan to use Hugging Face pipelines or Jupyter notebooks, you might lean GPTQ (since it’s straightforward to load via Transformers); if you plan a dedicated server process, AWQ in TGI or vLLM is equally good. Recommendation: for a single 24 GB GPU, use 4-bit quantization for models up to ~33B – GPTQ for the simplest setup, or AWQ to leverage vLLM’s performance advantages. If attempting the full 67B on one GPU, you will need CPU offloading (which llama.cpp, or Transformers with device_map, can do). One route is a GGUF in llama.cpp with --n-gpu-layers to split the model across GPU and CPU; another is GPTQ with some layers kept in FP16 on the CPU via accelerate. Both involve performance hits, so evaluate whether the slightly better answers from 67B are worth the slowdown, or whether the 33B suffices.
  • High-Accuracy Needs (research or critical applications): If you require the absolute best output quality from DeepSeek and want to minimize quantization-induced errors, consider a couple of strategies. First, you might quantize to more than 4 bits: for example, 8-bit quantization (such as bitsandbytes int8 or a mixed 4-bit/8-bit approach). GGUF supports 8-bit, and GPTQ can do 8-bit (though at that point you might just use FP16 for full fidelity if hardware allows). If sticking to our three formats, a GPTQ 8-bit model is nearly lossless and serves mainly to reduce memory use. Second, within 4-bit methods, AWQ may give a slight edge in accuracy. AWQ was designed to preserve model quality under low-bit weight-only quantization, particularly for instruction-tuned models, which suggests DeepSeek’s helpful/chatty models should hold up well under AWQ. So if you’re running, say, DeepSeek for an application like code generation where correctness is paramount, you might choose AWQ over GPTQ to squeeze out any extra stability. Additionally, you could follow what some community members did for R1: use hybrid quantization – e.g., keeping certain critical layers in 8-bit and others in 4-bit. Fully 4-bit quantization of R1 was found to cause errors, while selectively using 8-bit on sensitive layers fixed them; if you rely on a specific community quant recipe, consult the maintainer’s repo or model card for the details. This is advanced, but if you have the time and the need, you can manually quantize DeepSeek with such a recipe (or use an existing mixed model if available). In summary, for maximum accuracy, lean towards AWQ or higher-bit GPTQ, and don’t be afraid to use 5-bit or 8-bit if your hardware permits. The smaller the quantization error, the closer DeepSeek’s output will be to its original trained behavior.
  • Production Serving (multi-user, robust deployment): If you are deploying DeepSeek as part of a product or service, considerations extend beyond raw speed. You want stability, easy scalability, and maintainability. In these cases, AWQ is often the top pick, because it integrates cleanly with production-grade inference servers. For example, you can load DeepSeek-AWQ in TGI, a robust server with features like request batching, token streaming, and production metrics. AWQ in vLLM is another production scenario – vLLM’s efficient memory management and continuous batching can make one GPU handle many requests. AWQ fits that use case well (indeed, vLLM added quantization support for AWQ and GPTQ specifically to cater to deployment needs). GPTQ can also be used in production (TGI and vLLM support it too), so it’s not incorrect to choose GPTQ in such a setting; AWQ simply tends to be a bit more plug-and-play for high-concurrency workloads. Also consider hardware provisioning: if you have multiple GPUs (say a server with 4×A100s), you can shard GPTQ or AWQ models across them – TGI supports model sharding for both formats, and the difference there is minor, so use whichever you have quantized. One more angle: monitoring and maintenance. Because GPTQ models often come in many variants, a team may have to track exactly which quant config they deployed. AWQ being standardized at 4-bit means that if DeepSeek releases a new model, you quantize it to AWQ with the same process each time – less variation. In any case, for production, also follow best practices like using the DeepSeek Chat fine-tuned model (if the use is conversational) rather than the base model, and implement any required safety measures or prompt formatting. The quantization format won’t affect those aspects, but it’s good to keep the whole deployment picture in mind. Recommendation: For production-grade deployment of DeepSeek, AWQ with an inference server is a strong choice due to its speed and accuracy. GPTQ is a close second if your infrastructure or team is already aligned with it. And if the production environment is unusual (e.g. CPU edge devices), GGUF comes back into play, but that’s a rarer scenario for a large model in production.
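The scenario guidance above can be condensed into a small decision helper. This is a simplification of this guide’s recommendations, not an official DeepSeek tool; the function name and thresholds are illustrative rules of thumb, not hard limits.

```python
def recommend_format(has_gpu: bool, vram_gb: float = 0, multi_user: bool = False) -> str:
    """Condense this guide's scenario advice into a single lookup.

    Thresholds are rough rules of thumb, not hard limits.
    """
    if not has_gpu:
        return "GGUF"          # CPU-only: the llama.cpp ecosystem
    if multi_user:
        return "AWQ"           # serving stacks (vLLM/TGI) batch well with AWQ
    if vram_gb >= 24:
        return "GPTQ or AWQ"   # 33B-class at 4-bit fits; pick by runtime
    return "GPTQ"              # single modest GPU: simplest 4-bit setup

print(recommend_format(has_gpu=False))                             # CPU-only box
print(recommend_format(has_gpu=True, vram_gb=8))                   # mid-range card
print(recommend_format(has_gpu=True, vram_gb=24, multi_user=True)) # small API server
```

Treat the output as a starting point, then apply the finer-grained considerations above (model size, runtime, accuracy needs).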
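To sanity-check whether a given quantization fits your VRAM before downloading anything, a back-of-the-envelope weight-size estimate helps. The sketch below counts weight bytes only – KV cache, activations, runtime overhead, and per-group quantization metadata (scales/zeros) come on top, so leave headroom.

```python
def quantized_weight_gib(num_params_b: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GiB.

    Ignores KV cache, activations, and quantization metadata, which add
    a few percent for 4-bit group quantization.
    """
    bytes_total = num_params_b * 1e9 * bits_per_weight / 8
    return bytes_total / (1024 ** 3)

# Rough weight footprints for common model sizes and bit-widths:
for params, bits in [(7, 4), (13, 4), (33, 4), (67, 4), (67, 8)]:
    print(f"{params}B @ {bits}-bit: ~{quantized_weight_gib(params, bits):.1f} GiB")
```

This matches the scenarios above: a 33B model at 4-bit (~15 GiB of weights) fits a 24 GB card with room for the KV cache, while 67B at 4-bit (~31 GiB) does not fit on one such GPU without offloading.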
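If you take the AWQ-plus-vLLM route on a 24 GB card, the setup can be as small as the sketch below. The model ID is hypothetical – substitute a real AWQ quant of your chosen DeepSeek model – and vLLM’s API may shift between versions, so treat this as a sketch rather than a definitive recipe.

```python
# Engine configuration for vLLM's offline API. Kept in a plain dict so the
# settings are inspectable without a GPU; the heavy load lives in run().
engine_kwargs = dict(
    model="your-org/deepseek-33b-awq",  # hypothetical AWQ repo id - substitute your own
    quantization="awq",                 # tell vLLM the weights are AWQ 4-bit
    gpu_memory_utilization=0.90,        # leave a little VRAM headroom
)

def run(prompt: str) -> str:
    """Load the engine and generate once. Requires a CUDA GPU and vllm installed."""
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs)
    params = SamplingParams(max_tokens=256, temperature=0.7)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

On a machine with a suitable GPU, `run("Explain quantization in one paragraph.")` will load the quantized weights once and then serve generations; for multi-user serving you would instead launch vLLM’s OpenAI-compatible server with the same model and quantization settings.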

DeepSeek Model Family Considerations

DeepSeek has a growing family of models (V3, Chat, R1, Coder, etc.), and the choice of quantization format may vary depending on which specific model and workflow you are targeting. Here are a few considerations:

  • Base vs Chat vs Reasoning Models: DeepSeek-V3 Base is a general LLM, DeepSeek-V3 Chat is tuned for conversational use, and DeepSeek-R1 is a special reasoning model that generates chain-of-thought explanations. Their deployment behavior can differ. For instance, DeepSeek-R1 outputs a <think> step-by-step reasoning process before giving a final answer, so when you deploy R1 you may need to handle longer, multi-part outputs. Quantization can affect this if it reduces the model’s ability to maintain a coherent reasoning chain: in practice, users found that quantizing R1 too aggressively could produce faulty reasoning steps, and the fix was mixed-precision quantization (keeping some parts in 8-bit) to preserve its delicate reasoning ability. So, if your focus is R1 for complex reasoning, favor a quantization approach that’s gentler or proven on that model (for example, a community-provided mixed-precision quant of R1 rather than an all-4-bit model, or AWQ with careful verification of the outputs). The DeepSeek Chat model (RLHF-tuned), on the other hand, may be more robust to quantization in the sense that it’s trained to be helpful and does not depend as heavily on ultra-precise internal computations as R1 does for math. For Chat models, 4-bit GPTQ or AWQ generally works very well (as seen with other chat-tuned LLMs like Llama-2-chat, which quantize with minimal issues). Always test with a few prompts relevant to your use case: e.g., if deploying R1, give the quantized model a complex math problem and check that the chain-of-thought remains logical. If not, consider a higher-precision format or a different quant method.
  • DeepSeek Coder models: DeepSeek-Coder is a series specialized for code generation (with a mix of code and natural language training). These come in sizes like 6.7B, 33B, etc., and are also open-weight. Code models can be slightly more sensitive to quantization because generating syntactically correct and precise code (like exact symbols) requires the model to preserve certain logits distinctions. Nonetheless, many code models (e.g., StarCoder, CodeLlama) have been successfully run in 4-bit. For DeepSeek Coder, you can apply the same logic: if running on CPU (perhaps unlikely for coding due to speed), use GGUF. On GPU, GPTQ 4-bit is commonly used for code LLMs; just be sure to use an act-order quantization, as it often improves the handling of rare tokens (which can be important in code). AWQ is also an option if you integrate with an IDE or tool that can call a local server – you could host a DeepSeek-Coder AWQ model on a local TGI server and have your development environment query it for code completions. The bottom line is, quantization format for Coder models should be chosen by the same criteria: hardware and integration. The format doesn’t fundamentally change because it’s a code model, but you might opt for one that you trust to keep accuracy (perhaps AWQ if code results must be correct).
  • Mixture-of-Experts Models (V3, R1): As mentioned, DeepSeek-V3 and R1 use a Mixture-of-Experts (MoE) architecture, activating different “experts” for each token. This has two implications: (1) the model has sparse activation – not all weights are used at once – which can make quantization somewhat more forgiving, since each inference only touches a subset of the model’s weights; (2) the MoE routing and expert layers may behave differently under quantization. When quantizing MoE models, one must ensure the gating mechanism (which decides which expert to use) isn’t thrown off by quantization noise; if it were, the model could start picking incorrect experts, leading to strange outputs. The community patch we discussed for vLLM and R1’s GPTQ support hints at these complexities – it needed to adjust how quantization is applied to MoE modules. For a deployer, this means that if you’re working with DeepSeek’s MoE models, try to use quantization configurations that others have validated on them: using community-provided GGUF files or following published recipes could save you from pitfalls. If converting yourself, use the latest tools; both AWQ and GPTQ tooling are evolving to better handle MoE. A safe route is GGUF, because llama.cpp’s MoE support plus the ability to quantize at very fine granularity (like 2-bit for some parts, as Unsloth did) gives you control, albeit at the cost of complexity and possibly speed. If performance is critical, you’d probably lean AWQ/GPTQ with careful testing.
  • Workflow Differences – Interactive Chat vs Batch Processing: Consider how you will use DeepSeek. If it’s an interactive chatbot session (one query at a time: user asks, model answers), the format choice might prioritize the latency of a single output – GPTQ with ExLlama is known for low-latency single-stream generation, a good fit. If your workflow is more batch- or multi-user-oriented (processing a list of queries or serving an endpoint), then throughput and concurrency matter: AWQ with vLLM can batch multiple requests and generate tokens in parallel more effectively, which may yield higher throughput on the same hardware and more consistent latency under load. So the nature of your application (single-user vs multi-user) can influence the decision. Another angle: if your workflow is embedded (like calling the model from a Python script sporadically), simplicity might trump absolute performance – GPTQ loaded via Transformers is simple and still reasonably fast, whereas setting up a dedicated server for AWQ might be overkill for an occasional script. Always tailor to the workflow: quantization is not one-size-fits-all even within the same model family. You might even use multiple formats, e.g. a GGUF 4-bit for quick local testing of a prompt, then an AWQ deployment on a server for real usage.
  • Future-proofing: DeepSeek is actively evolving (newer versions such as V3.1 are discussed in the community), and new quantization methods keep appearing (ExLlamaV2’s EXL2 format, SmoothQuant, and others). This guide focuses on GGUF, AWQ, and GPTQ because they are currently the most prominent. As DeepSeek’s model family grows, keep an eye on whether a newer format becomes more suitable. For instance, if DeepSeek releases a model with a 100k context length, a quantization method that handles long context with less degradation (some research methods target exactly that) could be ideal; or if a new method from industry or academia clearly outperforms GPTQ/AWQ, you might adopt it for DeepSeek. Staying aware of DeepSeek’s official documentation is also important – they may occasionally provide deployment recommendations for their models (e.g., noting in a paper that a model was tested at int8 and works well). As of now, they’ve left quantization to the community, but that could change.
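For the “occasional Python script” workflow mentioned above, loading a GPTQ quant through Transformers is about as simple as it gets. This is a hedged sketch: the repo ID is hypothetical, and recent Transformers versions read the quantization config from the checkpoint automatically (you will also need a GPTQ backend package such as auto-gptq or gptqmodel installed).

```python
prompt = "Write a Python function that reverses a string."

def load_and_generate(model_id: str = "your-org/deepseek-coder-6.7b-gptq") -> str:
    """Load a GPTQ-quantized checkpoint and generate once.

    model_id is a hypothetical placeholder; the heavy work lives inside the
    function so importing this file stays cheap on machines without a GPU.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(model_id)
    # GPTQ checkpoints carry their quantization config in the repo;
    # device_map="auto" places layers on the GPU and spills to CPU if needed.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    return tok.decode(out[0], skip_special_tokens=True)
```

No server process, no extra configuration – which is exactly why GPTQ-via-Transformers remains popular for notebooks and ad-hoc scripts.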

In conclusion, align quantization choices with the specific DeepSeek model and your usage scenario. Reasoning-heavy models (DeepSeek-R1) deserve a bit more care to ensure quantization doesn’t undercut their strengths, whereas conversational or base models are generally straightforward to quantize with standard tools. The format (GGUF, AWQ, GPTQ) can be chosen using the guidelines above, but always verify the performance of the quantized model on tasks you care about. The goal is to maintain DeepSeek’s impressive capabilities even while squeezing it into a smaller computational footprint.

Frequently Asked Questions (FAQ)

Does quantization reduce DeepSeek model accuracy?

Weight-only quantization (such as 4-bit GPTQ or AWQ) may introduce minor numerical differences compared to full-precision models. In practice, many deployments observe limited quality degradation, but impact depends on the model variant, calibration method, runtime, and workload. Always validate performance on your own prompts.

Can DeepSeek-R1 be safely quantized?

Yes, but reasoning-focused models like DeepSeek-R1 may be more sensitive to aggressive low-bit quantization. If reasoning accuracy is critical, consider careful testing, higher-bit quantization, or mixed-precision approaches.

Is 4-bit quantization always better than 8-bit?

Not necessarily. 4-bit significantly reduces memory usage and can improve efficiency, while 8-bit retains more numerical precision. The right choice depends on your hardware limits and acceptable quality trade-offs.

Should I use AWQ or GPTQ for DeepSeek?

Both are weight-only 4-bit methods commonly used for GPU inference. GPTQ has broader historical tooling support, while AWQ is frequently used in modern serving stacks. The better choice depends on your runtime and deployment workflow.

Can I run DeepSeek without a GPU?

Yes. CPU-only deployment is possible using GGUF models with runtimes like llama.cpp. However, performance will typically be much slower than GPU inference, especially for larger models.
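As a concrete sketch of the CPU-only path, here is how a GGUF quant can be loaded through the llama-cpp-python bindings. The file path is hypothetical; `n_gpu_layers=0` keeps everything on the CPU, and raising it offloads that many layers to a GPU if one is available.

```python
# Loader settings for llama-cpp-python (pip install llama-cpp-python).
# The model path is a hypothetical local file - point it at your own GGUF quant.
llama_kwargs = dict(
    model_path="./models/deepseek-7b-q4_k_m.gguf",  # hypothetical GGUF file
    n_ctx=4096,        # context window; KV cache memory grows with this
    n_threads=8,       # match your physical core count for best CPU throughput
    n_gpu_layers=0,    # 0 = pure CPU; increase to offload layers to a GPU
)

def chat(prompt: str) -> str:
    """Load the GGUF model and answer one prompt (slow on first call)."""
    from llama_cpp import Llama

    llm = Llama(**llama_kwargs)
    result = llm(prompt, max_tokens=256)
    return result["choices"][0]["text"]
```

With a real model file in place, `chat("What is quantization?")` runs entirely on the CPU; expect a few tokens per second on a 7B 4-bit quant, depending on your hardware.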

Does quantization affect context length?

Quantization primarily reduces weight precision. Context length limits are determined by the model architecture and runtime configuration. However, memory overhead (such as KV cache) still scales with context length, regardless of quantization.
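The KV-cache point can be made concrete with a rough estimate. The formula below assumes a standard multi-head/GQA attention layout with 16-bit cache entries; the layer and head counts in the example are illustrative for a 7B-class dense model, not DeepSeek’s actual architecture (DeepSeek-V3’s Multi-head Latent Attention compresses its cache well below this estimate).

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Rough KV cache size for one sequence: keys + values (hence the 2x),
    stored per layer, per KV head, per token position."""
    total = 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem
    return total / (1024 ** 3)

# Illustrative 7B-class shape: 32 layers, 32 KV heads of dim 128, fp16 cache.
print(f"4k context:  {kv_cache_gib(32, 32, 128, 4096):.2f} GiB")
print(f"32k context: {kv_cache_gib(32, 32, 128, 32768):.2f} GiB")
```

Note that the weights’ bit-width does not appear in the formula: a 4-bit model with an fp16 KV cache still pays this cost in full, which is why long contexts can dominate memory use (some runtimes mitigate this by quantizing the cache itself).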

Conclusion

Quantization is a powerful enabler for running DeepSeek models on local hardware, and the best format for you depends on your environment and goals. GGUF makes DeepSeek accessible on CPUs and is perfect for experimentation on everyday machines. AWQ and GPTQ unlock efficient 4-bit performance on GPUs – with AWQ often favored for high-throughput serving and GPTQ for its broad adoption and flexibility. There is no single winner: a developer with a MacBook might use GGUF, a researcher with a 3090 may prefer GPTQ, and a startup deploying a DeepSeek-powered app could opt for AWQ on a GPU server cluster. The key is that DeepSeek’s open-weight ethos allows all these possibilities.

When deploying DeepSeek, remember to also consider the model variant (base vs chat vs R1) and ensure your quantization choice supports its features. We’ve kept the discussion neutral and focused on technical trade-offs – avoiding hype – because ultimately the “best” choice is context-dependent. As DeepSeek continues to evolve, so will quantization techniques. This guide should remain relevant as an evergreen reference, but always stay tuned to the latest community insights.

Finally, take advantage of DeepSeek’s ecosystem. If you need more background on a particular model, refer to resources like the DeepSeek R1 Guide for tips on using the reasoning model, or visit the DeepSeek AI homepage for official announcements and documentation. By understanding both your deployment constraints and DeepSeek’s model nuances, you can confidently choose a quantization format that gets you the optimal balance of performance and precision. If you’re unsure, start with a widely used community quant for your target runtime, then validate on your own prompts and hardware.