How to Install DeepSeek Locally: Complete Setup Guide for Windows, Mac & Linux

You can run DeepSeek locally on Windows, Mac, or Linux with Ollama. For most people, the easiest starting point is to install Ollama, pull deepseek-r1:8b, and run it from the terminal. If you want a browser-based interface, add Open WebUI on top. If you prefer an all-in-one desktop app instead of the command line, see our DeepSeek in LM Studio guide.

This guide is part of the broader Chat-Deep.ai DeepSeek guide hub, where we separate official DeepSeek products from independent local-use walkthroughs. Here, we focus specifically on local deployment with Ollama.

Last verified: April 13, 2026.

What this guide covers

This walkthrough is for people who want to run DeepSeek on their own machine for privacy, offline use after download, local development, or self-hosted experimentation. It covers:

  • How to choose a realistic DeepSeek model size for your hardware
  • How to install Ollama on Windows, macOS, and Linux
  • How to download and run DeepSeek locally
  • How to add a browser UI with Open WebUI
  • How to call your local DeepSeek model through Ollama’s API
  • How to troubleshoot the most common local setup problems

Before you start: understand the naming

A lot of DeepSeek content online mixes together official API model names and local open-weight checkpoints. They are not the same thing.

In the official DeepSeek API, deepseek-chat and deepseek-reasoner are platform aliases. On the local side, most single-machine Ollama users will actually run open-weight DeepSeek-R1 family checkpoints such as deepseek-r1:1.5b, 7b, 8b, 14b, 32b, or 70b.

If you want a broader comparison of current DeepSeek naming, modes, and model families, start with the DeepSeek Models Hub.

When to use Ollama, Open WebUI, or LM Studio

Tool | Best for | Why you would choose it
Ollama | Fast local setup, terminal use, scripts, and a simple local API | It is the fastest path to getting a model running with copy-and-paste commands.
Open WebUI | Browser chat on top of Ollama | It gives you a ChatGPT-style interface while keeping the model local.
LM Studio | Users who want a desktop GUI and easier GGUF workflows | It is a good alternative if you would rather browse, download, and run models from an app.

For this guide, we use Ollama as the base runtime because it is simple, scriptable, and easy to pair with Open WebUI. If you want the desktop-first path instead, use our LM Studio setup guide.

Prerequisites

  • A supported computer:
    • Windows 10 or later
    • macOS 14 Sonoma or later
    • A modern Linux system for the standard installer
  • Enough free disk space for the model you plan to download
  • Enough RAM or GPU VRAM for the model you plan to run
  • An internet connection for the initial downloads

If you are unsure what your hardware can realistically handle, use the DeepSeek Hardware Chooser before you download a large model.
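
If you just want a quick read on your machine before downloading anything, a few generic OS commands (none of them Ollama-specific) are enough:

# Free disk space on your home partition (Linux and macOS)
df -h ~

# Total and available RAM on Linux
free -h

# Total RAM in bytes on macOS
sysctl -n hw.memsize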

Choose the right DeepSeek model size

These are practical starting points for local inference, not guarantees. Real fit depends on quantization, context length, runtime overhead, concurrency, and whether you run on a discrete GPU, unified-memory Mac, or CPU-only system.

Ollama tag | Official download size | Practical starting hardware | Who it is for
deepseek-r1:1.5b | 1.1 GB | CPU-only or 4 GB VRAM, 8 GB+ system RAM | Testing, lightweight prompts, older laptops
deepseek-r1:7b | 4.7 GB | 6 to 8 GB VRAM, or 16 GB+ system RAM | Entry-level local chat and coding
deepseek-r1:8b | 5.2 GB | 8 GB VRAM, or 16 GB+ system RAM | Best default starting point for most users
deepseek-r1:14b | 9.0 GB | 12 to 16 GB VRAM, or 24 to 32 GB+ system RAM | Better quality if you have midrange hardware
deepseek-r1:32b | 20 GB | 24 GB VRAM, or 48 to 64 GB+ system RAM | Power users who want stronger local quality
deepseek-r1:70b | 43 GB | 48 to 80 GB VRAM or a heavy CPU/offload setup | Workstations and serious local setups
deepseek-r1:671b | 404 GB | Server-class multi-GPU or distributed setup only | Research and infrastructure teams, not typical desktops

Simple recommendation

  • Start with 8b if you have a modern laptop or desktop and want the easiest good-enough setup.
  • Start with 7b or 1.5b if your hardware is limited.
  • Use 14b or 32b only if you already know your machine has the memory headroom.
  • Treat 70b and especially 671b as specialist options, not normal first installs.

Important note about 671B

The existence of a deepseek-r1:671b tag in Ollama does not mean it is a realistic local target for a normal single PC. In practice, most people looking for “run DeepSeek locally” should stay in the 7B to 32B range, or consider quantized GGUF workflows instead. If that is your situation, our DeepSeek quantization guide is the next page to read.

Step 1: Install Ollama

Windows

Install Ollama on Windows by downloading the official installer (OllamaSetup.exe) from the Ollama website and running it. If you prefer a package manager, the same release is also available through winget:

winget install Ollama.Ollama

After installation, open PowerShell and verify:

ollama --version

macOS

Download the official macOS installer from Ollama, then verify in Terminal:

ollama --version

On Apple Silicon, Ollama uses Metal automatically. Unified memory helps, but it is still smart to choose a model that leaves memory headroom for the OS and other apps.
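
As a rough, back-of-the-envelope example using the sizes above: on a 16 GB Apple Silicon Mac, the 5.2 GB deepseek-r1:8b weights plus KV cache and runtime overhead can occupy roughly 7 to 9 GB while running, which still leaves headroom for the OS; the 9.0 GB 14b tag cuts that margin much thinner.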

Linux

Install Ollama with the standard script:

curl -fsSL https://ollama.com/install.sh | sh

Then verify that the service is installed and running:

ollama --version
sudo systemctl status ollama   # confirm the service is active
sudo systemctl start ollama    # start it manually if it is not
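
Two optional follow-ups that often help on Linux: make the service start at boot, and know where its logs live. Assuming the official script registered the usual ollama systemd unit:

sudo systemctl enable ollama   # start Ollama automatically at boot
journalctl -u ollama -f        # follow the server logs while testing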

If you use NVIDIA, AMD, or Apple Silicon

  • NVIDIA: Ollama can use supported NVIDIA GPU acceleration when drivers are installed correctly.
  • AMD: Ollama supports AMD GPUs through ROCm, with additional Vulkan-based support on some platforms.
  • Apple Silicon: Metal support is built into Ollama on Apple Silicon Macs.

If you are on AMD ROCm or Apple Metal and want a deeper local GPU-specific walkthrough, read our ROCm and Mac Metal guide.

Step 2: Download a DeepSeek model

For most first-time users, this is the right starting command:

ollama pull deepseek-r1:8b

Other common starting points:

# Very small / test setup
ollama pull deepseek-r1:1.5b

# Good entry-level local setup
ollama pull deepseek-r1:7b

# Default recommendation for many users
ollama pull deepseek-r1:8b

# Better quality if you have more memory
ollama pull deepseek-r1:14b

# Strong single-workstation option
ollama pull deepseek-r1:32b

If you are not sure whether to choose 8B, 14B, or 32B, the safest answer is to start smaller, confirm that everything works, and only then move up.
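
Whichever tag you pick, it is worth confirming the download before moving on. ollama show prints details such as parameter count, quantization, and context length:

ollama ls                    # the tag you pulled should be listed
ollama show deepseek-r1:8b   # inspect the model's metadata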

Step 3: Run DeepSeek locally

Once the model finishes downloading, launch it:

ollama run deepseek-r1:8b

You can also send a one-shot prompt directly from the command line:

ollama run deepseek-r1:8b "Write a Python function that checks whether a number is prime."

What about visible reasoning or “thinking” output?

Depending on your Ollama version, model, and settings, you may see a visible reasoning trace or only the final answer. Do not hard-code your app around older <think>...</think> formatting assumptions. Newer Ollama releases separate thinking more cleanly and also let you hide it when needed.

Examples:

# Enable thinking for a prompt
ollama run deepseek-r1:8b --think "Solve 19 * 27 step by step"

# Disable thinking and return only the answer directly
ollama run deepseek-r1:8b --think=false "Solve 19 * 27 step by step"

# Hide visible thinking while still using a thinking-capable model
ollama run deepseek-r1:8b --hidethinking "Solve 19 * 27 step by step"

In an interactive session, you can also toggle thinking behavior with session commands such as /set think and /set nothink.

Step 4: Add a browser UI with Open WebUI

Ollama alone is enough if you are comfortable with the terminal. If you want a clean browser interface, install Open WebUI on top of it.

Option A: Docker

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000.

Option B: pip

pip install open-webui
open-webui serve

Then open http://localhost:8080.

If you mainly want a graphical local workflow and model browser, LM Studio is still a valid alternative. Use whichever path fits the way you actually work.

Step 5: Use DeepSeek through the local Ollama API

Once Ollama is running, your local API is available at http://localhost:11434.

Simple chat request

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "messages": [
    {
      "role": "user",
      "content": "Explain the difference between TCP and UDP in plain English."
    }
  ]
}'
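
Note that /api/chat streams the reply as a sequence of JSON objects by default. For a single JSON response, which is easier to read when testing with curl, add "stream": false:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1:8b",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "Summarize the difference between TCP and UDP in two sentences."
    }
  ]
}'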

OpenAI-compatible local endpoint

If you already use tools built around OpenAI-style chat calls, Ollama also supports OpenAI-compatible endpoints on /v1. That makes it a convenient bridge for local development and testing. If your next step is turning a local model into a more structured service, read our DeepSeek API serving guide.
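
As a minimal sketch, here is the same kind of request against the OpenAI-compatible route. The model name maps to your local Ollama tag, and no real API key is needed:

curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-r1:8b",
  "messages": [
    { "role": "user", "content": "Say hello from a local model." }
  ]
}'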

Useful Ollama commands

Command | What it does
ollama pull deepseek-r1:8b | Downloads a model
ollama run deepseek-r1:8b | Starts a chat session with the model
ollama ls | Lists downloaded models
ollama ps | Shows models currently loaded in memory
ollama stop deepseek-r1:8b | Stops a running model
ollama rm deepseek-r1:8b | Deletes a downloaded model
ollama serve | Starts the Ollama server

Common issues and fixes

1) “ollama: command not found”

Your installation may not be on your PATH yet, or your shell needs to be reopened. Close the terminal, open a new one, and run ollama --version again. On macOS, make sure the app finished linking the CLI. On Windows, rerun the installer or use the official PowerShell installer.

2) The model runs, but it is painfully slow

You are probably on CPU-only inference or running a model that is too large for your machine. Move down to 8b, 7b, or 1.5b. If your real goal is quality over privacy or offline use, the official API or browser workflows may be a better fit than forcing an oversized local model.

3) GPU acceleration is not kicking in

First, confirm your drivers and platform support. If you are on NVIDIA, verify your GPU is visible to the system. If you are on AMD, review the ROCm or Vulkan support path. If you are on Apple Silicon, Metal is automatic but memory pressure can still force you into a slower experience with oversized models.
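
A quick way to check what Ollama is actually using: load a model once, then look at the PROCESSOR column of ollama ps. On NVIDIA systems, nvidia-smi confirms the driver can see the GPU at all:

nvidia-smi                       # NVIDIA only: the GPU and driver version should be listed
ollama run deepseek-r1:8b "hi"   # load the model once
ollama ps                        # the PROCESSOR column shows GPU, CPU, or a split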

4) You hit out-of-memory errors

Drop to a smaller tag, reduce the number of simultaneously running apps, and keep your prompts shorter while testing. For lower-memory systems, a quantized GGUF route may be more practical than a larger Ollama checkpoint. See our quantization guide or our llama.cpp CPU and Mac guide.
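
One practical memory lever while testing is the context window, since the KV cache grows with context length. In an interactive session you can shrink it before prompting:

ollama run deepseek-r1:8b
# then, inside the session:
/set parameter num_ctx 2048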

5) Open WebUI loads, but you cannot see your model

Make sure Ollama is actually running on the same machine and that your model has already been pulled locally. Confirm that the Ollama service is reachable before debugging the UI layer.
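
Before debugging the UI itself, a single request tells you whether the API is up and which models it knows about:

curl http://localhost:11434/api/tags   # should return JSON listing your pulled models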

6) You want to access Ollama from another device on your network

By default, Ollama binds to localhost. If you plan to expose it beyond your machine, configure that intentionally and treat it as a security decision, not a convenience toggle.
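
If you do decide to expose it on your LAN, the usual mechanism is the OLLAMA_HOST environment variable (on a systemd install, set it in a service override rather than your shell). A minimal sketch, assuming you also restrict access with a firewall:

# bind the API to all interfaces instead of localhost
OLLAMA_HOST=0.0.0.0:11434 ollama serve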

Practical setup recommendations

Best first setup for most people

Install Ollama, pull deepseek-r1:8b, test in the terminal, then add Open WebUI only if you want a browser chat interface.

Best setup for older laptops or CPU-heavy systems

Start with 1.5b or 7b. If speed still feels poor, consider a smaller quantized GGUF workflow rather than forcing a larger model.

Best setup for a strong single-GPU workstation

Try 14b first, then 32b if you have the memory headroom. For a lot of individual users, 32B is the point where local quality becomes compelling without crossing into server-class complexity.

Best setup if you need an app-like desktop experience

Use LM Studio instead of stitching together terminal plus browser UI. It is a better fit for people who want to browse models visually and stay inside a desktop app.

FAQ

Can I run DeepSeek locally without a GPU?

Yes, but you should keep expectations realistic. Start with 1.5b or 7b. CPU-only inference is usually much slower than GPU-backed inference.

Which DeepSeek model should I start with?

For most people, deepseek-r1:8b is the best first install. It is a safer starting point than jumping directly to 14B, 32B, or larger.

Is the local Ollama model the same as the official DeepSeek API?

No. This guide is about local open-weight checkpoints. Official API aliases and local checkpoints are related to the same ecosystem, but they are not the same product path.

Can I use a GUI instead of the terminal?

Yes. Use Open WebUI on top of Ollama for a browser UI, or use LM Studio if you want a desktop-first workflow.

Is local DeepSeek private?

Local inference can keep prompts and outputs on your own machine, which is a major privacy advantage over hosted services. But local privacy still depends on your OS, browser, logs, network exposure, connected tools, and how you configured the system.

Should I try the 671B model at home?

Usually no. Its listed download size alone is enough to rule it out for most normal local setups. If you are not already running server-class hardware, that is not the right starting point.

Final recommendation

If your goal is to get DeepSeek running locally with the least friction, use Ollama plus deepseek-r1:8b, confirm performance on your own machine, and only then scale up. That path is faster, safer, and more realistic than chasing the biggest model name first.