CanIRunAI


DeepSeek R1 locally: which version should your hardware run?

How to choose distilled DeepSeek R1 variants, test them in Ollama, and avoid models that are too large for your VRAM.

Kaua Miguel / 2026-05-05

DeepSeek R1 is not one download

When people say they want to run DeepSeek R1 locally, they usually mean one of the distilled, quantized variants, not the full 671B-parameter model. That matters because memory use and generation speed both scale with model size, so a 1.5B variant and a 7B variant can feel very different on the same machine.

For a normal PC, start with smaller variants. They will not match the reasoning quality of a larger model, but they let you test answer style, step-by-step explanations, math, and coding tasks without freezing the machine.
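To make "too large for your VRAM" concrete, here is a minimal sketch of a back-of-the-envelope memory estimate. It assumes memory is roughly parameters times bits per weight, plus a flat overhead for the KV cache and runtime buffers. The 4.5 bits per weight, the 1.2 overhead factor, and the list of variant sizes beyond 1.5B and 7B are assumptions for illustration, not measured values; the function names are hypothetical.

```python
# Rough memory estimate for a quantized model (illustrative only).
# Assumptions: ~4.5 bits per weight for a typical 4-bit quantization,
# and a flat 1.2x overhead for KV cache and runtime buffers.

def estimate_gib(params_billions: float, bits_per_weight: float = 4.5,
                 overhead: float = 1.2) -> float:
    """Return an approximate memory footprint in GiB."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30


def largest_fitting(vram_gib: float, sizes=(1.5, 7, 8, 14, 32, 70)):
    """Return the largest size (in billions of parameters) that fits, or None."""
    fitting = [size for size in sizes if estimate_gib(size) <= vram_gib]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for size in (1.5, 7, 14, 32, 70):
        print(f"{size:>4}B  ~{estimate_gib(size):.1f} GiB")
    print("Largest size for 8 GiB of VRAM:", largest_fitting(8))
```

Treat the result as a first filter only. Long prompts grow the KV cache, and other applications also use VRAM, so leave headroom and confirm with a real run.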

First Ollama test

Start small:

ollama pull deepseek-r1:1.5b
ollama run deepseek-r1:1.5b "Solve 18 * 24 and explain in short steps."

If it feels comfortable, try a larger variant:

ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b "Compare CPU-only and GPU local inference."

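Watch RAM and VRAM while it runs. Running ollama ps in another terminal shows whether the loaded model sits on the GPU, on the CPU, or is split between them. If the system starts swapping or the time to first token becomes painful, go back to the smaller variant.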
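If you prefer numbers to a gut feeling, the sketch below measures time to first token and tokens per second through Ollama's local HTTP API. It assumes the server is listening on the default port 11434 and that the final streamed chunk carries eval_count and eval_duration (in nanoseconds). The helper names measure and tokens_per_second are hypothetical.

```python
import json
import time
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(final_chunk: dict) -> float:
    """Compute generation speed from the stats in the final stream chunk."""
    seconds = final_chunk.get("eval_duration", 0) / 1e9  # nanoseconds to seconds
    count = final_chunk.get("eval_count", 0)
    return count / seconds if seconds > 0 else 0.0


def measure(model: str, prompt: str) -> dict:
    """Stream one completion and report time to first token and tokens/s."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    first_token_at = None
    final_chunk = {}
    with urllib.request.urlopen(request) as response:
        for line in response:  # the stream is one JSON object per line
            if not line.strip():
                continue
            chunk = json.loads(line)
            # Reasoning text may arrive in a separate "thinking" field.
            produced = chunk.get("response") or chunk.get("thinking")
            if first_token_at is None and produced:
                first_token_at = time.perf_counter() - start
            if chunk.get("done"):
                final_chunk = chunk
    return {
        "time_to_first_token_s": first_token_at,
        "tokens_per_s": tokens_per_second(final_chunk),
    }


if __name__ == "__main__":
    print(measure("deepseek-r1:1.5b", "Solve 18 * 24 and explain in short steps."))
```

Run it once per variant with the same prompt and compare. The first call after a pull also includes model load time, so a second run gives a fairer time to first token.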

When it is worth using

DeepSeek R1 is most useful for reasoning, problem decomposition, and step-by-step explanations. For fast casual chat, smaller Qwen, Llama, or Gemma chat models may feel lighter and more direct.

My rule: use R1 when you want the model to think harder; use small chat models when you want a fast response.
