CanIRunAI

Ollama not using your GPU: a quick checklist

Practical steps to find out why Ollama fell back to CPU and how to isolate driver, Docker, and permission issues.

Kaua Miguel/2026-05-06/2 min read

Confirm the GPU is the issue

A slow Ollama is not always an Ollama that ignores your GPU. The model may be too large for your VRAM, the context window may be set too high, or other applications may already have filled VRAM. Confirm actual GPU usage first with Task Manager on Windows, nvidia-smi on NVIDIA hardware, or the equivalent tool for your system.
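To check the "VRAM may already be full" case on NVIDIA hardware, a minimal sketch like the one below turns nvidia-smi's query output into a percentage. The helper name vram_percent is hypothetical, and the sketch assumes the query prints one "used, total" line per GPU in MiB.

```shell
# Hypothetical helper: convert a "used, total" line (MiB) into a whole-number percentage.
# Lines with a missing or zero total (for example "[N/A]") print nothing.
vram_percent() {
  echo "$1" | awk -F', *' '($2 + 0) > 0 { printf "%d\n", ($1 * 100) / $2 }'
}

# Query the real GPU only when nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits |
    while IFS= read -r line; do
      echo "VRAM used: $(vram_percent "$line")%"
    done
fi
```

If the percentage is already high before Ollama loads anything, close the other GPU-heavy applications and test again.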

If you use Docker, Ollama's official troubleshooting flow recommends checking whether the container runtime can see the GPU before blaming Ollama itself. With NVIDIA, a simple GPU-enabled container plus nvidia-smi is a useful baseline.

Drivers and restarts still matter

Outdated drivers, suspend/resume cycles, and stuck services can all make Ollama fall back to CPU. On Linux, AMD setups may also require your user to belong to the video or render group.
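For the suspend/resume and stuck-service cases on Linux, the usual recovery is to reload the NVIDIA UVM kernel module and restart the Ollama service. The sketch below assumes an NVIDIA host and a systemd unit named ollama, which may not match your install. The run helper and the DRY_RUN variable are hypothetical names. By default the script only prints the commands so you can review them first.

```shell
# Hypothetical dry-run wrapper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Reload the NVIDIA UVM module (assumes NVIDIA drivers on Linux).
run sudo rmmod nvidia_uvm
run sudo modprobe nvidia_uvm

# Restart Ollama (assumes a systemd service named "ollama").
run sudo systemctl restart ollama
```

Running it with DRY_RUN=0 executes the commands instead of printing them.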

On Windows, confirm you are using the correct native install, your NVIDIA or AMD driver is current, and no other process is holding most of your VRAM.

Shrink the test case

Test with a small model before diagnosing with a heavy one. If a lightweight model uses the GPU and a larger model falls back to CPU, the problem is probably memory pressure rather than hardware discovery.

Then increase model size and context length gradually, one change at a time. This keeps hardware discovery, model size, and context settings from blending into one confusing test.
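One way to compare a small and a large model is the PROCESSOR column that ollama ps prints for each loaded model. The sketch below classifies that value. The helper name classify_processor is hypothetical, and the value formats ("100% GPU", "100% CPU", "48%/52% CPU/GPU") are an assumption based on recent Ollama releases, so check them against your own output.

```shell
# Hypothetical helper: interpret the PROCESSOR value shown by `ollama ps`.
# Assumed formats: "100% GPU", "100% CPU", or a split such as "48%/52% CPU/GPU".
classify_processor() {
  case "$1" in
    "100% GPU") echo "fully on GPU" ;;
    "100% CPU") echo "CPU only" ;;
    *"CPU/GPU"*) echo "split between CPU and GPU" ;;
    *) echo "unknown" ;;
  esac
}

# List loaded models only when the Ollama CLI is installed.
if command -v ollama >/dev/null 2>&1; then
  ollama ps || echo "could not reach the Ollama server"
fi
```

A small model that is fully on GPU while a larger one is split or CPU only points to memory pressure. A small model that is CPU only points to drivers, permissions, or discovery.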

Diagnostic commands

On NVIDIA, watch the GPU while running a prompt:

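# Terminal 1: refresh GPU stats every second
nvidia-smi -l 1
# Terminal 2: run a short prompt and watch utilization and memory in terminal 1
ollama run llama3.2:3b "Write one short paragraph about local AI."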

On Linux with AMD, check whether your user has access to render devices:

groups
ls -l /dev/dri
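The groups output can be checked automatically. The sketch below reports which of the video and render groups are missing from a group list. The helper name missing_gpu_groups is hypothetical, and which group your distribution actually requires is an assumption to verify against the owner shown by ls -l /dev/dri.

```shell
# Hypothetical helper: print the GPU-related groups absent from a space-separated list.
missing_gpu_groups() {
  missing=""
  for g in video render; do
    case " $1 " in
      *" $g "*) ;;                   # group present, nothing to report
      *) missing="$missing $g" ;;    # group absent, remember it
    esac
  done
  echo "${missing# }"
}

# Check the current user's groups.
echo "missing groups: $(missing_gpu_groups "$(id -nG)")"

# To add the current user (requires root, then log out and back in):
#   sudo usermod -aG video,render "$USER"
```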

If you use Docker with NVIDIA, first verify that the container can see the GPU:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

Official references

See Ollama's official GPU documentation and troubleshooting guide.
