Ollama not using your GPU: a quick checklist
Practical steps to find out why Ollama fell back to CPU and how to isolate driver, Docker, and permission issues.
Confirm the GPU is the issue
When Ollama feels slow, it is not necessarily ignoring your GPU. The model may be too large, the context window may be set too high, or VRAM may already be full. Before changing anything, confirm actual GPU usage with Task Manager, nvidia-smi, or the equivalent tool for your system.
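Ollama can also report where a loaded model is running. As a minimal sketch, assuming that ollama ps prints a PROCESSOR column with values such as "100% GPU", "100% CPU", or a split like "48%/52% CPU/GPU" (the exact layout may vary between versions), the hypothetical helper classify_processor below labels each output line:

```shell
# Classify one line of `ollama ps` output by its PROCESSOR column.
# Assumption: the column reads "100% GPU", "100% CPU", or "NN%/NN% CPU/GPU".
classify_processor() {
  case "$1" in
    *CPU/GPU*) echo "split" ;;   # model is partly offloaded, partly on CPU
    *"% GPU"*) echo "gpu" ;;     # model runs fully on the GPU
    *"% CPU"*) echo "cpu" ;;     # model fell back to CPU
    *)         echo "unknown" ;;
  esac
}

# Only query Ollama if the CLI is installed; skip the header row.
if command -v ollama >/dev/null 2>&1; then
  ollama ps | tail -n +2 | while IFS= read -r line; do
    echo "$(classify_processor "$line"): $line"
  done
fi
```

Run it while a model is still loaded, for example right after sending a prompt. A "split" result usually points to memory pressure rather than a missing GPU.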
If you use Docker, Ollama's official troubleshooting flow recommends checking whether the container runtime can see the GPU before blaming Ollama itself. With NVIDIA, a simple GPU-enabled container plus nvidia-smi is a useful baseline.
Drivers and restarts still matter
Outdated drivers, suspend/resume cycles, and stuck services can all make Ollama fall back to CPU. On Linux, AMD setups can also depend on your user belonging to the video or render group.
On Windows, confirm you are using the correct native install, your NVIDIA or AMD driver is current, and no other process is holding most of your VRAM.
Shrink the test case
Test with a small model before diagnosing with a heavy one. If a lightweight model uses the GPU and a larger model falls back to CPU, the problem is probably memory pressure rather than hardware discovery.
Then increase model size and context gradually, one at a time. This keeps hardware discovery, model size, and context length from being mixed into one confusing test.
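To tell memory pressure apart from a discovery problem on NVIDIA, it helps to know how much VRAM is free before loading a model. A rough sketch, assuming nvidia-smi's CSV query output has the form "used, total" in MiB; free_vram_mib and has_headroom are hypothetical helper names, and the required size is an estimate you supply yourself:

```shell
# Print free VRAM in MiB from one "used, total" CSV line (values in MiB).
free_vram_mib() {
  printf '%s\n' "$1" | awk -F',' '{ print $2 - $1 }'
}

# Succeed if a rough requirement ($1, in MiB) fits into the free VRAM
# described by the CSV line in $2. The requirement is a caller-supplied guess.
has_headroom() {
  [ "$1" -le "$(free_vram_mib "$2")" ]
}

# Only query the driver if nvidia-smi is installed; one line per GPU.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits |
  while IFS= read -r line; do
    echo "free VRAM: $(free_vram_mib "$line") MiB"
  done
fi
```

For example, has_headroom 4000 "1024, 8192" succeeds because 7168 MiB is free. Treat the requirement as a lower bound, since a larger context adds to a model's memory use.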
Diagnostic commands
On NVIDIA, watch the GPU in one terminal while running a prompt in another. The -l 1 option refreshes the nvidia-smi output every second:
nvidia-smi -l 1
ollama run llama3.2:3b "Write one short paragraph about local AI."
On Linux with AMD, check whether your user has access to render devices:
groups
ls -l /dev/dri
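The group check can be scripted as well. A small sketch, assuming the relevant groups are named video and render as mentioned above; in_gpu_group is a hypothetical helper that takes a space-separated list of group names:

```shell
# Succeed if the space-separated group list in $1 contains video or render.
in_gpu_group() {
  for g in $1; do
    case "$g" in
      video|render) return 0 ;;
    esac
  done
  return 1
}

# `id -nG` prints the current user's group names on one line.
if in_gpu_group "$(id -nG)"; then
  echo "user is in the video or render group"
else
  echo "user is in neither the video nor the render group"
fi
```

If the check fails, adding the user to those groups (commonly with sudo usermod -aG render,video followed by the user name) and logging in again is the usual next step, though the right group can differ by distribution.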
If you use Docker with NVIDIA, first verify that the container can see the GPU:
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
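Once the baseline passes, the same --gpus option applies to the Ollama container. The sketch below wraps both steps; run, gpu_baseline, and start_ollama are hypothetical helper names, DRY_RUN is an illustrative variable, and the container command assumes the ollama/ollama image with its default port 11434 and a named volume called ollama:

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Baseline: can any container see the GPU at all?
gpu_baseline() {
  run docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
}

# Start Ollama with GPU access (assumed image name, port, and volume).
start_ollama() {
  run docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
}

# Typical use: only start Ollama if the baseline succeeds.
#   gpu_baseline && start_ollama
```

If gpu_baseline fails, the problem sits in the driver or container runtime, and changing Ollama settings will not help.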
Official references
See Ollama's official GPU documentation and troubleshooting guide.