How We Calculate
Understand the methodology behind CanIRunAI recommendations
Overview
CanIRunAI analyzes your hardware (GPU, CPU, and RAM) and calculates compatibility with each local AI model. All scores are based on real-world measurements, with the RTX 3060 as the baseline GPU.
Tier System
Each model receives a tier based on its estimated speed (tokens per second) on your hardware. From fastest to slowest, the tiers are:
- Near-instant responses. A fluid experience for any use case.
- Good speed. Comfortable for chat, coding, and general use.
- Functional but noticeably slower. OK for non-interactive tasks.
- Usable with patience. Long responses may take minutes.
- Doesn't fit in memory, or too slow for practical use.
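As a rough illustration, tier assignment can be expressed as a threshold lookup on estimated tokens per second. The tier names and cutoff values below are assumptions for illustration only, not the exact thresholds CanIRunAI uses.

```ts
// Hypothetical tier labels and thresholds; the real cutoffs may differ.
type Tier = "S" | "A" | "B" | "C" | "F";

function assignTier(tokensPerSecond: number, fitsInMemory: boolean): Tier {
  if (!fitsInMemory) return "F";          // doesn't fit in memory at all
  if (tokensPerSecond >= 40) return "S";  // near-instant responses
  if (tokensPerSecond >= 15) return "A";  // comfortable for chat and coding
  if (tokensPerSecond >= 5) return "B";   // functional but noticeably slower
  if (tokensPerSecond >= 1) return "C";   // usable with patience
  return "F";                             // too slow for practical use
}
```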
VRAM Estimation
The model needs to fit in GPU memory (VRAM). We use Q4_K_M quantization as the default, the best balance between quality and size.
Quantization reduces the precision of model weights to decrease size and speed up inference.
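A minimal sketch of the estimate, assuming roughly 4.5 bits per weight for Q4_K_M plus a fixed overhead for the KV cache and runtime buffers. The constants here are illustrative assumptions, not CanIRunAI's exact values.

```ts
// Rough VRAM estimate for a Q4_K_M-quantized model.
// bitsPerWeight and overheadGiB are illustrative assumptions.
function estimateVramGiB(
  paramsBillions: number,
  bitsPerWeight = 4.5,
  overheadGiB = 1.5
): number {
  const weightsGiB = (paramsBillions * 1e9 * bitsPerWeight) / 8 / 1024 ** 3;
  return weightsGiB + overheadGiB; // weights + KV cache / runtime buffers
}

// e.g. a 7B model: ~3.7 GiB of weights + overhead ≈ 5.2 GiB
```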
How the model fits
- Full fit: the model uses up to 85% of VRAM. Full performance, no penalties.
- Tight fit: the model fits in VRAM but with no headroom. 30% penalty from memory pressure.
- Partial offload: part on the GPU, the rest in RAM. 40-80% penalty depending on the ratio.
- CPU-only: no dedicated GPU. Uses 60% of RAM. 88-94% penalty versus a GPU.
- Incompatible: insufficient memory. The model cannot be loaded.
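A sketch of how these cases could translate into a single speed multiplier. The penalty ranges come from the list above; the interpolation for partial offload and the fixed CPU-only multiplier are assumptions.

```ts
// Speed multiplier derived from how the model fits in memory.
// Percentages mirror the list above; interpolation details are assumptions.
function fitMultiplier(
  modelGiB: number,
  vramGiB: number,
  ramGiB: number,
  hasGpu: boolean
): number {
  if (!hasGpu) {
    const usableRam = ramGiB * 0.6;          // CPU-only: 60% of RAM is usable
    return modelGiB <= usableRam ? 0.09 : 0; // ~88-94% penalty vs GPU (here ~91%)
  }
  if (modelGiB <= vramGiB * 0.85) return 1.0; // full fit: no penalty
  if (modelGiB <= vramGiB) return 0.7;        // tight fit: 30% penalty
  if (modelGiB <= vramGiB + ramGiB * 0.6) {
    // Partial offload: 40-80% penalty, scaled by how much spills to RAM.
    const spillRatio = (modelGiB - vramGiB) / (ramGiB * 0.6);
    return 0.6 - 0.4 * spillRatio;            // 0.6 (40% penalty) down to 0.2 (80%)
  }
  return 0;                                   // insufficient memory: cannot load
}
```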
Bandwidth Scaling
Estimated speed scales linearly with GPU memory bandwidth. The RTX 3060 (360 GB/s) is the baseline.
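In code, this is a single ratio against the baseline figure given above:

```ts
// Linear bandwidth scaling relative to the RTX 3060 baseline (360 GB/s).
const BASELINE_BANDWIDTH_GBPS = 360;

function bandwidthFactor(gpuBandwidthGbps: number): number {
  return gpuBandwidthGbps / BASELINE_BANDWIDTH_GBPS;
}

// e.g. an RTX 4090 (~1008 GB/s) gets a factor of ~2.8
```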
RAM Factor
The amount of system RAM also influences performance. 16 GB is the baseline (factor 1.0); the factor ranges from 0.65× at 4 GB to 1.18× at 32 GB or more.
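One way to express this is interpolation between the anchor points given above (0.65× at 4 GB, 1.0× at 16 GB, 1.18× at 32 GB+). The anchor points come from the text; the linear interpolation between them is an assumption.

```ts
// RAM factor: 16 GB is the baseline (1.0). Anchor points from the text;
// linear interpolation between them is an assumption.
function ramFactor(ramGiB: number): number {
  const anchors: [number, number][] = [[4, 0.65], [16, 1.0], [32, 1.18]];
  if (ramGiB <= anchors[0][0]) return anchors[0][1];
  if (ramGiB >= anchors[anchors.length - 1][0]) return anchors[anchors.length - 1][1];
  for (let i = 1; i < anchors.length; i++) {
    const [x0, y0] = anchors[i - 1];
    const [x1, y1] = anchors[i];
    if (ramGiB <= x1) return y0 + ((ramGiB - x0) / (x1 - x0)) * (y1 - y0);
  }
  return 1.0;
}
```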
CPU Factor
In GPU mode, the CPU accounts for roughly 40% of performance (tokenization, KV cache management, data transfer). In CPU-only mode, it accounts for 100%.
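A sketch of how a relative CPU score could be blended in, assuming a normalized cpuScore where 1.0 is the baseline CPU. The 40%/100% split comes from the text; the blending formula itself is an assumption.

```ts
// Blend a relative CPU score (1.0 = baseline CPU) into the speed estimate.
// In GPU mode the CPU drives ~40% of the factor; in CPU-only mode, all of it.
function cpuFactor(cpuScore: number, gpuMode: boolean): number {
  return gpuMode ? 0.6 + 0.4 * cpuScore : cpuScore;
}
```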
Final Score (0-100)
The overall score is a weighted average based on how many models your hardware can run at each tier.
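As an illustration, the weighted average could look like the sketch below. The per-tier weights are hypothetical values for the example, not CanIRunAI's actual weighting.

```ts
// Same hypothetical tier labels as in the earlier sketch.
type Tier = "S" | "A" | "B" | "C" | "F";

// Overall score: weighted share of models in each tier, scaled to 0-100.
// The weights below are illustrative assumptions.
const TIER_WEIGHTS: Record<Tier, number> = { S: 1.0, A: 0.8, B: 0.5, C: 0.25, F: 0 };

function overallScore(tiers: Tier[]): number {
  if (tiers.length === 0) return 0;
  const total = tiers.reduce((sum, t) => sum + TIER_WEIGHTS[t], 0);
  return Math.round((total / tiers.length) * 100);
}
```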