The complete guide to running AI locally — hardware requirements, model selection, Ollama setup, and web UI configuration. Your data stays on your machine.
Running AI locally used to mean compiling CUDA libraries at 2am and praying your drivers matched. In 2026, Ollama has made it genuinely straightforward. This guide shows you exactly how to run AI locally — from picking hardware to having a full chat interface running in an afternoon.
The three big reasons to run AI locally haven't changed, but they've gotten more compelling:
Hardware is the most important decision: GPU VRAM determines which models you can run efficiently. You have three paths:
Ollama is the standard tool for running AI locally. Installation takes 60 seconds:
curl -fsSL https://ollama.ai/install.sh | sh

Or, if you prefer to run Ollama in Docker:

docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

Pull your first model. Start with Llama 3.1 8B — it's fast, capable, and fits in 6GB VRAM:
ollama pull llama3.1:8b
Other great starting models: ollama pull mistral · ollama pull gemma3:12b · ollama pull qwen2.5-coder:7b
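Once a model is pulled you can chat straight from the terminal, or call Ollama's local REST API on port 11434. Both are sketched below — swap the model tag for whatever you pulled:

```shell
# One-shot prompt from the CLI
ollama run llama3.1:8b "Summarize what quantization does in one sentence."

# The same model via Ollama's local HTTP API (non-streaming)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

The API is what tools like Open WebUI talk to under the hood, so anything that works here will work in the UI too.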
The CLI is fine for testing, but you'll want a proper chat UI. Install Open WebUI:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000 in your browser. Create an account (local only) and start chatting.
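A quick sanity check that both services are up, assuming the default ports from the commands above:

```shell
# Ollama should answer with a JSON list of pulled models
curl -s http://localhost:11434/api/tags

# Open WebUI should answer with HTTP 200
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000
```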
To access your local AI from your phone or from outside your home network, use Tailscale (free for personal use) or OpenClaw, which provides Telegram/WhatsApp access to your local AI without exposing any ports.
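With Tailscale, the flow looks roughly like this (a sketch, assuming Tailscale is installed on both the server and your phone, logged into the same tailnet):

```shell
# On the machine running Ollama / Open WebUI
sudo tailscale up        # authenticate via the URL it prints
tailscale ip -4          # note the 100.x.y.z address

# On your phone (Tailscale app installed, same tailnet):
# open http://<that-100.x-address>:3000 in the browser
```

Because Tailscale traffic is end-to-end encrypted inside your tailnet, nothing is exposed to the public internet.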
| Hardware | VRAM/RAM | Max Model | Speed (tok/s) | Power | Cost |
|---|---|---|---|---|---|
| ClawBox (Jetson Orin Nano) | 8GB unified | 13B Q4 | ~15 tok/s | 15W | €549 |
| RTX 3060 12GB | 12GB VRAM | 13B Q4 | ~30-50 tok/s | 120W | €350 |
| RTX 4090 24GB | 24GB VRAM | 34B Q4 | ~100 tok/s | 450W | €1,800 |
| Mac Mini M4 (16GB) | 16GB unified | 13B Q4 | ~40 tok/s | 12W | €800 |
| Mac Mini M4 Pro (48GB) | 48GB unified | 70B Q4 | ~25 tok/s | 20W | €2,000 |
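The "Max Model" column follows a rough rule of thumb: a 4-bit (Q4) model needs about half a gigabyte of VRAM per billion parameters, plus headroom for the KV cache and runtime overhead. A hedged sketch (the 0.5 and 1.2 factors are approximations, not vendor numbers):

```shell
# Rough VRAM estimate for a Q4-quantized model:
# params (billions) * 0.5 GB, plus ~20% for KV cache and overhead.
estimate_vram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.5 * 1.2 }'
}

estimate_vram_gb 8    # Llama 3.1 8B at Q4 -> 4.8 GB
estimate_vram_gb 13   # 13B at Q4 -> 7.8 GB
estimate_vram_gb 70   # 70B at Q4 -> 42.0 GB
```

This is why a 13B Q4 model fits an 8GB board, and why 70B models need 48GB-class unified memory.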
ClawBox ships pre-configured with Ollama and a full AI stack. Plug in, scan QR, done — no terminal required.
See ClawBox →

Minimum: 8GB RAM, a modern CPU, and 10-20GB of disk space. For GPU acceleration (much faster), an NVIDIA GPU with 6GB+ VRAM or Apple Silicon with unified memory is ideal. Small 7B models can run on CPU-only machines, just more slowly.
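Before picking a model, check what your machine actually has. A few standard commands (nvidia-smi exists only on machines with NVIDIA drivers; free is Linux-only):

```shell
# GPU name and VRAM (NVIDIA only)
nvidia-smi --query-gpu=name,memory.total --format=csv

# System RAM (Linux; on macOS use: sysctl -n hw.memsize)
free -h

# Free disk space in your home directory
df -h ~
```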
Yes. Ollama works on Mac, Windows, and Linux laptops. A MacBook Pro with an M2/M3/M4 chip is excellent — unified memory runs 7B-30B models smoothly. On Windows/Linux, a discrete GPU makes a big difference but isn't required.
A pre-configured AI appliance like ClawBox — ships with Ollama, a web UI, and OpenClaw pre-installed. Plug in, scan QR, chatting in under 5 minutes. No terminal, no drivers, no configuration required.