AI that lives on your computer. Open-source, private & always local.
Run frontier AI entirely on your machine. No API keys, no telemetry, no limits. Own your models and conversation data.
curl -LsSf https://llama.app/install.sh | sh
Prefer Brew or Winget? Use a package manager.
Rather build from source? Follow the build instructions.
Pair it with a local coding agent.
Run llama serve, install the pi-llama plugin, and launch Pi. Pi automatically discovers your local model. No config, no API keys. Files stay on your machine, and requests never leave it.
# 1. Serve a model
llama serve
# 2. Install the pi-llama plugin
pi install git:github.com/huggingface/pi-llama
# 3. Run Pi, everything is set
pi
Optimized for any hardware.
From your laptop to a cluster, llama.cpp runs on whatever you have. Same binary, same models, same hand-tuned kernels for every GPU and CPU.
M Ultra
RTX 5090
CPU
Jetson
H100
MI300
RTX 4090
A100
M Pro
Run your first model
Qwen3.6-27B
27B params
Coding & reasoning. Single-GPU sweet spot.
Qwen3.6-35B-A3B
35B MoE · 3B active
MoE: 35B-class quality at 3B-class speed.
Gemma-4-26B-A4B
26B MoE · 4B active
Google's desktop MoE. Strong reasoning, fast inference.
Gemma-4-E4B
4B effective
Tiny footprint. Runs on phones and low-end laptops.
gpt-oss-20b
20B MoE · 3.6B active
OpenAI's open weights. Frontier reasoning, local.
Step-3.7-Flash
198B MoE · 11B active
Snappy generalist for everyday chat and writing.
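Not sure which model fits your machine? A rough rule of thumb is that weight memory is about total parameters times bits per weight, divided by 8. The sketch below applies that rule to two of the models above. The weights_gb helper is hypothetical (it is not part of the llama CLI), and 4 bits per weight is an assumed quantization level; real quantized files usually come out somewhat larger, and the KV cache and runtime overhead add to the total.

```shell
#!/bin/sh
# Hypothetical helper, not part of the llama CLI.
# Estimates weight memory in GB as: params (in billions) * bits per weight / 8.
# Covers weights only; KV cache, activations, and runtime overhead are ignored.
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Parameter counts come from the model list above.
# 4 bits per weight is an assumption for illustration.
weights_gb 27 4   # Qwen3.6-27B     -> prints 13.5
weights_gb 35 4   # Qwen3.6-35B-A3B -> prints 17.5
```

For MoE models, size the memory by the total parameter count, since all experts typically stay loaded. The active count (3B here) mainly affects per-token speed, not footprint.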