I Moved My Entire AI Stack Off the Cloud Today. Here’s Why — and How.
There is a quiet shift happening in how serious AI builders operate. Not the hype cycles. Not the press releases. The actual builders — the ones running automations, building agents, shipping personal tools — are moving their stacks off the cloud and onto their own machines.
Today I did it too. Here is what changed, why it matters, and exactly how I did it.
The problem with cloud-only AI

I have been running BruBot — my personal AI agent — on a Hetzner VPS for months. 339 Python scripts, 20 cron jobs, Telegram interface, morning briefs, portfolio tracking, outreach automation. Powerful. Never sleeps.
But it has a fundamental problem: every single thing it does goes through someone else server.
per morning brief (before → after) · local response time
Step 1: Install Ollama (2 minutes)
Ollama is the simplest way to run open-source LLMs locally. It runs a local API server on localhost:11434 with the same REST interface as OpenAI — meaning any tool built for OpenAI works with Ollama, zero code changes.
# macOS install
brew install ollama
# Start the server
ollama serve
Step 2: Pull Gemma 3
Google Gemma 3 is the sleeper pick right now. Fast, sharp, genuinely good at reasoning — and it runs on a MacBook Pro without breaking a sweat.
ollama pull gemma3:4b
ollama pull gemma3:12b
ollama run gemma3:4b "What should I focus on if I lead an SDR team?"
Response time: under 2 seconds. No API key. No billing. No rate limits.
Step 3: Wire it into your existing stack
# Before
client = openai.OpenAI(api_key=OPENAI_KEY)
# After
client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
What I actually use Gemma for
The full local stack
~/.openclaw/workspace/ ← 339 scripts
localhost:11434 ← Ollama + Gemma 3
localhost:3030 ← ScreenPipe OCR
Why this matters
⚡ Speed: ~1.5s response vs 4–8s for GPT-4o
🔒 Privacy: Health, portfolio, calendar — none routes through OpenAI
🟢 Reliability: No rate limits. No outages.
💰 Cost: /bin/zsh per morning brief vs /bin/zsh.04
The surprise
Gemma 3 is genuinely good. Not good for a local model. Just good. The coaching insights are sharp. The SDR strategy prompts are solid. I expected to notice the quality drop. I did not.
10-minute quickstart
brew install ollamaollama serveollama pull gemma3:4b- Swap
base_urltohttp://localhost:11434/v1
Ollama is installed. Gemma is running. My morning brief costs exactly $0 in AI API calls. That feels good.