I Moved My Entire AI Stack Off the Cloud Today. Here’s Why — and How.

There is a quiet shift happening in how serious AI builders operate. Not the hype cycles. Not the press releases. The actual builders — the ones running automations, building agents, shipping personal tools — are moving their stacks off the cloud and onto their own machines.

Today I did it too. Here is what changed, why it matters, and exactly how I did it.

The problem with cloud-only AI

I Moved My Entire AI Stack Off the Cloud Today. Here’s Why — and How.

I have been running BruBot — my personal AI agent — on a Hetzner VPS for months. 339 Python scripts, 20 cron jobs, Telegram interface, morning briefs, portfolio tracking, outreach automation. Powerful. Never sleeps.

But it has a fundamental problem: every single thing it does goes through someone else server.

$0.04 $0.00  |  ~1.5s
per morning brief (before → after)  ·  local response time

Step 1: Install Ollama (2 minutes)

Ollama is the simplest way to run open-source LLMs locally. It runs a local API server on localhost:11434 with the same REST interface as OpenAI — meaning any tool built for OpenAI works with Ollama, zero code changes.

# macOS install
brew install ollama

# Start the server
ollama serve

Step 2: Pull Gemma 3

Google Gemma 3 is the sleeper pick right now. Fast, sharp, genuinely good at reasoning — and it runs on a MacBook Pro without breaking a sweat.

ollama pull gemma3:4b
ollama pull gemma3:12b
ollama run gemma3:4b "What should I focus on if I lead an SDR team?"

Response time: under 2 seconds. No API key. No billing. No rate limits.

Step 3: Wire it into your existing stack

# Before
client = openai.OpenAI(api_key=OPENAI_KEY)

# After
client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

What I actually use Gemma for

Task Model Why
Morning brief coaching Gemma 3:4b Fast, free, local
Complex reasoning Claude Sonnet Worth the API cost

The full local stack

~/.openclaw/workspace/   ← 339 scripts
localhost:11434          ← Ollama + Gemma 3
localhost:3030           ← ScreenPipe OCR

Why this matters

Speed: ~1.5s response vs 4–8s for GPT-4o

🔒 Privacy: Health, portfolio, calendar — none routes through OpenAI

🟢 Reliability: No rate limits. No outages.

💰 Cost: /bin/zsh per morning brief vs /bin/zsh.04

The surprise

Gemma 3 is genuinely good. Not good for a local model. Just good. The coaching insights are sharp. The SDR strategy prompts are solid. I expected to notice the quality drop. I did not.

10-minute quickstart

  1. brew install ollama
  2. ollama serve
  3. ollama pull gemma3:4b
  4. Swap base_url to http://localhost:11434/v1

Ollama is installed. Gemma is running. My morning brief costs exactly $0 in AI API calls. That feels good.

Similar Posts