I Moved My Entire AI Stack Off the Cloud Today. Here’s Why — and How.

ByDevin April 4, 2026April 19, 2026

There is a quiet shift happening in how serious AI builders operate. Not the hype cycles. Not the press releases. The actual builders — the ones running automations, building agents, shipping personal tools — are moving their stacks off the cloud and onto their own machines.

Today I did it too. Here is what changed, why it matters, and exactly how I did it.

The problem with cloud-only AI

I Moved My Entire AI Stack Off the Cloud Today. Here’s Why — and How.

I have been running BruBot — my personal AI agent — on a Hetzner VPS for months. 339 Python scripts, 20 cron jobs, Telegram interface, morning briefs, portfolio tracking, outreach automation. Powerful. Never sleeps.

But it has a fundamental problem: every single thing it does goes through someone else server.

$0.04 → $0.00 | ~1.5s
per morning brief (before → after) · local response time

Step 1: Install Ollama (2 minutes)

Ollama is the simplest way to run open-source LLMs locally. It runs a local API server on localhost:11434 with the same REST interface as OpenAI — meaning any tool built for OpenAI works with Ollama, zero code changes.

# macOS install
brew install ollama

# Start the server
ollama serve

Step 2: Pull Gemma 3

Google Gemma 3 is the sleeper pick right now. Fast, sharp, genuinely good at reasoning — and it runs on a MacBook Pro without breaking a sweat.

ollama pull gemma3:4b
ollama pull gemma3:12b
ollama run gemma3:4b "What should I focus on if I lead an SDR team?"

Response time: under 2 seconds. No API key. No billing. No rate limits.

Step 3: Wire it into your existing stack

# Before
client = openai.OpenAI(api_key=OPENAI_KEY)

# After
client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

What I actually use Gemma for

Task	Model	Why
Morning brief coaching	Gemma 3:4b	Fast, free, local
Complex reasoning	Claude Sonnet	Worth the API cost

The full local stack

~/.openclaw/workspace/   ← 339 scripts
localhost:11434          ← Ollama + Gemma 3
localhost:3030           ← ScreenPipe OCR

Why this matters

⚡ Speed: ~1.5s response vs 4–8s for GPT-4o

🔒 Privacy: Health, portfolio, calendar — none routes through OpenAI

🟢 Reliability: No rate limits. No outages.

💰 Cost: /bin/zsh per morning brief vs /bin/zsh.04

The surprise

Gemma 3 is genuinely good. Not good for a local model. Just good. The coaching insights are sharp. The SDR strategy prompts are solid. I expected to notice the quality drop. I did not.

10-minute quickstart

brew install ollama
ollama serve
ollama pull gemma3:4b
Swap base_url to http://localhost:11434/v1

Ollama is installed. Gemma is running. My morning brief costs exactly $0 in AI API calls. That feels good.

Uncategorized

4.4 and Blind: The Day the Systems Failed but the Portfolio Didn’t
ByDevin May 22, 2026May 24, 2026

The Numbers: What a 4.4 Actually Feels Like The Life OS scored today at 4.4/10 with a “productive” tag attached, which is the kind of mixed signal that tells you the entire day was a contradiction. The work dimension came in at 5—barely passing. Health is a 3, which isn’t surprising since I have no…

Read More 4.4 and Blind: The Day the Systems Failed but the Portfolio Didn’t
Uncategorized

7.2 and the Automation Trap: When Systems Break and You Don’t Notice
ByDevin June 5, 2026June 6, 2026

The Score Says Focused. The Reality Says Blind. Friday closed at 7.2/10. Mood: focused. The number feels generous when I actually look at what happened. Here’s what the Life OS dashboard is telling me: health is a 9 (two workouts, 1000m swim, 28-minute HIIT, tefillin before sundown). Habits are at 9 (seven out of eight…

Read More 7.2 and the Automation Trap: When Systems Break and You Don’t Notice
Uncategorized

The Day I Spent 6 Hours Fighting My Own AI Infrastructure (And Won)
ByDevin March 30, 2026April 19, 2026

Some days you ship features. Some days you fight infrastructure. Today was the latter — and honestly? Those are the days that make the whole system stronger. Here’s what actually happened behind the scenes today. The Gateway Problem The Nerve/OpenClaw system has a concept called a Gateway Handshake — essentially the secure entry point that…

Read More The Day I Spent 6 Hours Fighting My Own AI Infrastructure (And Won)
Uncategorized

8.2 and Honest: The Day Habits Carried Everything Else
ByDevin June 10, 2026June 10, 2026

8.2 and Honest: The Day Habits Carried Everything Else Wednesday, June 10, 2026 — Tel Aviv The Life OS gave me an 8.2 today. Mood: productive. On paper, that sounds like a good day. And in some ways it genuinely was. But 8.2 can hide things if you don’t look closely at what’s underneath it….

Read More 8.2 and Honest: The Day Habits Carried Everything Else
Uncategorized

How I Actually Spend My Workday: Real Tools, Real Context Switching
ByDevin April 14, 2026June 6, 2026

Your calendar doesn’t tell the real story of how work gets done. Between Teams meetings, Outlook firefighting, LinkedIn rabbit holes, and that one WhatsApp video to your mom, there’s a rhythm to productive chaos that nobody talks about. The Meeting Layer: Where Decisions Happen Today started with a solid 24-minute Teams call reviewing projects with…

Read More How I Actually Spend My Workday: Real Tools, Real Context Switching
Uncategorized

When Communication Wins: A Day of Coordination Over Code
ByDevin April 10, 2026April 19, 2026

Some days you ship features. Other days you coordinate the people who will. Today’s screen data tells the story of a communication-heavy day—52 frames bouncing between email, Teams, and meetings—and what that actually reveals about productive work. The Email Hunt Eleven frames of Outlook search activity suggests you weren’t just checking messages—you were on a…

Read More When Communication Wins: A Day of Coordination Over Code