How I Cut AI Automation Costs While Building a Smarter Personal Assistant

Running a personal AI assistant 24/7 sounds expensive. And honestly? It was. But this week we made a series of architectural decisions that are driving those costs down while making the system meaningfully smarter. Here’s a full breakdown of what we built today and why it matters.


The Cost Problem

Let’s start with the honest number: yesterday’s total API spend was $18.17. Most of it — $13.65 — went to Claude Sonnet, the heavy-reasoning model I use for complex tasks. GPT-4o added another $2.72.

For a personal assistant running automations, that’s too high. The fix isn’t to use worse models — it’s to use the right model for each task, and to build systems that don’t need to re-process the same information every time. Today we tackled that directly.

1. Facebook Marketplace Deal Scout — Israeli Proxy Matters

The first build: a daily deal-finder that scans Facebook Marketplace in Tel Aviv and surfaces the 5 best lifestyle deals every afternoon.

The interesting engineering decision here was which scraper to use. The obvious choice — Apify’s native Facebook Marketplace actor — actually returns American listings when run from a standard API call. It uses US-based proxies. Useless for someone shopping in Tel Aviv.

The solution was a Vercel-hosted scraper that routes through Israeli proxies. Real results. Real ₪ prices. Real Tel Aviv locations.

Confirmed cost via Apify API: $0.10 per actor run × 7 queries = $0.70/day (~$21/month). Runs automatically every day at 2 PM Israel time, delivered directly to Telegram.
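For anyone checking the math, here is the arithmetic as a small Python sketch. The per-run price and query count are the figures quoted above; the 30-day month and the function name are assumptions made for illustration.

```python
from decimal import Decimal


def scraper_cost(cost_per_run: Decimal, runs_per_day: int, days: int = 30) -> tuple[Decimal, Decimal]:
    """Return (daily, monthly) spend in USD. A 30-day month is assumed."""
    daily = cost_per_run * runs_per_day
    return daily, daily * days


# Figures from the post: $0.10 per actor run, 7 queries per day.
daily, monthly = scraper_cost(Decimal("0.10"), runs_per_day=7)
print(f"daily: ${daily}, monthly: ${monthly}")
```

Decimal is used instead of float so the cents do not pick up rounding noise.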

The lesson: cheap-looking tools can be expensive when they return the wrong data. Always verify costs at the source — not from wrapper estimates that can be off by 10x.

2. RAG Over My Second Brain — Semantic Memory for ~Free

The biggest build of the day: a full RAG (Retrieval-Augmented Generation) system over my personal knowledge base.

Previously, if I asked BruBot “what did I decide about apartments?” it was either guessing or making a blind API call to one specific Notion page. No persistent, searchable index of months of notes, decisions, and frameworks.

What we built:

  • rag_sync.py — pulls all local memory files + Notion Goals DB, chunks them, embeds into ChromaDB using OpenAI’s text-embedding-3-small
  • rag_query.py — semantic search: instant answers from real notes
  • rag_status.py — shows total chunks, sources, last sync time
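
The sync and query steps listed above could look roughly like the sketch below. This is not the actual BruBot code: the chunk sizes, the "second_brain" collection name, the "*.md" file pattern, the database path, and every function name are assumptions. It uses ChromaDB's persistent client with its OpenAI embedding function, and reads the key from the OPENAI_API_KEY environment variable.

```python
import hashlib
import os
from pathlib import Path

COLLECTION_NAME = "second_brain"  # hypothetical collection name


def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows (sizes are assumptions)."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    text = text.strip()
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reached the end of the text
        start += max_chars - overlap
    return chunks


def chunk_id(source: str, index: int, chunk: str) -> str:
    """Deterministic ID, so re-syncing unchanged text overwrites the same entry."""
    digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:12]
    return f"{source}:{index}:{digest}"


def open_collection(db_path: str = "./chroma_db"):
    """Open or create the on-disk collection (needs the chromadb and openai packages)."""
    import chromadb
    from chromadb.utils import embedding_functions

    embedder = embedding_functions.OpenAIEmbeddingFunction(
        api_key=os.environ["OPENAI_API_KEY"],
        model_name="text-embedding-3-small",
    )
    client = chromadb.PersistentClient(path=db_path)
    return client.get_or_create_collection(
        name=COLLECTION_NAME, embedding_function=embedder
    )


def sync_directory(collection, memory_dir: str) -> int:
    """Chunk every Markdown file in memory_dir and upsert it; return the chunk count."""
    total = 0
    for path in sorted(Path(memory_dir).glob("*.md")):
        chunks = chunk_text(path.read_text(encoding="utf-8"))
        if not chunks:
            continue
        collection.upsert(
            ids=[chunk_id(path.name, i, c) for i, c in enumerate(chunks)],
            documents=chunks,
            metadatas=[{"source": path.name} for _ in chunks],
        )
        total += len(chunks)
    return total


def search(collection, question: str, k: int = 5) -> list[str]:
    """Return the k chunks closest in meaning to the question."""
    results = collection.query(query_texts=[question], n_results=k)
    return results["documents"][0]
```

One limitation of this sketch: it never deletes chunks for notes that were edited or removed, so a real sync would also need a cleanup pass.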

Current state: 373 chunks embedded, including 356 from local files and 15 from Notion. Covers 30+ days of daily notes.

Cost: OpenAI text-embedding-3-small is $0.02 per million tokens. My entire second brain is ~50K tokens, so each full sync costs about a tenth of a cent (50K tokens × $0.02 per million = $0.001). Monthly: under $0.10. Compare that to repeatedly injecting 150K tokens into every conversation: RAG is 100–200x cheaper for knowledge retrieval.
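
The embedding cost is easy to sanity-check. The price and corpus size below are the figures from this post; the assumption of one full sync per day over a 30-day month is mine, and it only covers embedding, not the model that reads the retrieved chunks.

```python
EMBED_PRICE_PER_MILLION_USD = 0.02  # text-embedding-3-small, as quoted above
CORPUS_TOKENS = 50_000              # approximate size of the second brain


def embedding_cost(tokens: int, price_per_million: float = EMBED_PRICE_PER_MILLION_USD) -> float:
    """USD cost of embedding the given number of tokens."""
    return tokens / 1_000_000 * price_per_million


per_sync = embedding_cost(CORPUS_TOKENS)
monthly = per_sync * 30  # assumes one full re-embed per day

print(f"per sync: ${per_sync:.4f}")  # per sync: $0.0010
print(f"monthly:  ${monthly:.2f}")   # monthly:  $0.03
```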

3. Context Window Management = Fewer Dollars Burned

One insight from today’s session: the conversation context hit 152K/200K tokens (76% full). At that size, every message costs significantly more to process, because the full conversation history is sent as input on every turn. The fix: start fresh sessions more frequently, let prompt caching do its job (we hit a 100% cache hit rate today), and route routine tasks to cheaper models.
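
A simple guard can flag when a session is getting heavy. This is a sketch only: the 200K limit and the 152K reading come from the session above, while the 75% cutoff and the function names are assumptions, not a rule the assistant actually enforces.

```python
def context_usage(used_tokens: int, limit_tokens: int = 200_000) -> float:
    """Fraction of the context window currently in use."""
    return used_tokens / limit_tokens


def should_start_fresh(used_tokens: int, limit_tokens: int = 200_000,
                       threshold: float = 0.75) -> bool:
    """True once usage reaches the (assumed) threshold for starting a new session."""
    return context_usage(used_tokens, limit_tokens) >= threshold


print(f"{context_usage(152_000):.0%}")  # 76%
print(should_start_fresh(152_000))      # True
```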

Model routing: GPT-4o-mini for simple tasks → GPT-4o for standard automation → Claude Sonnet only for complex reasoning. The cache alone saved reprocessing 6.24M tokens yesterday.
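
In code, tiered routing can be as small as a lookup table. The sketch below assumes the cheapest tier is GPT-4o-mini (the model used for scoring listings elsewhere in this post). The tier names are hypothetical, and the strings are labels for illustration rather than exact API model IDs.

```python
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"      # e.g. formatting, short classification
    STANDARD = "standard"  # routine automation
    COMPLEX = "complex"    # multi-step reasoning


# Illustrative labels, cheapest first; swap in the real model IDs you use.
MODEL_BY_COMPLEXITY = {
    Complexity.SIMPLE: "gpt-4o-mini",
    Complexity.STANDARD: "gpt-4o",
    Complexity.COMPLEX: "claude-sonnet",
}


def route(complexity: Complexity) -> str:
    """Pick the cheapest model tier considered adequate for the task."""
    return MODEL_BY_COMPLEXITY[complexity]


print(route(Complexity.SIMPLE))   # gpt-4o-mini
print(route(Complexity.COMPLEX))  # claude-sonnet
```

The hard part in practice is deciding the complexity of a task up front; this sketch leaves that decision to the caller.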

4. Apartment Hunting on Autopilot

Bonus: ran the apartment hunter for Kochav HaTzafon — 2BR, under ₪10,000/month. Results in under 15 seconds: 3 qualified listings filtered from 469 raw Facebook posts, scored with GPT-4o-mini, delivered to Telegram. Zero manual browsing.
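
The cost win here comes from filtering on hard criteria before any model sees a listing. A minimal sketch of that idea follows. The Listing fields, the function names, and the sample data are all made up for illustration, and the scorer is a plain callable standing in for the GPT-4o-mini call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Listing:
    text: str
    neighborhood: str
    bedrooms: int
    price_ils: int


def prefilter(listings: Iterable[Listing], neighborhood: str,
              bedrooms: int, max_price_ils: int) -> list[Listing]:
    """Cheap rule-based pass: keep only listings that meet the hard criteria."""
    return [
        item for item in listings
        if item.neighborhood == neighborhood
        and item.bedrooms == bedrooms
        and item.price_ils < max_price_ils
    ]


def score_and_rank(listings: Iterable[Listing],
                   scorer: Callable[[Listing], float]) -> list[Listing]:
    """Expensive pass: score only the survivors, best first."""
    return sorted(listings, key=scorer, reverse=True)


# Made-up sample data, not real listings.
raw = [
    Listing("Bright 2BR near the park", "Kochav HaTzafon", 2, 9_500),
    Listing("Spacious 3BR", "Kochav HaTzafon", 3, 9_000),
    Listing("2BR, renovated", "Kochav HaTzafon", 2, 11_000),
    Listing("2BR by the beach", "Florentin", 2, 8_000),
]
candidates = prefilter(raw, "Kochav HaTzafon", bedrooms=2, max_price_ils=10_000)
ranked = score_and_rank(candidates, scorer=lambda item: -item.price_ils)
print([item.text for item in ranked])
```

Only the listings that survive prefilter reach the scorer, so the number of model calls scales with the qualified listings rather than the raw posts.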

The Bigger Picture

The trend is consistent: every week, the cost per useful output drops. Not because models are getting cheaper (though they are), but because the architecture gets smarter:

  • Less redundant context processing (caching + smaller sessions)
  • Right model for right task (routing)
  • Semantic memory instead of brute-force context injection (RAG)
  • Pre-filtered data instead of scoring everything (smart filters)

The goal isn’t a cheaper assistant. It’s a more capable one that happens to cost less to run. More updates to follow as we wire the RAG query layer into live conversations.

BruBot runs on OpenClaw. Built in Tel Aviv.
