Three Agents, One Cron Nightmare: Building the T3 Coordination System

It’s 21:47 on a Sunday night in Tel Aviv, and I just got my first clean evening brief from Tomer — the third agent in a system I’ve been wiring together for the past fourteen hours. Three agents. Three jobs. One orchestration layer that finally doesn’t break.

Let me tell you what it actually took to get here.

The T3 Architecture: Why Three Agents Instead of One

When I started building this, I made the mistake everyone makes: one monolithic agent that does everything. Research, health tracking, portfolio updates, system monitoring — all crammed into a single script with a single cron job.

It lasted about three days before becoming unmaintainable.

So I split it. The T3 system now has three specialized agents:

Tim — Research & Intelligence. Runs at 07:10, pulls news via Brave Search API, summarizes through Claude, sends me what matters before I finish my coffee.
Tony — Health & Finance. Runs at 07:05, grabs my Leopold portfolio prices from Yahoo Finance, pulls Strava data, gives me the numbers I care about.
Tomer — Ops & Orchestration. Runs at 21:00, audits the other agents, checks GitHub for releases I’m tracking, sends the evening brief.

The logic is simple: agents that run at different times, with different responsibilities, talking to different Telegram chats. What’s not simple is making them aware of each other.
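The split can be written down as a small registry. This is only a sketch: the names, roles, and run times come from the list above, while the `Agent` dataclass and the `audit_targets` helper are hypothetical and not part of the actual T3 code.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Agent:
    name: str
    role: str
    run_at: time  # scheduled start, in whatever timezone cron uses


# Names, roles, and times are from the post; the structure is an assumption.
AGENTS = [
    Agent("Tim", "Research & Intelligence", time(7, 10)),
    Agent("Tony", "Health & Finance", time(7, 5)),
    Agent("Tomer", "Ops & Orchestration", time(21, 0)),
]


def audit_targets(agents, auditor_name):
    """Return the agents scheduled earlier in the day than the auditor, earliest first."""
    auditor = next(a for a in agents if a.name == auditor_name)
    earlier = [a for a in agents if a.run_at < auditor.run_at]
    return sorted(earlier, key=lambda a: a.run_at)


if __name__ == "__main__":
    for agent in audit_targets(AGENTS, "Tomer"):
        print(f"{agent.run_at:%H:%M}  {agent.name}  ({agent.role})")
```

Keeping the schedule in one place means the auditor can work out who it should have heard from, instead of hard-coding that list a second time.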

What Broke: The Cron Audit Incident

Here’s a specific failure that cost me two hours today.

Tomer’s job is to verify that Tim and Tony actually ran. I built a health check that looks for log files with today’s timestamp. Simple, right?

Except I wrote the date check like this:

if log_date == datetime.now().strftime("%Y-%m-%d"):  # datetime.now() is naive, server-local time

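The problem: Tomer runs at 21:00. Tony and Tim run at 07:05 and 07:10. On the server, the logs were being written with UTC timestamps, but my comparison was using local Tel Aviv time. The two clocks agree on the calendar date for most of the day, so the health check usually worked. In the hours around midnight, when UTC and Tel Aviv (UTC+2 or UTC+3) sit on different dates, it didn't.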

The fix was embarrassingly simple — normalize everything to UTC. But finding it meant tailing logs, adding debug prints, and questioning my life choices while my Telegram stayed silent.
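A minimal sketch of that normalization, assuming the logs carry ISO 8601 timestamps and that any timestamp without an offset was written in UTC. The helper names `utc_today` and `ran_today` are illustrative, not the actual T3 code.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def utc_today(now: Optional[datetime] = None) -> date:
    """Current calendar date in UTC, matching how the logs are stamped."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).date()


def ran_today(log_timestamp: str, now: Optional[datetime] = None) -> bool:
    """True if an ISO 8601 log timestamp falls on today's UTC date.

    Assumption: timestamps with no offset were written in UTC.
    """
    stamp = datetime.fromisoformat(log_timestamp)
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp.astimezone(timezone.utc).date() == utc_today(now)


if __name__ == "__main__":
    # 01:30 in Tel Aviv summer time (UTC+3) is still the previous day in UTC.
    tel_aviv_summer = timezone(timedelta(hours=3))
    now = datetime(2024, 5, 13, 1, 30, tzinfo=tel_aviv_summer)
    log_stamp = "2024-05-12T21:30:00+00:00"

    naive_match = log_stamp[:10] == now.strftime("%Y-%m-%d")
    print("local-date comparison:", naive_match)            # False
    print("UTC comparison:", ran_today(log_stamp, now))     # True
```

The point of passing `now` in as a parameter is that the midnight edge case becomes something you can test at your desk instead of discovering it from a silent Telegram chat.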

Coordination in Practice: The Handoff Problem

The harder design problem isn’t scheduling — it’s state. What happens when Tony fails to fetch portfolio data? Does Tomer know? Does Tomer care?

Right now, I’m using a lightweight approach: each agent writes a JSON status file when it completes. Tomer reads all three status files (including its own from yesterday) and builds a health report. If a file is missing or stale, it flags it.
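The status-file handoff can be sketched roughly as follows. The `status/` directory, the JSON field names, and the 26-hour staleness threshold are all assumptions made for illustration; the post only says that each agent writes a JSON status file and that missing or stale files get flagged.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

STATUS_DIR = Path("status")      # hypothetical location
AGENTS = ("tim", "tony", "tomer")
# Assumed threshold: a bit over a day, so Tomer's own file from
# yesterday's 21:00 run still counts as fresh at tonight's run.
MAX_AGE = timedelta(hours=26)


def write_status(agent, ok, detail="", status_dir=STATUS_DIR, now=None):
    """Write <agent>.json via a temp file so readers never see a half-written file."""
    now = now or datetime.now(timezone.utc)
    status_dir = Path(status_dir)
    status_dir.mkdir(parents=True, exist_ok=True)
    payload = {"agent": agent, "ok": ok, "detail": detail,
               "finished_at": now.isoformat()}
    tmp = status_dir / f"{agent}.json.tmp"
    tmp.write_text(json.dumps(payload))
    tmp.replace(status_dir / f"{agent}.json")  # rename over the old file


def health_report(status_dir=STATUS_DIR, now=None, max_age=MAX_AGE):
    """Map each agent to 'ok', 'failed', 'stale', 'missing', or 'unreadable'."""
    now = now or datetime.now(timezone.utc)
    report = {}
    for agent in AGENTS:
        path = Path(status_dir) / f"{agent}.json"
        try:
            status = json.loads(path.read_text())
            age = now - datetime.fromisoformat(status["finished_at"])
            ok = bool(status.get("ok", False))
        except FileNotFoundError:
            report[agent] = "missing"
            continue
        except (ValueError, KeyError, TypeError, AttributeError):
            # Bad JSON, missing field, or a timestamp that cannot be compared.
            report[agent] = "unreadable"
            continue
        if age > max_age:
            report[agent] = "stale"
        else:
            report[agent] = "ok" if ok else "failed"
    return report
```

An agent that fails can still call `write_status("tony", False, "portfolio fetch timed out")`. That way the report can tell the difference between "ran and failed" and "never ran", which is exactly the difference the auditor needs.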

This isn’t elegant. It’s not event-driven. There’s no message queue, no pub/sub, no fancy orchestration framework. It’s files on disk and cron jobs that either run or don’t.

But it works at 3 AM when I’m asleep. That’s the bar.

What I Actually Learned Today

Building an automated life isn’t about the automation — it’s about the recovery paths. Every agent needs to fail gracefully. Every orchestration layer needs to know what “healthy” looks like. And every system you build at night will break in ways you didn’t test for.

The T3 system isn’t done. I still need proper alerting when agents go silent for more than a day. I need better retry logic for Yahoo Finance timeouts. I need to figure out what happens when Claude’s API is slow and Tim’s summary runs past Tony’s health check window.
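For the retry item, one common shape is exponential backoff around the fetch call. This is a generic sketch, not what T3 does today: `with_retries` and its parameters are hypothetical, `fetch` stands in for whatever function pulls the prices, and the exception types to retry on depend on the HTTP client in use.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[Exception], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on the given exceptions with exponential backoff.

    Waits base_delay, then 2x, then 4x, and so on between attempts.
    The last failure is re-raised so the caller can record it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Because the final exception propagates, the agent can catch it at the top level and write a failed status instead of dying silently, which keeps retries and the evening audit working together.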

But tonight, I got a clean brief at 21:00. Three agents reported in. The system is running.

The question I’m sitting with: at what point does monitoring your automation become its own full-time job? And how do you automate that?