Wednesday, April 08, 2026

Engineering × AI

The Prompt That Crossed Two Organizations — And Got Sharper Each Time

How a product executive's pressure-testing framework traveled to systems engineering, and what happened when we pointed it at a real AWS workflow.

There's a quiet revolution happening in how smart teams use AI — and it has nothing to do with the model. It has everything to do with the instructions.

A few weeks ago I borrowed a prompt template from a product owner at a large enterprise. He had built it in ChatGPT to do something powerful: give his leadership team a private space to pressure-test ideas before they ever entered a room. When executives were developing roadmaps, he'd run them through the model first — surfacing assumptions, stress-testing the logic, anticipating the hard questions a CFO or GM might raise. The result was a win on both sides of the table. The CEO could arrive at conversations with a sharper, more fully-formed point of view. And the product manager got to execute against a plan that had already survived serious scrutiny — no half-baked pivots, no surprises mid-flight.

Think of it less as critique and more as a rehearsal room. The tool doesn't challenge people — it challenges ideas, privately, before the stakes are high.

I took the same framework and ran it in Claude — the model we use at Solo. It worked just as well. Which raises something worth sitting with: the framework didn't just travel across organizations and domains. It traveled across AI models entirely. That's the tell. When the same set of instructions produces sharp, useful output regardless of which model is running them, the instructions are the asset. The model is increasingly the commodity.

I read those instructions and thought: this exact mental model applies to systems architecture.

So I adapted them. Same seven-step skeleton — identify the thesis, stress-test the portfolio balance, expose assumptions, map risk concentration, name the opportunity costs, simulate the leadership challenge, propose alternative shapes. I changed the vocabulary and the lens. Instead of asking what will Finance push back on, I asked what will the CEO and engineering team challenge. Instead of scoring for revenue potential, I scored for cost and time savings. The goal shifted from sharpening an executive's boardroom instincts to sharpening an engineer's thinking in peer and leadership conversations.

Then I pointed it at something real: the workflow our team at Solo uses to create new EventBridge scheduled rules in AWS.

— —

What the workflow looked like

The process isn't complicated on its face. You take a YAML template, swap in five dynamic variables, insert the block into a CloudFormation file, open a PR, get infra review, run a change set in the AWS console, verify, execute, merge. Clear enough. Solid IaC discipline — no console drift, changes tracked in git, peer review required.

What happened next is what makes this worth writing about.

— —

What Claude found

Step 1 — Strategic thesis

The thesis was absent. Claude's first move was to name this: defensive / constraint-driven, no clear thesis. The process enforces the right principle — infrastructure as code over console changes — but it's encoded as a manual human checklist rather than a system with guardrails. The implied thesis should be "infrastructure changes are safe, auditable, and low-friction." Claude's verdict: it's only partially achieving that.

Step 2 — Portfolio balance

The portfolio is completely unbalanced. Claude broke the workflow down across four dimensions: IaC discipline (present), automation and tooling (absent), validation guardrails (absent), developer autonomy (weak). Its summary was blunt: all weight is on human process, zero investment in tooling that would make this self-service and safe simultaneously.

Step 3 — Assumption exposure

The assumptions don't hold up. Claude identified four things that must be true for this process to succeed — and challenged each one:

"Engineers will follow the checklist correctly every time." There's no validation that catches a malformed cron expression, wrong snake_case, or a misaligned variable before it hits CloudFormation.

"Infra team review is a meaningful gate." Unclear. If it's checking syntactic correctness, a linter does this faster and more reliably. If it's checking strategic intent, that's not documented anywhere.

"The change set review step prevents mistakes." Partially — but it requires the engineer to know what to look for. No checklist defines what "looks good" actually means.

"Console changes never happen." The last line of the process says never change the rule from the AWS Console — but there's no enforcement mechanism. That's policy, not a guardrail.

Step 4 — Risk concentration

The risk concentration is real. One engineer executing the process incorrectly causes a production scheduling outage or a silent missed execution. The template has six-plus Fn::ImportValue calls — a single upstream stack name change silently breaks every rule, with no cross-reference validation documented. And the process assumes the engineer simultaneously knows valid cron syntax, CloudFormation change set semantics, ECS task override structure, and rake task naming conventions. That's a high knowledge bar with no scaffolding.

The most underappreciated risk lives in the last line: "Never change the rule from the AWS Console." A critical constraint buried where engineers are least likely to retain it.

If someone misses it — especially under time pressure — they introduce configuration drift that CloudFormation won't detect on the next deploy. The instruction exists. The enforcement doesn't.

— —

The leadership challenge simulation

Claude predicted exactly the questions that would land hardest in an executive or peer review:

CEO / Engineering lead

"How many engineer-hours does this take per rule addition, end to end? What's the error rate? Have we had production scheduling failures from this process?"

Probably no clean answers. That's the problem.

Finance

"If a scheduled job silently fails to be created or runs at the wrong time, what's the business impact?"

The process has no alerting or confirmation that a newly added rule is actually firing. Execution success is not the same as operational correctness.

Engineering

"Why do I need infra team review for a cron job? Why can't I validate this locally? Why is there no test environment path?"

Claude's assessment: these are legitimate objections. The current process treats every engineer as a potential misconfigurer rather than building systems that make misconfiguration hard.

— —

Three alternative shapes

Claude proposed three different strategic directions, each with honest tradeoffs:

Option A — Recommended near-term

Script-first

Build a small CLI that prompts for inputs, validates cron syntax, generates the YAML block, and inserts it correctly into the file. This eliminates the entire class of template variable substitution errors and removes the knowledge burden from the engineer.

Pros: eliminates the substitution error class entirely.
Cons: one-time build and maintenance investment.

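As a minimal sketch of what such a generator could look like in Ruby — the template fields, rule shape, and names here are illustrative, not our actual CloudFormation template:

```ruby
require "erb"

# Illustrative template. The real file has more fields, Fn::ImportValue
# references, and ECS task overrides.
RULE_TEMPLATE = <<~YAML
  <%= rule_name %>:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: "cron(<%= cron %>)"
      State: ENABLED
YAML

# AWS cron expressions have six fields:
# minute hour day-of-month month day-of-week year.
def validate_cron!(cron)
  fields = cron.split
  unless fields.size == 6
    raise ArgumentError, "expected 6 cron fields, got #{fields.size}: #{cron.inspect}"
  end
  cron
end

def render_rule(rule_name:, cron:)
  validate_cron!(cron)
  ERB.new(RULE_TEMPLATE).result_with_hash(rule_name: rule_name, cron: cron)
end

puts render_rule(rule_name: "NightlyReportRule", cron: "0 3 * * ? *")
```

A real version would also enforce naming conventions and insert the rendered block into the CloudFormation file at the right anchor, but even this much catches a malformed schedule before the PR is opened.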
Option B

Separate stack per rule

Instead of one monolithic YAML file, each rule gets its own CloudFormation stack. Engineers own their rule's lifecycle. Merge conflicts disappear. Blast radius is isolated.

Pros: eliminates merge conflicts, isolates blast radius.
Cons: stack proliferation requires naming discipline.

Option C — Longer horizon

Migrate to EventBridge Scheduler + CDK / Terraform

Replace CloudFormation-managed EventBridge Rules with the purpose-built newer service, managed through CDK or Terraform modules. Better DX, built-in retry policies, templated constructs reduce copy-paste risk significantly.

Pros: better DX, built-in retries, less copy-paste risk.
Cons: migration cost, team upskilling, short-term disruption.

Claude's bottom line: This process enforces the right principle with the wrong mechanism. The risk isn't that engineers are careless — it's that the process provides no structural resistance to errors. A linter, a generator script, and a validation step in CI would eliminate the majority of failure modes at low cost. The highest-leverage immediate action: a script that generates the YAML block from inputs and validates cron syntax before the PR is opened. Everything else can wait.
— —

What this is really about

The genealogy of this critique is what I keep coming back to. A product owner at an enterprise company built a framework in ChatGPT to make product leaders sharper. I adapted it for Claude to make systems engineers sharper. The seven-step skeleton traveled across two organizations, two domains, two AI models, and two completely different problems — and produced something genuinely useful every time.

That last part matters more than it might seem. We're entering a moment where the major AI models are converging in capability. The choice between them is increasingly a matter of workflow preference, not raw power. What doesn't transfer automatically — what has to be deliberately designed — is how you instruct them. The same prompt that works in ChatGPT works in Claude. The same framework that sharpens a product roadmap sharpens an engineering workflow. The instructions are the portable, reusable, compounding asset. The model is the infrastructure underneath.

We spend a lot of time evaluating which AI model to use and almost no time designing how we instruct it. The difference between an AI that validates your thinking and one that challenges it isn't the model version — it's the instruction set. One framing decision, encoded in a project's system prompt, shifts the output from agreeable to adversarial, from a mirror to a pressure test.

The insight the enterprise product owner had — that you can force structured, sequential reasoning by encoding a multi-step framework as the operating instruction — turns out to be domain-agnostic and model-agnostic. The same architecture works on roadmaps, on engineering workflows, on financial models, on hiring processes. You change the vocabulary. The sharpness is the point.

The prompt that lives in our Claude project now means any engineer can walk in with a workflow, a design doc, or an architectural decision and get back something that will make them think harder — not feel better.

That's the unlock. And it cost nothing but the willingness to borrow a smart idea from someone doing a completely different job, on a completely different platform, solving a completely different problem.

The best prompts, it turns out, travel well.

Want to build your own pressure-testing project? The pattern is straightforward: pick an adversarial advisor persona, write a multi-step reasoning framework that forces each analytical lens to run in sequence, and explicitly ban validation as a default behavior. It works in Claude. It works in ChatGPT. The framework above has seven steps — but what makes it work isn't the number, and it isn't the model. It's the instruction not to let weak reasoning slide.

Solo  ·  Engineering & AI  ·  2025

Thursday, March 26, 2026

How We Migrated Sidekiq's Redis Without Losing a Single Job


Infrastructure · Redis · Sidekiq

How We Migrated Sidekiq's Redis Without Losing a Single Job (and Without Listening to AI)

Solo Engineering Team · March 2026 · 8 min read

We moved our Sidekiq backend from Redis Enterprise to AWS ElastiCache. The AI tools recommended a careful, expensive approach. We did something simpler — and it worked perfectly.

The Setup

Our app runs Sidekiq workers on ECS. Each process connects to Redis on startup to read and process jobs. We were moving from Redis Enterprise to ElastiCache — different host, different connection string, same protocol.

New jobs would start going to the new Redis as soon as we deployed. But existing jobs queued in the old Redis? They'd be orphaned the moment every worker switched over.

What the AI Tools Said

We asked around — Claude, ChatGPT, Gemini, Grok. They all landed in roughly the same place:

You should deploy a separate environment connected to the old Redis. Let it drain the queue over time, then decommission.

It's not wrong. But it's heavy. That approach meant new ECS task definitions, environment variable management across two sets of infra, coordinating the decommission, and extra cost while two clusters run in parallel.

When we pushed back, one tool offered an alternative: run two Sidekiq processes per Docker container — one pointed at old Redis, one at new. That would have required changes to CloudFormation templates, process supervision config inside the container, and careful cleanup afterward. Trading one complex migration for another.

But they missed something important: Sidekiq's backing store is completely external to the process. A job scheduled on Redis Enterprise doesn't belong to any particular Sidekiq process — it just sits there until a worker with a connection to that Redis comes along. The worker is stateless.

So the "debugging nightmare" scenario the AI tools described... wouldn't actually happen.

The Actual Solution

Our team came up with something much simpler. In config/initializers/sidekiq.rb, at startup, each Sidekiq process decides which Redis to connect to. We added one line:

config/initializers/sidekiq.rb — the one-liner
# Coin toss at startup — connects this process to one Redis for its entire lifetime
redis_url = rand < 0.5 ? LYMO_SIDEKIQ_NEW_REDIS_URL : LYMO_SIDEKIQ_OLD_REDIS_URL

That's it. On startup, each worker tosses a coin. Heads → new ElastiCache. Tails → old Redis Enterprise.

The result: roughly half the cluster continued draining the old queue, while the other half processed new jobs on ElastiCache. No new infra. No task definition changes. No separate environment to coordinate.

We also pointed all job producers (the code that enqueues jobs) at the new Redis immediately. So new work only ever went to ElastiCache. The old Redis just needed to drain.

This is where Sidekiq's initializer structure becomes the key enabler. The configure_server and configure_client blocks can be wired separately: the server (the side that reads jobs) uses the redis_url resolved at startup, while the client always points at the new Redis:

config/initializers/sidekiq.rb — full initializer
redis_url = rand < 0.5 ? LYMO_SIDEKIQ_NEW_REDIS_URL : LYMO_SIDEKIQ_OLD_REDIS_URL

Sidekiq.configure_server do |config|
  config.redis = { url: redis_url }
end

Sidekiq.configure_client do |config|
  # Producers always push to the new Redis
  config.redis = { url: LYMO_SIDEKIQ_NEW_REDIS_URL }
end

One coin toss. One URL to pull from. That process reads from the same Redis for its entire lifetime.

The clients (which push jobs) always use the new URL, and reads are split between the old and new Redis. In time, the old queue drains as it receives no further jobs. The processes connected to the old Redis were naturally left behind to drain it, and as they cycled out, the cluster fully converged on the new setup with no intervention required.

How It Went

It worked exactly as expected. Within a day, roughly 90% of the old queue had drained naturally. Workers reading from old Redis gradually found less and less work, while ElastiCache handled all the new throughput.

The remaining jobs were a different story: scheduled jobs. These live in Sidekiq's sorted set and don't get picked up until their execution time arrives — which could be hours away. Waiting wasn't ideal, so we wrote a small script to move them from the old Redis to the new one manually. A few lines to iterate the scheduled (and retry) set, re-enqueue on ElastiCache, and delete from old Redis. Clean cutover.
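The script itself isn't shown here, but the shape of it is simple. Sidekiq keeps scheduled jobs in a sorted set named schedule and retries in one named retry, each scored by the epoch time the job should run, so a sketch (function and variable names are illustrative, not our actual script) looks like:

```ruby
# `old_redis` and `new_redis` are redis-rb clients,
# e.g. Redis.new(url: ENV["OLD_REDIS_URL"]).
# Sidekiq stores scheduled jobs in the "schedule" sorted set and retries in
# "retry", each scored by the epoch time the job should run.
def migrate_delayed_jobs(old_redis, new_redis, sets: %w[schedule retry])
  sets.each do |set|
    old_redis.zrange(set, 0, -1, with_scores: true).each do |payload, score|
      new_redis.zadd(set, score, payload) # preserve the original run-at timestamp
      old_redis.zrem(set, payload)
    end
  end
end
```

Run it once against both clients, e.g. migrate_delayed_jobs(Redis.new(url: old_url), Redis.new(url: new_url)), then confirm the old sets are empty with ZCARD before decommissioning.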

Once that was done, we deployed the cleanup — removed the conditional and all references to the old Redis connection. Four lines of code deleted. Done.

Oh, and while all of this was happening? The rest of the team made a dozen normal deployments — which restarted servers, reshuffled which Redis each process landed on, and generally did everything the AI tools said would cause a debugging nightmare. Nothing broke. No jobs lost. The doom and gloom theories were disproven in the most practical way possible: by live testing.

Why the AI Advice Missed the Mark

The AI tools were technically cautious but operationally naive. They modeled the problem as "jobs are tied to a running process" — which isn't how Sidekiq works. Redis is the source of truth, not the worker. The worker is stateless.

They also defaulted to the safest, most conservative architecture: full environment isolation. That's sensible for high-stakes migrations. But for a queue drain, it's significant overengineering.

The human insight — the DB is external, the workers are stateless, so we can split them probabilistically — is the kind of lateral thinking that comes from actually understanding the system rather than pattern-matching to a template.

— ✦ —

Takeaways

  • 01
    Sidekiq workers are stateless. Redis is the state. This gives you more migration flexibility than you might think.
  • 02
    Probabilistic splits are underrated. You don't always need clean cutoffs. A coin toss at startup is simple, observable, and reversible.
  • 03
    AI tools are good at safe answers, not always good at efficient ones. They'll often recommend the conservative solution even when a simpler one exists. Treat their output as a starting point, not a final answer.
  • 04
    The cleanup should be as simple as the migration. If your migration leaves behind complex infra, you've done too much. Ours cleaned up with four deleted lines.

Tags: Redis · Sidekiq · AWS ElastiCache · Migration · Ruby · ECS

Sunday, March 15, 2026

Dead Code Is a Cognitive Tax — Here's How AI Helps You Stop Paying It


Every engineer knows the feeling. You open an unfamiliar part of the codebase, and you're immediately staring down a tangle of services, workers, models, and task entries — none of which come with a label saying "still matters" or "abandoned in 2023." You read the code carefully, try to trace the call graph, maybe even grep for usages — and only after 30 minutes do you realize: this thing hasn't run in production for over a year.

That tax on your attention has a name: cognitive load. And dead code is one of its most insidious sources.


What Is Cognitive Load in a Codebase?

Cognitive load, in the context of software engineering, is the total mental effort required to understand a system well enough to work in it safely. Every class, method, model, and background job you encounter is a unit of context you have to hold in your head.

The problem is that your brain doesn't automatically know which of those units are live and which are ghosts. If an EstimateWorker class exists in your repo, you have to assume it matters — until you prove otherwise. That proof takes time, attention, and often a distracting detour away from the actual work you sat down to do.

Dead code doesn't just waste disk space. It actively misleads you.

A Real-World Example: The Estimation Pipeline Cleanup

Recently, our team completed a cleanup effort across seven pull requests targeting a legacy estimation infrastructure — a suite of services originally built around Prophet forecasts and a Clair analysis pipeline — that had gone completely dark since late 2023.

Here's what was still sitting in the codebase, doing nothing:

  • EstimateService — fetched a CSV over HTTP, upserted records into the database, and refreshed an estimation cache. Silent for months.
  • EstimateWorker — a Sidekiq background job that uploaded files to S3, triggered the estimation flow, and posted Slack notifications. Long dead.
  • Estimation::Prophet::DownloadWorker — downloaded forecast CSVs from S3 and upserted them into a Prophet table. Never called.
  • Estimators::ClairAnalysis — computed hourly analysis records for a brief window in late 2023, then stopped.
  • ClairAnalysis model and its backing database table — zero writes since the pipeline went quiet.
  • Three SwitchBoard dispatch entries: events_collect_for_next_week, generate_weekly_user_report, and estimate_v2 — all orphaned task names in a routing map.

Any engineer — or AI assistant — reading this codebase would reasonably assume all of the above was active production infrastructure. None of it was.

The Numbers

7 Pull Requests · 31 Files Changed · 943 Lines Deleted · −816 Net Lines Removed

PR     Branch                         +Added   −Deleted   Files
#1     cleanup-tasks                      13         16       2
#2     cleanup-unused-estimate             0         74       4
#3     remove-clair-analysis               0        314       2
#4     remove-prophet                      0        210       5
#5     remove-clair-analysis-model        20         57       3
#6     rename-clair-v2s                   94         68      13
#7     remove-estimate-unused              0        204       2
Total                                     127        943      31

The 127 additions are almost entirely the rename PR (#6) — migrations, updated references, and renamed specs. Every other PR was pure deletion.


The Cognitive Impact of the Cleanup

Cleaner model surface. Once EstimateService, EstimateWorker, and ClairAnalysis were gone, the remaining models — Clair, ClairDailyInterimResult, ClairSetting — actually reflected how the system works today.

Naming that signals intent. ClairV2 implies a versioning scheme. ClairDailyInterimResult tells you exactly what the thing is and why it exists.

A smaller SwitchBoard dispatch map. Removing the three orphaned entries made the dispatch map honest again.

A shorter test suite that still covers everything that matters. Several spec files covering deleted code were removed. The test suite got faster without losing any meaningful coverage.


Where AI Fits In: Finding Dead Code You Can't See

Here's the uncomfortable truth about dead code: it's often invisible to the people closest to it. If you wrote EstimateWorker two years ago and the team that decommissioned the upstream service never filed a ticket, you might not even know it's dead. The code looks fine. The tests pass. Nothing alerts you.

A Telling Real-World Example: Claude Gets Confused, Then Catches Itself

We recently asked Claude to generate a flow diagram of our pay guarantee process. Claude produced a diagram that looked plausible — tracing through services, models, and workers in a way that made logical sense.

The problem? Part of that diagram was wrong — because Claude had incorporated a module that was no longer active into its understanding of the flow. The dead code was so well-structured and apparently coherent that the AI read it as live infrastructure and wove it into the diagram without hesitation.

But here's what makes this story instructive rather than just cautionary: when an engineer removed this (hopefully final) piece of dead code, Claude immediately recognized that the diagram it had drawn earlier relied on this bad signal, revised its understanding, and corrected the diagram.

That sequence — confidently wrong, then self-correcting — is a useful frame for thinking about AI and dead code. It fooled the AI for the same reason it fools engineers: it looks like it belongs.

What AI Can Do

Tracing call graphs at scale. AI can trace the full call graph of a function or class across an entire monorepo — answering not just with direct callers, but with the absence of callers.

Cross-referencing runtime signals with static code. When connected to observability data — logs, APM traces, queue metrics — an AI can compare what the code says it does with what actually runs in production.

Flagging stale patterns. Dead code has fingerprints: models with no recent migrations, task names absent from any scheduler config, service classes with no callers outside their own spec files.

Drafting cleanup PRs. Once dead code is identified, AI can help draft the actual removal — proposing what to delete, what to rename, and what specs to clean up.
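To make the "absence of callers" idea concrete, here is a toy grep-level heuristic in Ruby. It is not the tooling we used, and a real pass would need to account for dynamic dispatch, autoloading, and metaprogramming; treat its hits as candidates for human review, not verdicts.

```ruby
require "find"

# List classes defined under app/ that are never mentioned anywhere else in
# the repo except their own file and spec files. A pure text-match heuristic:
# it misses const_get, reflection, and other dynamic references.
def dead_class_candidates(root)
  definitions = []
  Find.find(File.join(root, "app")) do |path|
    next unless path.end_with?(".rb")
    File.read(path).scan(/^\s*class\s+([A-Z]\w*)/) { |(name)| definitions << [name, path] }
  end

  definitions.select do |name, defining_file|
    referenced = false
    Find.find(root) do |path|
      next unless path.end_with?(".rb")
      next if path == defining_file || path.include?("spec")
      referenced ||= File.read(path).include?(name)
    end
    !referenced
  end
end
```

Even something this crude surfaces the EstimateWorker-style ghosts: classes whose only remaining friends are their own specs.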

What AI Can't Do (Yet)

AI isn't a replacement for engineering judgment. A worker might be "dead" in CI but still referenced by a cron job in an ops runbook nobody's touched in three years.

The right model is AI as a scout, engineer as the decision-maker. AI surfaces candidates. Engineers verify, contextualise, and own the deletion.

Making Dead Code Cleanup a Habit

  1. Timestamp your decommissions. When you turn off a pipeline, leave a comment in the code with the date.
  2. Review your task dispatch maps regularly. A quarterly review catches orphaned entries before they fossilise.
  3. Use AI during onboarding and code review. AI tools can help new engineers quickly validate whether something is live — and surface it for cleanup if it isn't.
  4. Treat deletion as a first-class deliverable. 816 lines removed is a meaningful engineering contribution. Make it visible in sprint planning, changelogs, and retros.

Conclusion

Large codebases accumulate cognitive debt quietly, continuously, and with compounding interest. Dead code is one of the most expensive line items: it misleads engineers, bloats test suites, and turns routine code reading into archaeology.

As we saw first-hand, it even misleads AI. Claude confidently incorporated a dead module into a flow diagram of our pay guarantee process — because the code looked live. That moment of confusion, and the self-correction that followed, is a perfect metaphor for where we are with AI-assisted engineering today: powerful, promising, and most effective when paired with good runtime context and human judgment.

The goal isn't a perfect codebase. It's a codebase where the code you're reading is the code that's actually running. That's a goal worth shipping toward.

Thursday, February 26, 2026

When AI Sounds Right But Isn't: A Sidekiq Story

I recently ran into a Redis migration problem with Sidekiq. I needed to point our workers at a new Redis instance while safely draining jobs still sitting in the old one. I turned to Claude for advice — and what followed is a good reminder of why GenAI output should always be treated as a starting point, not a final answer.


Step 1: My Initial Idea (and Why Claude Correctly Pushed Back)

My first instinct was a "coin toss" approach: configure the Sidekiq server to point to the old Redis with 50% probability and the new Redis otherwise, while the client always pushes to the new Redis. Once the old queue drained, I'd do a clean cutover.

Claude's response here was actually good:

Claude: The coin toss approach has some real problems... Non-deterministic behavior is hard to reason about in production. You lose retry guarantees. A job that fails while a worker is connected to the old Redis will be retried there — but if on the next attempt the worker happens to connect to the new Redis, it won't see that job. The job could get stuck or lost. Sidekiq server connects to one Redis at startup, not per-job...

Fair enough. The critique was accurate. Claude then suggested a cleaner alternative: use Sidekiq 7's config.capsules to run multiple Redis connections within a single process — one capsule for the new Redis, one to drain the old.


Step 2: A Confident, Detailed, Completely Wrong Answer

I was on Sidekiq Pro 7.3.9, so I asked Claude to show me how to configure capsules for this use case. It gave me this:

Sidekiq.configure_server do |config|
  # New Redis - primary capsule (default)
  config.redis = { url: new_redis_url }

  # Old Redis - drain capsule
  config.capsule("legacy") do |cap|
    cap.redis = { url: old_redis_url }
    cap.concurrency = 5
    cap.queues = %w[default mailers critical]
  end
end

Sidekiq.configure_client do |config|
  config.redis = { url: new_redis_url }
end

The response was detailed. It used the right vocabulary. It even included helpful caveats about thread pool sizing, checking the retry and scheduled sets before teardown, and queue enumeration via redis-cli SMEMBERS queues. It felt authoritative.

There was just one problem: it was wrong.


Step 3: Thirty Seconds in the Docs

Before implementing anything, I did a quick check of the official Sidekiq documentation. Here's what it says, plainly:

Sidekiq Docs: "There is still one iron-clad rule: a Sidekiq process only executes jobs from one Redis instance; all Capsules within a process must use the same Redis instance. If you want to process jobs from two separate Redis instances, you need to start two separate Sidekiq processes."

Capsules are for organizing queues and concurrency within a single Redis instance. They cannot span multiple Redis instances. The entire premise of Claude's solution was invalid.
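For the record, a valid capsule configuration keeps every capsule on the process's single Redis and only partitions queues and concurrency. A sketch of the documented pattern (queue name illustrative):

```ruby
Sidekiq.configure_server do |config|
  config.redis = { url: redis_url } # one Redis for the whole process

  # Capsules partition queues and concurrency, not Redis instances.
  # A concurrency-1 capsule processes its queue serially, which is
  # useful for jobs that must never run in parallel.
  config.capsule("serial") do |cap|
    cap.concurrency = 1
    cap.queues = %w[serial]
  end
end
```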

When I pointed this out, Claude acknowledged it immediately:

Claude: You're right, I was wrong — capsules cannot span multiple Redis instances. They're for organizing queues/concurrency within a single Redis. So for your drain scenario, you actually need two separate Sidekiq processes with different config files.

No pushback. No hedging. Just a clean admission that the confident, detailed answer was incorrect.


What Makes This Failure Mode Dangerous

This isn't a story about AI being obviously broken. It's about AI being convincingly wrong — and that's a harder problem to guard against.

A few things made this particularly easy to fall for:

  • The answer was structurally sound. Capsules are a real Sidekiq 7 feature. The code was syntactically valid Ruby. The caveats about concurrency and retry sets were genuinely useful. Only the core assumption — that capsules can target different Redis instances — was wrong.
  • The fluency signals trust. When an answer uses the right terminology, references the right version numbers, and anticipates edge cases, it reads as expert. That fluency is a product of training on large amounts of text, not of verified understanding.
  • The model doesn't know what it doesn't know. Claude didn't say "I'm not certain about the multi-Redis constraint — check the docs." It presented the solution as if it were established fact.

A Simple Rule of Thumb

If you wouldn't ship code based solely on a Stack Overflow answer from 2019 without reading the docs, don't ship code based solely on a GenAI answer either. The bar should be the same — or higher, because at least the Stack Overflow answer has upvotes, comments, and a date stamp.

GenAI is genuinely useful for orientation: understanding an unfamiliar API surface, exploring options, getting unstuck. But any answer that involves a specific documented behavior — especially version-specific constraints — needs at least one authoritative source check before you act on it.

In this case, thirty seconds in the Sidekiq docs saved what could have been hours of debugging a fundamentally broken architecture. That's a pretty good return on thirty seconds.


The actual solution, if you're curious: two separate Sidekiq processes with separate config files, each pointing at a different Redis instance. One processes new work, one drains the old queues. When the old queue, retry set, and scheduled set are all empty, shut the old process down.
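A sketch of that shape, with each process choosing its server Redis from an env var set at launch while clients always push to the new one (env var names here are illustrative):

```ruby
# config/initializers/sidekiq.rb
# Launch two processes, e.g.:
#   SIDEKIQ_SERVER_REDIS_URL=$NEW_URL bundle exec sidekiq   # processes new work
#   SIDEKIQ_SERVER_REDIS_URL=$OLD_URL bundle exec sidekiq   # drains the old queues

Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch("SIDEKIQ_SERVER_REDIS_URL") }
end

Sidekiq.configure_client do |config|
  # Producers only ever enqueue to the new Redis.
  config.redis = { url: ENV.fetch("NEW_REDIS_URL") }
end
```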