The Prompt That Crossed Two Organizations — And Got Sharper Each Time
How a product executive's pressure-testing framework traveled to systems engineering, and what happened when we pointed it at a real AWS workflow.
There's a quiet revolution happening in how smart teams use AI — and it has nothing to do with the model. It has everything to do with the instructions.
A few weeks ago I borrowed a prompt template from a product owner at a large enterprise. He had built it in ChatGPT to do something powerful: give his leadership team a private space to pressure-test ideas before they ever entered a room. When executives were developing roadmaps, he'd run them through the model first — surfacing assumptions, stress-testing the logic, anticipating the hard questions a CFO or GM might raise. The result was a win on both sides of the table. The CEO could arrive at conversations with a sharper, more fully formed point of view. And the product manager got to execute against a plan that had already survived serious scrutiny — no half-baked pivots, no surprises mid-flight.
Think of it less as critique and more as a rehearsal room. The tool doesn't challenge people — it challenges ideas, privately, before the stakes are high.
I took the same framework and ran it in Claude — the model we use at Solo. It worked just as well. Which raises something worth sitting with: the framework didn't just travel across organizations and domains. It traveled across AI models entirely. That's the tell. When the same set of instructions produces sharp, useful output regardless of which model is running them, the instructions are the asset. The model is increasingly the commodity.
I read those instructions and thought: this exact mental model applies to systems architecture.
So I adapted them. Same seven-step skeleton — identify the thesis, stress-test the portfolio balance, expose assumptions, map risk concentration, name the opportunity costs, simulate the leadership challenge, propose alternative shapes. I changed the vocabulary and the lens. Instead of asking what will Finance push back on, I asked what will the CEO and engineering team challenge. Instead of scoring for revenue potential, I scored for cost and time savings. The goal shifted from sharpening an executive's boardroom instincts to sharpening an engineer's thinking in peer and leadership conversations.
Then I pointed it at something real: the workflow our team at Solo uses to create new EventBridge scheduled rules in AWS.
What the workflow looked like
The process isn't complicated on its face. You take a YAML template, swap in five dynamic variables, insert the block into a CloudFormation file, open a PR, get infra review, run a change set in the AWS console, verify, execute, merge. Clear enough. Solid IaC discipline — no console drift, changes tracked in git, peer review required.
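To make the shape of the manual step concrete, here is a minimal sketch of what that templating might look like. The post doesn't show the actual template or name its five variables, so every identifier below (the logical ID, rule name, cron expression, cluster export, and task name) is an assumption for illustration:

```python
from string import Template

# Hypothetical shape of the EventBridge rule block. The real template and its
# five variables are not shown in the post; all names here are assumptions.
RULE_TEMPLATE = Template("""\
  ${rule_logical_id}:
    Type: AWS::Events::Rule
    Properties:
      Name: ${rule_name}
      ScheduleExpression: cron(${cron_expression})
      State: ENABLED
      Targets:
        - Arn: !ImportValue ${cluster_export_name}
          Id: ${rule_name}-target
          Input: '{"containerOverrides": [{"command": ["rake", "${task_name}"]}]}'
""")

def render_rule(**variables: str) -> str:
    """Fill the five dynamic variables into the YAML block."""
    return RULE_TEMPLATE.substitute(**variables)

block = render_rule(
    rule_logical_id="NightlyReportRule",
    rule_name="nightly-report",
    cron_expression="0 3 * * ? *",
    cluster_export_name="core-EcsClusterArn",
    task_name="nightly_report",
)
print(block)
```

Everything after the substitution — pasting the block into the CloudFormation file, the PR, the change set — is where the humans take over, which is exactly the weight the critique below picks apart.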
What happened next is what makes this worth writing about.
What Claude found
The thesis was absent. Claude's first move was to name this: defensive / constraint-driven, no clear thesis. The process enforces the right principle — infrastructure as code over console changes — but it's encoded as a manual human checklist rather than a system with guardrails. The implied thesis should be "infrastructure changes are safe, auditable, and low-friction." Claude's verdict: it's only partially achieving that.
The portfolio is completely unbalanced. Claude broke the workflow down across four dimensions: IaC discipline (present), automation and tooling (absent), validation guardrails (absent), developer autonomy (weak). Its summary was blunt: all weight is on human process, zero investment in tooling that would make this self-service and safe simultaneously.
The assumptions don't hold up. Claude identified four things that must be true for this process to succeed — and challenged each one:
"Engineers will follow the checklist correctly every time." There's no validation that catches a malformed cron expression, wrong snake_case, or a misaligned variable before it hits CloudFormation.
"Infra team review is a meaningful gate." Unclear. If it's checking syntactic correctness, a linter does this faster and more reliably. If it's checking strategic intent, that's not documented anywhere.
"The change set review step prevents mistakes." Partially — but it requires the engineer to know what to look for. No checklist defines what "looks good" actually means.
"Console changes never happen." The last line of the process says never change the rule from the AWS Console — but there's no enforcement mechanism. That's policy, not a guardrail.
The risk concentration is real. One engineer executing the process incorrectly causes a production scheduling outage or a silent missed execution. The template has six-plus Fn::ImportValue calls — a single upstream stack name change silently breaks every rule, with no cross-reference validation documented. And the process assumes the engineer simultaneously knows valid cron syntax, CloudFormation change set semantics, ECS task override structure, and rake task naming conventions. That's a high knowledge bar with no scaffolding.
If someone breaks that rule, especially under time pressure, they introduce configuration drift that CloudFormation won't detect on the next deploy. The instruction exists. The enforcement doesn't.
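The missing cross-reference validation is also a small amount of code. As a sketch, assuming hypothetical export names: collect every export the template imports and compare it against the exports the upstream stacks actually publish (in practice, the output of `aws cloudformation list-exports`):

```python
import re

# Matches both the short-form "!ImportValue Name" and the long-form
# "Fn::ImportValue: Name" syntax in a raw CloudFormation YAML string.
IMPORT_VALUE = re.compile(r"!ImportValue\s+([\w:-]+)|Fn::ImportValue:\s*([\w:-]+)")

def missing_imports(template_text: str, known_exports: set[str]) -> set[str]:
    """Return every imported export name that no upstream stack publishes."""
    imported = {a or b for a, b in IMPORT_VALUE.findall(template_text)}
    return imported - known_exports

template = """
  Targets:
    - Arn: !ImportValue core-EcsClusterArn
      RoleArn: !ImportValue core-EventsRoleArn
"""
# Export names are made up for illustration.
print(missing_imports(template, {"core-EcsClusterArn"}))
# {'core-EventsRoleArn'}
```

Run against the real list of exports, this catches the "single upstream stack name change silently breaks every rule" scenario at PR time instead of at deploy time.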
The leadership challenge simulation
Claude predicted exactly the questions that would land hardest in an executive or peer review:
"How many engineer-hours does this take per rule addition, end to end? What's the error rate? Have we had production scheduling failures from this process?"
Probably no clean answers. That's the problem.
"If a scheduled job silently fails to be created or runs at the wrong time, what's the business impact?"
The process has no alerting or confirmation that a newly added rule is actually firing. Execution success is not the same as operational correctness.
"Why do I need infra team review for a cron job? Why can't I validate this locally? Why is there no test environment path?"
Claude's assessment: these are legitimate objections. The current process treats every engineer as a potential misconfigurer rather than building systems that make misconfiguration hard.
Three alternative shapes
Claude proposed three different strategic directions, each with honest tradeoffs.
What this is really about
The genealogy of this critique is what I keep coming back to. A product owner at an enterprise company built a framework in ChatGPT to make product leaders sharper. I adapted it for Claude to make systems engineers sharper. The seven-step skeleton traveled across two organizations, two domains, two AI models, and two completely different problems — and produced something genuinely useful every time.
That last part matters more than it might seem. We're entering a moment where the major AI models are converging in capability. The choice between them is increasingly a matter of workflow preference, not raw power. What doesn't transfer automatically — what has to be deliberately designed — is how you instruct them. The same prompt that works in ChatGPT works in Claude. The same framework that sharpens a product roadmap sharpens an engineering workflow. The instructions are the portable, reusable, compounding asset. The model is the infrastructure underneath.
We spend a lot of time evaluating which AI model to use and almost no time designing how we instruct it. The difference between an AI that validates your thinking and one that challenges it isn't the model version — it's the instruction set. One framing decision, encoded in a project's system prompt, shifts the output from agreeable to adversarial, from a mirror to a pressure test.
The insight the enterprise product owner had — that you can force structured, sequential reasoning by encoding a multi-step framework as the operating instruction — turns out to be domain-agnostic and model-agnostic. The same architecture works on roadmaps, on engineering workflows, on financial models, on hiring processes. You change the vocabulary. The sharpness is the point.
The prompt that lives in our Claude project now means any engineer can walk in with a workflow, a design doc, or an architectural decision and get back something that will make them think harder — not feel better.
That's the unlock. And it cost nothing but the willingness to borrow a smart idea from someone doing a completely different job, on a completely different platform, solving a completely different problem.
The best prompts, it turns out, travel well.
Want to build your own pressure-testing project? The pattern is straightforward: pick an adversarial advisor persona, write a multi-step reasoning framework that forces each analytical lens to run in sequence, and explicitly ban validation as a default behavior. It works in Claude. It works in ChatGPT. The framework above has seven steps — but what makes it work isn't the number, and it isn't the model. It's the instruction not to let weak reasoning slide.
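The pattern above can be encoded in a few lines. This is an illustrative sketch, not the original prompt — the wording is mine, but the persona, the seven steps from earlier in the piece, and the ban on default validation are the three ingredients just described:

```python
# A minimal pressure-test system prompt: adversarial persona, sequential
# multi-step framework, and an explicit ban on validation as the default.
PRESSURE_TEST_PROMPT = """\
You are an adversarial reviewer. Do not validate by default; challenge the idea.
Work through every step, in order, before giving any verdict:
1. Identify the thesis.
2. Stress-test the portfolio balance.
3. Expose assumptions.
4. Map risk concentration.
5. Name the opportunity costs.
6. Simulate the leadership challenge.
7. Propose alternative shapes.
Never let weak reasoning slide.
"""

steps = [line for line in PRESSURE_TEST_PROMPT.splitlines() if line[:1].isdigit()]
print(len(steps))  # 7
```

Drop a prompt like this into a Claude project's system instructions or a ChatGPT custom GPT, change the vocabulary to fit your domain, and the structure does the rest.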