Thursday, June 4, 2026
I pay for the frontier. The model I was using was Claude Opus 4.8 — released May 28, 2026, 1M-token context, the flagship Anthropic puts on stage and the most capable model an ordinary customer can actually buy. It sits exactly one rung below Mythos, the model Anthropic decided was too dangerous to hand out freely: Mythos-class models found thousands of zero-day vulnerabilities autonomously — including decades-old bugs in OpenBSD — so the company gated them behind Project Glasswing and a hand-picked set of partners. Opus 4.8 is the consumer-grade taste of that lineage. The "almost Mythos" tier.
And it still spent twenty minutes building me a meticulously researched, well-organized, completely wrong answer about my own codebase.
I want to write this one down, because the failure mode is more dangerous than a model that's obviously dumb. A model that's obviously dumb you don't trust. A model that's fluent, thorough, and wrong is the one that gets your bad code deployed.
The setup
I had a vague memory that our users table carried a leftover password-reset field we no longer used. I asked the assistant to confirm and clean it up.
It found a password_reset_token reference sitting in the User model's ignored_columns and ran with it. First answer: write a migration to drop the column, and while we're at it, rip out the "dead" fallback code in the controller that still referenced it.
I stopped it. I said: we keep the web password-reset flow, but Rails 8 does it without storing a token in the table, so we just need to drop the column.
Where it went off the rails
This is the part worth studying. The assistant did what looks, on the surface, like exactly the diligence you'd want. It went spelunking through git history. It found the commit that dropped the column. It found a sibling commit that disabled a code path. It checked which commits were ancestors of main. It read diffs. It produced a tidy, confident writeup with file-and-line citations and a clear conclusion:
The "new mechanism" was never actually implemented.
has_secure_password does not provide password_reset_token or find_by_password_reset_token. The web reset path in main is broken and would raise NoMethodError if hit. Here's the fix: add generates_token_for :password_reset, rewrite these two methods…
It even offered to restore the test coverage. It was helpful. It was organized. It cited everything.
It was also built on a single load-bearing claim it never checked: that has_secure_password doesn't generate those methods.
That claim is false. In Rails 8.0 and later, has_secure_password (with its default reset_token: true) auto-defines exactly those methods, building on generates_token_for (the signed-token API added in Rails 7.1) with a :password_reset purpose. The original engineer who dropped the column had been right. The commit message even said so. The model read that commit message, decided it was based on a "false premise," and overrode it with its own recollection of how Rails works.
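Why can the column go away at all? Here is a minimal sketch of the general idea in plain Ruby, with no Rails involved. The token itself carries the record id, a purpose, an expiry, and a fingerprint of the current password, all covered by an HMAC, so nothing has to be stored in the users table. This is not Rails' implementation. The secret, the in-memory USERS hash, and the method names mint_reset_token and find_id_by_reset_token are hypothetical. Using the last ten characters of the password digest as the fingerprint is also an assumption, standing in for whatever Rails actually embeds.

```ruby
require "json"
require "openssl"

# Hypothetical signing secret. A real app would derive one from secret_key_base.
SECRET = "demo-secret-do-not-use"

# Hypothetical in-memory "users table". Note that there is no reset-token column.
USERS = { 42 => { password_digest: "$2a$12$exampledigestvalue0001" } }

def hmac(data)
  OpenSSL::HMAC.hexdigest("SHA256", SECRET, data)
end

# Assumed fingerprint of the current password digest. Changing the password
# changes it, which invalidates every reset token minted before the change.
def fingerprint(user)
  user[:password_digest][-10..]
end

# Mint a signed, purpose-scoped token that expires after expires_in seconds.
def mint_reset_token(id, now: Time.now, expires_in: 15 * 60)
  payload = { "id" => id, "fp" => fingerprint(USERS.fetch(id)),
              "pur" => "password_reset", "exp" => now.to_i + expires_in }
  body = [JSON.generate(payload)].pack("m0") # strict Base64, no newlines
  "#{body}--#{hmac(body)}"
end

# Returns the user id when the token is authentic, unexpired, meant for
# password reset, and still matches the user's current password; else nil.
def find_id_by_reset_token(token, now: Time.now)
  body, digest = token.to_s.split("--", 2)
  return nil if body.nil? || digest.nil?
  return nil unless OpenSSL.secure_compare(hmac(body), digest)

  payload = JSON.parse(body.unpack1("m0"))
  return nil unless payload["pur"] == "password_reset"
  return nil if now.to_i > payload["exp"]

  user = USERS[payload["id"]]
  return nil if user.nil? || fingerprint(user) != payload["fp"]

  payload["id"]
end

token = mint_reset_token(42)
p find_id_by_reset_token(token)                          # => 42
p find_id_by_reset_token(token, now: Time.now + 16 * 60) # => nil (expired)
USERS[42][:password_digest] = "$2a$12$exampledigestvalue0002"
p find_id_by_reset_token(token)                          # => nil (password changed)
```

In this sketch the fingerprint is derived from the password itself, so a successful reset retires every older token without any cleanup. That property is what makes a stored token column unnecessary.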
The thing that actually settled it
I told it, flatly: "password reset works in main."
Then — only then — it did the one thing it should have done in the first minute: it ran the code.
$ bin/rails runner 'u = User.new; puts u.respond_to?(:password_reset_token)'
true
$ ... User.respond_to?(:find_by_password_reset_token)
true
$ ... u.generate_token_for(:password_reset)
eyJfcmFpbHMiOnsibWVzc2FnZSI6... # a real signed token, 15-min expiry
All true. All working. The token mints fine. The web flow has been working the whole time. The column was correctly removed weeks ago. There was nothing to do.
The verification took about two minutes and was available from the very beginning. It would have pre-empted the entire wrong narrative. The model had every tool it needed to check itself, and instead it reasoned its way to a confident falsehood and only reached for the ground truth after a human insisted.
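One plausible reason source-reading goes wrong here: methods like these are generated at load time by a class macro, so the app's model file contains no literal definition to find. The toy below illustrates why the runtime answer beats the textual one. ResetTokenMacro and has_reset_token are hypothetical stand-ins, not Rails code.

```ruby
# Hypothetical stand-in for a class macro such as has_secure_password:
# it generates methods when the class body runs, so the model file never
# contains a literal method definition for them.
module ResetTokenMacro
  def has_reset_token(attribute = :password)
    define_method(:"#{attribute}_reset_token") { "token-for-#{object_id}" }
    define_singleton_method(:"find_by_#{attribute}_reset_token") { |_token| :found }
  end
end

class User
  extend ResetTokenMacro
  has_reset_token
end

# Everything a reader of the model file sees: the macro call, nothing more.
model_file = <<~RUBY
  class User
    extend ResetTokenMacro
    has_reset_token
  end
RUBY

p model_file.include?("def password_reset_token") # => false (reading the text)
p User.new.respond_to?(:password_reset_token)     # => true  (running the code)
p User.respond_to?(:find_by_password_reset_token) # => true
```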
Why this is the dangerous kind of wrong
I didn't take its word for it. I went and reset my own password through the web flow to prove to myself it was broken — and it wasn't. I'm an engineer; I have the instinct and the access to do that.
But sit with the counterfactual. If I'd trusted it — which is the entire pitch of these tools, that you can trust them — the best case is I merge needless clutter: re-implementing a generates_token_for that Rails already gives me for free, plus a migration for a column that's already gone. The worst case is I "fix" a working authentication path and break password resets for real users in production. Over a problem that didn't exist.
The model's confidence was inversely correlated with its correctness, and its thoroughness made it worse, not better. The git archaeology, the citations, the ancestor checks — all of that production value made the wrong answer more believable. A sloppy wrong answer I'd have questioned. This one I almost didn't.
The actual lesson
The headline number on the benchmark went up. The failure mode didn't change:
- It asserted from memory when verification was cheap. "Rails doesn't do X" is a claim you can check in 120 seconds. It chose not to, three separate times, until a human forced it.
- It overrode a correct primary source with its own recollection. A prior commit message stated the truth. The model decided it knew better.
- It jumped to a conclusion and then spent its effort defending the conclusion instead of stress-testing it. The research wasn't aimed at finding the truth; it was aimed at supporting the first answer.
- Fluency masqueraded as correctness. The more polished the wrong answer, the more dangerous it is.
I don't think this means the tools are useless; I use them every day and this same assistant has saved me real time. But look at the layering for a second. Anthropic has a model, Mythos, that is reportedly good enough at finding software vulnerabilities that releasing it broadly was considered a public-safety problem. The model I was actually using is the one marketed as the next-best thing: the publicly available frontier. And that model couldn't correctly answer "does this password reset work?" about forty lines of my own Rails code without inventing a bug that wasn't there.
The honest user manual still reads: treat every confident answer as a hypothesis, demand it shows you the code running, and keep a human who knows the domain in the loop. The day I stop doing that is the day I ship the broken password reset.
The model that's smart enough to do the work is still not smart enough to know when it's wrong. Until that gap closes, the verification is your job, not theirs.
References: Claude Opus 4.8 and the 2026 model lineup · Claude Mythos Preview · Anthropic to release Mythos-class models to the public (The Register) · Anthropic expands Mythos access (CNBC)
Appendix: the transcript
I'm including the relevant exchanges verbatim so you can judge for yourself, rather than taking my summary on faith. (That's the whole moral of the story, after all.)
1. The confident wrong conclusion
After digging through git history, reading diffs, and checking which commits were ancestors of main, the model delivered this — formatted, cited, and completely sure of itself:
The "new mechanism" was never actually implemented. That's the real finding:
deaed5176c's commit message claims "Rails 8.1's has_secure_password provides password_reset_token as a signed virtual attribute and find_by_password_reset_token as a signed token verifier." That's incorrect. has_secure_password provides password, authenticate, password_digest — it does not generate password_reset_token or find_by_password_reset_token.
So the web (non-mobile) reset path is currently broken in main:
- user.rb:266 send_password_reset returns password_reset_token → no column, no virtual attribute → NoMethodError
- passwords_controller.rb:13,31 call send_password_reset / find_by_password_reset_token → same.
It then offered to "fix" the working code by adding a generates_token_for that Rails already provides, plus a migration for a column that was already gone.
2. Me, flatly
password reset works in main
3. What it should have run in minute one
Only after I insisted did it actually execute the code instead of reasoning about it:
# bin/rails runner
u = User.new
puts u.respond_to?(:password_reset_token) # => true
puts u.respond_to?(:send_password_reset) # => true
puts User.respond_to?(:find_by_password_reset_token) # => true
puts User.respond_to?(:find_by_token_for) # => true
puts u.generate_token_for(:password_reset)
# => eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaGJCekF3IiwiZXhwIjoiMjAyNi0w
# Ni0wNFQyMTozMToxNS42NTdaIiwicHVyIjoiVXNlclxucGFzc3dvcmRfcmVz
# ZXRcbjkwMCJ9fQ==--b37bb4aa58a72109225871388b4a1602c62bcc62
Decode that token's payload and the purpose field reads "User\npassword_reset\n900": a 900-second (15-minute) signed reset token, minted by exactly the mechanism the model swore didn't exist. has_secure_password in Rails 8.0+ defines all of it by default. The original engineer was right; the commit message was right; the model's confident correction was wrong.
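The decoding step needs nothing beyond the Ruby standard library. A minimal sketch: split the token at "--", Base64-decode the left half, and parse the JSON. This only reads the payload. It does not check the HMAC digest on the right, so it says nothing about authenticity, only about what was minted.

```ruby
require "json"

# The token printed by generate_token_for in the transcript, joined back into one string.
token = "eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaGJCekF3IiwiZXhwIjoiMjAyNi0w" \
        "Ni0wNFQyMTozMToxNS42NTdaIiwicHVyIjoiVXNlclxucGFzc3dvcmRfcmVz" \
        "ZXRcbjkwMCJ9fQ==--b37bb4aa58a72109225871388b4a1602c62bcc62"

# Left of "--" is the Base64 payload; right is the signature (ignored here).
encoded_payload, _signature = token.split("--", 2)
payload = JSON.parse(encoded_payload.unpack1("m0")) # "m0" = strict Base64

# As it appears in this token, the purpose string is "<class>\n<purpose>\n<seconds>".
class_name, purpose, expires_in = payload.dig("_rails", "pur").split("\n")

puts class_name                   # User
puts purpose                      # password_reset
puts Integer(expires_in) / 60     # 15
puts payload.dig("_rails", "exp") # 2026-06-04T21:31:15.657Z
```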
4. The retraction
You're right, and I was wrong — I apologize for the runaround. [...] My mistake was claiming has_secure_password doesn't provide these. It does — Rails 7.1+ has_secure_password auto-defines password_reset_token and find_by_password_reset_token [...] I shouldn't have doubted it.
The two-minute check in step 3 was available in step 1. Everything between them was confident, well-organized fiction.
Posted by thushara at 11:00 PM
Labels: Rails, AI, Claude, Engineering, Post-Mortem