Codex Got Better because Claude Code Got…

May 6

How I stopped using Claude Code and learned to love OpenAI Codex

5 Comments

The missing piece in this article is that Anthropic didn't actually fix the issue. The underlying cause of the thinking regression wasn't a default setting, it was the fact that adaptive thinking became forced by default. Adaptive thinking is now the ONLY option in 4.7, making it MUCH worse in longer context work.

Reply (1)

Anthony Maio

May 13

Yep - Opus 4.7 is not usable outside of a fresh context window and both Sonnet and Haiku only have 200k context which causes compaction thrashing making all 3 unusable. If more people tried codex they'd never go back

Ikram Rana

May 15

I work with small business owners on AI automation every week and this competitive dynamic between tools is exactly what drives adoption. Practical AI for business improves fastest when the pressure is real. What workflow changes from these improvements are you finding most impactful in day-to-day use?

Reply (1)

Anthony Maio

May 15

My workflow remains generally the same, however Codex has nearly unlimited usage and is producing better, more reliable outputs that I can trust. Meanwhile, Opus 4.7 and Claude Code seem to be regressing to 1) wasting a LOT of tokens 2) very high latency 3) unacceptable hallucination rate 4) the model CANNOT HANDLE a 1M context window. It just can’t.

Pawel Jozefiak

May 11

The product-layer vs model-capability framing is what most comparisons miss. I ran a Claude Code harness in production for months and the pattern matched yours - undocumented behavior changes that only surfaced in edge cases.

The model was still capable. The harness assumptions had shifted around it.

The Codex point about structured inspection before writing is interesting because that read-before-write discipline was exactly what gave Claude Code an early advantage. When it stopped being consistent, trust dropped fast. Product trust is fragile in a way that model quality is not - model regressions you can benchmark, harness drift you only notice when something quietly breaks.