AI Is Getting Smarter But It’s Not Getting More Reliable

Models are getting faster and better at reasoning, but they still can't stop making things up, and the numbers are worse than most people realize. Across 26 leading models, hallucination rates ranged from 22% to 94%. Some of the biggest names fell apart under pressure: GPT-4o's accuracy dropped from 98.2% to 64.4%, and DeepSeek R1 collapsed from above 90% to 14.4%. Grok 4.20 Beta, Claude 4.5 Haiku, and MiMo-V2-Pro held up best. And on the task everyone now asks of these models, managing multi-step workflows with real tools across real conversations, none of them cracked 71% on the τ-bench benchmark. The gap between what these models promise in a demo and what they deliver under real conditions is still enormous, and anyone building strategy on the assumption that the gap has closed is building on sand.

venturebeat.com
Artificial Intelligence

AI Isn’t Hitting A Wall Soon

We've watched a lot of technology waves arrive and break, and every time, the people who said it would change everything were at least partly right and at least partly wrong about how fast it would get there. Mustafa Suleyman lays out the case that this time the underlying math is different: since 2010, the compute going into frontier AI models has grown a trillionfold, and the forces driving that growth are still accelerating.

Software efficiency is improving so fast that the cost of running some models has dropped by a factor of 900 in a single year. What's coming isn't a better chatbot. It's teams of AI agents capable of running weeks-long projects with something approaching human-level judgment.

Every revolution I've lived through had a plausible ceiling. This one doesn't have one in sight yet.


We Don’t Need To Prove Anything

This whole "prove you made this without AI" movement is going to end soon. It just won't matter anymore. Working with AI to build content is too effective. For those who use it correctly, it's a game-changing opportunity to be more productive and to generate more and better content, particularly when AI is used to organize and synthesize your own unique content and data.

“The problem is going to be definition and verification. Does chatting with an LLM about the idea before executing it manually count as using AI? And how could the creator prove no AI was involved?” Jonathan Stray, senior scientist at the UC Berkeley Center for Human-Compatible AI, told The Verge.


Altman Didn’t Admit To “Smoke And Mirrors”

Nothing in this Futurism article supports its headline, "Sam Altman Opens Up About Telling CEO of Disney That It Had All Been Smoke and Mirrors." These kinds of clickbait headlines infuriate me.

The article describes a strategic shutdown, a missed partnership, and some awkward but professional communication.

The headline reframes it as a confession, some kind of dramatic reveal, or a "we were faking it" moment.

Those are not even close to the same thing.


AI & Cognitive Atrophy: The A-Frame Takeaway

The A-Frame conclusion of this Psychology Today article is where the gold is.

A-Frame: Practical Takeaway

Awareness of risks. Notice when you reach for AI before forming your own thought — not after. The gap between stimulus and tool is where cognitive agency actually lives. How wide is yours today compared to six months ago?

Appreciation of friction. Reintroduce productive difficulty deliberately. Write the first draft without assistance. Sit with the question overnight. Give yourself your own worst answer before asking a machine for a polished one.

Accept challenges along the path. Research on desirable difficulties confirms that productive struggle is the mechanism of durable learning — not its obstacle. Growth requires friction.

Accountability audits along the way. Once a week, do something cognitively demanding with no AI in the room. It is a form of calibration. Because you cannot track what you are losing if you never check whether you still have it.
