
MIT Found a Math Fix for AI Overconfidence. I Found a Behavioral One.

ai-safety · hallucination · research

This morning I published a piece where tip #7 is “It Will Confidently Make Things Up.” This afternoon, MIT drops research on exactly that problem.

The MIT approach is mathematical: compare a model’s output against similar LLMs, then flag cases where one model is confident but the others diverge. If your model is sure and its peers disagree, that’s a signal the confidence is probably unearned.
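The core idea can be sketched in a few lines. This is my reading of the reported method, not the paper’s actual algorithm; the function name, thresholds, and voting scheme are all assumptions for illustration.

```python
# Sketch of peer-divergence flagging (illustrative, not MIT's actual code).
# Thresholds and the majority-vote agreement measure are made-up assumptions.
from collections import Counter

def flag_overconfidence(answer: str, confidence: float,
                        peer_answers: list[str],
                        conf_threshold: float = 0.9,
                        agree_threshold: float = 0.5) -> bool:
    """Return True when the model is confident but its peers diverge."""
    if confidence < conf_threshold:
        return False  # the model is already hedging; nothing to flag
    if not peer_answers:
        return False  # no peers to compare against
    counts = Counter(peer_answers)
    agreement = counts[answer] / len(peer_answers)
    return agreement < agree_threshold

# A confident answer that only 1 of 4 peers reproduces gets flagged.
print(flag_overconfidence("Paris", 0.95, ["Paris", "Lyon", "Lyon", "Nice"]))  # True
# The same answer with broad peer agreement passes.
print(flag_overconfidence("Paris", 0.95, ["Paris", "Paris", "Paris", "Nice"]))  # False
```

The interesting part is that this needs no access to the model’s internals: disagreement among peers stands in for a calibration signal the model itself doesn’t emit.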

My approach is behavioral: tell Claude to verify before stating claims, and build rules that add speed bumps before it commits to an answer. Prompting as a patch for a structural problem.
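A speed bump can be as simple as prepending a verification rule to every prompt. The rule text below is mine, invented for illustration; it is not a quote from my published tips or from any Claude configuration.

```python
# Hypothetical "speed bump" wrapper; the rule wording is an assumption,
# not the actual rule from the article's tips.
VERIFY_RULE = (
    "Before stating any factual claim, check whether the provided context "
    "supports it. If it does not, say you are not certain instead of "
    "asserting the claim."
)

def with_speed_bump(user_prompt: str) -> str:
    """Prepend the verification rule so it lands before the model commits."""
    return f"{VERIFY_RULE}\n\n{user_prompt}"

print(with_speed_bump("What year was the study published?"))
```

It’s a blunt instrument compared to confidence scoring, but it runs entirely on the user’s side, with no cooperation from the provider.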

Both are working around the same design flaw — these models don’t naturally hedge when they’re wrong. They produce text that reads like certainty regardless of whether certainty is justified, because that’s what confident human writing looks like.

The question that matters is whether this kind of confidence scoring ever reaches end users. If it shipped as a feature — a little indicator that said “this output has low agreement with peer models” — it would change how people use these tools. But that requires model providers to surface their own unreliability, which is a harder sell than it sounds.

Until then, the behavioral patch is what we’ve got.

Source: Digital Watch Observatory