Artificial Intelligence

Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark

14 August 2025·7 words

(Guest article on the Nous Research blog) Anecdotal evidence suggests that open-weight models produce significantly more tokens for similar tasks than closed-weight models. This report investigates those observations systematically. We confirm that the trend generally holds, but observe significant differences depending on the problem domain.
