Fastest Human Benchmark

With AI models clobbering every benchmark, it's time for human evaluation

Artificial intelligence has traditionally advanced through automatic accuracy tests in tasks meant to approximate human knowledge. Carefully crafted benchmark tests such as The General Language ...

Neuroscience News

“Humanity’s Last Exam”: The Super-Benchmark AI Is Currently Failing

Researchers debut “Humanity’s Last Exam,” a benchmark of 2,500 expert-level questions that current AI models are failing.
