A new study put ChatGPT to the test by asking it to judge whether hundreds of scientific hypotheses were true or false, and the results were far from reassuring. Although the AI answered correctly about 80% of the time at face value, its performance dropped significantly once random guessing was accounted for, revealing only modest reasoning ability. More concerning still, it frequently contradicted itself when asked the exact same question multiple times, sometimes flipping its answer back and forth.