Monday, December 1, 2025

How Accurate are AI Chatbots?

STATISTA: How Accurate are AI Chatbots? 

by Tristan Gaudiaut, Nov 28, 2025
Three years ago, on November 30, 2022, the official release of ChatGPT marked a turning point in artificial intelligence, propelling AI chatbots (or large language models) into the mainstream. Since then, progress has been undeniable: LLMs' ability to process complex queries, summarize vast amounts of information and even assist in coding has improved considerably.

Yet, hallucinations, misinterpretations of context and inaccuracies continue to plague even the most sophisticated of currently available models. A study from the European Broadcasting Union and the BBC reveals that while the rate of inaccurate responses has declined since the end of last year, errors continue to be widespread.

Data collected between May and June 2025 and analyzed by a cohort of journalists revealed that almost half of the responses (48 percent) from popular chatbots (the free versions of ChatGPT, Gemini, Copilot and Perplexity) contained accuracy issues. Seventeen percent were significant errors, mainly regarding sourcing and missing context. In December 2024, the rate of inaccurate responses (observed using a smaller sample of answers) was significantly higher: 72 percent for all four LLMs. Thirty-one percent were major issues in that case.

Despite gradual improvements, these shortcomings raise critical questions about reliability, especially in high-stakes applications like healthcare, legal advice or education. While AI developers keep pushing boundaries, users must remain aware of the technology's current limitations.
Infographic: How Accurate Are AI Chatbots? (Statista)

5 comments:

  1. When I started out as a student in academia, in the days before grade inflation, a significant inaccuracy would likely merit a "D" while minor inaccuracies would likely merit a "C." In today's world those would probably be inflated to "C" for significant inaccuracies and "B" for minor inaccuracies.

    In other words, half the students who use these tools for their papers should be getting B's for minor inaccuracies, and about a quarter should be getting C's for major inaccuracies in their information.

    However, about half might merit A's depending upon the excellence of their reasoning and conclusions.

    This suggests ways in which academics might use AI in their classrooms, with students who use the AI uncritically getting B's and C's, but those exposing the inaccuracies of AI getting points that might merit them an A.

  2. I use it for the most anodyne of tasks (e.g. capturing and writing meeting notes), but even these must be scrutinized for inaccuracies before they can be published.

    Most of us can easily tell what is true and what is false, but these AIs (which work despite nobody being able to completely explain how) can't seem to tell the difference. Or maybe it's that they can't tell the difference between what is real and what isn't.

    Have any of you listened to the AI-generated country songs that are busting the charts? I'm not much of a country music fan, and based on the snippets of music I've heard, I'm not about to become one.

    Replies
    1. Not a music fan, but lots has been written about AI used to grind out plot formulas and passages for genre fiction like westerns and romances. I don't think AI can churn out an entire novel, but my guess is that you could do it in short chapters.

      Lots has also been written about how AI is used to generate porn stories and deep fake videos. AI is often at the other end of smut texting. Not surprising, since tech has been used since the invention of the stylus to generate porn.

    2. About a year and a half ago there was a strike in Hollywood - the screenwriters. They were striking to try to get contractual assurances that they wouldn’t be replaced by AI, which probably would work. TV series are written by teams that change throughout a season. The strike lasted for months and new contracts were signed protecting their jobs for now. But next time around, AI will probably win.

  3. AI is pretty much off my list of things to pay attention to. Country music is bad enough. I would not think of subjecting myself to AI country music.

    I haven’t used AI except by default in Google searches, which often bring up an AI summary at the top of the list. It’s often wrong, sometimes very wrong. I usually read it, but it’s a bit like wiki or Snopes: a starting point as long as it has a few links available. If my docs start using it and I learn about it, I may have to get a whole lot of second opinions.
