STATISTA: How Accurate are AI Chatbots?
by Tristan Gaudiaut,
Nov 28, 2025
Three years ago, on November 30, 2022, the official release of ChatGPT marked a turning point in artificial intelligence, propelling AI chatbots (or large language models) into the mainstream. Since then, progress has been undeniable: LLMs' ability to process complex queries, summarize vast amounts of information and even assist in coding has improved considerably.
Yet, hallucinations, misinterpretations of context and inaccuracies continue to plague even the most sophisticated of currently available models. A study from the European Broadcasting Union and the BBC reveals that while the rate of inaccurate responses has declined since the end of last year, errors continue to be widespread.
Data collected between May and June 2025 and analyzed by a cohort of journalists revealed that almost half of the responses (48 percent) from popular chatbots - free versions of ChatGPT, Gemini, Copilot and Perplexity - contained accuracy issues. 17 percent were significant errors, mainly regarding sourcing and missing context. In December 2024, the rate of inaccurate responses (observed using a smaller sample of answers) was significantly higher: 72 percent for all four LLMs. 31 percent were major issues in that case.
Despite gradual improvements, these shortcomings raise critical questions about reliability, especially in high-stakes applications like healthcare, legal advice or education. While AI developers keep pushing boundaries, users must remain aware of the technology's current limitations.
You will find more infographics at Statista
When I started out as a student in academia, in the days before grade inflation, a significant inaccuracy would likely merit a "D" while minor inaccuracies would likely merit a "C." In today's world those would probably be inflated to "C" for significant inaccuracies and "B" for minor inaccuracies.
In other words, half the students who use these tools for their papers should be getting B's for minor inaccuracies, and about a quarter should be getting C's for major inaccuracies in their information.
However, about half might merit A's depending upon the excellence of their reasoning and conclusions.
This suggests ways in which academics might use AI in their classrooms, with students who use the AI uncritically getting Bs and Cs, but those exposing the inaccuracies of AI getting points that might merit them an A.
I use it for the most anodyne of tasks (e.g. capturing and writing meeting notes), but even these must be scrutinized for inaccuracies before they can be published.
Most of us can easily tell what is true and what is false, but these AIs (which work despite nobody being able to completely explain how) can't seem to tell the difference. Or maybe it's that they can't tell the difference between what is real and what isn't.
Have any of you listened to the AI-generated country songs that are busting the charts? I'm not much of a country music fan, and based on the snippets of music I've heard, I'm not about to become one.
Not a music fan, but lots has been written about AI used to grind out plot formulas and passages for genre fiction like westerns and romances. I don't think AI can churn out an entire novel, but my guess is that you could do it in short chapters.
Lots has also been written about how AI is used to generate porn stories and deep fake videos. AI is often at the other end of smut texting. Not surprising, since tech has been used since the invention of the stylus to generate porn.
About a year and a half ago there was a strike in Hollywood - the screenwriters. They were striking to try to get contractual assurances that they wouldn’t be replaced by AI, which probably would work. TV series are written by teams that change throughout a season. The strike lasted for months and new contracts were signed protecting their jobs for now. But next time around, AI will probably win.
"But next time around, AI will probably win."
I think that's probably right. For screenwriters, we might think about it this way: the British detective and police-procedural fare on BritBox (much of it seemingly the same stuff that has been on PBS Masterpiece over the years) can be as formulaic as a Western or a romance. So some enterprising person will train AIs to generate the scripts, and perhaps even the productions. And when they figure out how to do it at a hundredth or a ten-thousandth of the cost, they will offer a competing streaming service that costs, say, $5/year.
This is basically how Uber and Lyft took on and beat the highly regulated and price-controlled taxicab industry: by offering a disruptively cheap and superior service; ignoring any legal barriers to market entry; building up a large and loyal customer base; and then defying the regulatory authorities (mostly city-level departments) to ban them. Most cities, being run by elected officials who are sensitive to voter sentiment, caved.
Resistance to AI, as to the Borg, is futile. I do hope that publishers and producers label AI-generated material so that I can avoid it. Happy to support humanity, not machine-ity.
AI is pretty much off my list of things to pay attention to. Country music is bad enough. I would not think of subjecting myself to AI country music.
I haven’t used AI except by default in Google searches, which often bring up an AI summary at the top of the list. It’s often wrong, sometimes very wrong. I usually read it, but it’s a bit like wiki or Snopes - a starting point as long as it has a few links available. If my docs start using it and I learn about it, I may have to get a whole lot of second opinions.
I'm sure the plan is to get consumers used to AI in everything. It'll be like their getting us used to the processed crap we eat. Insipid mainstream news. Just add it to the list of crappified things we already tolerate. The main purpose is to displace community and human interaction for the sake of the billionaires. I'm sure that in their conversations during golf, you'd hear "They'll get used to that s**t. They always do." China is not my enemy. Hamas is not my enemy. Venezuela is not my enemy. But I definitely do have an evil enemy.
I haven't used AI yet (anyway I haven't used it on purpose, I'm sure it infiltrates everything). I would use it if I saw a need for it, but so far I haven't. In some ways I perceive it the same way I view tarot cards and ouija boards, as a lot of nonsense, but a little dangerous. We don't really understand what we are tapping into. Probably I'm a Luddite.
My Samsung phone keeps trying to get me to try Gemini. So far I haven't.
Now this is weird. I'm typing this on my Kindle, not using voice at all. But just now a text popped up on my phone that Gemini isn't available to anyone under 18, and asked me to check off "got it".
Gemini is baked into my Google search engine. It doesn't seem more inaccurate than the others I fool around with. In fact, Jack's bar chart above shows it makes fewer major errors than the others.
Anne mentioned the ramifications of a medical "hallucination". There are a lot of other possibilities for anxiety: self-driving cars; public safety responses; warfare decision-making.
I suppose the plot line of the robot that turns against its human masters and proves unstoppable is an old one now. It seems realistic to me, and not because the robots have evil baked into them. It's that whichever AI wrote their code made coding mistakes which humans failed to catch, so the 'fail-safes' meant to protect us won't work as planned.
One concern I have about AI doing our thinking for us is the implications for neurological development of kids. Most of us spent, probably, the first six years of our schooling learning things such as reading, multiplication tables, spelling, fractions, long division, basic grammar. There were a lot of reps and drills. If we fast forward through all that because we have AI and don't really think kids need to be proficient at that any more, then we end up without the "muscle memory" and development of the brain in those areas. Of course some would say that we will become proficient in other areas.
I think this is one mass neurological experiment on top of the previous one, the smart phone. I think such total dependence on electronic infrastructure is a mistake, because that infrastructure may become unsustainable. I finished the really weird Dune sci-fi saga. The people limited the level of computer technology because humans had had to fight a war with self-aware machines. Building an AI was punishable by death. Some depended as little as possible on technology while enhancing their mental performance with drugs and disciplines. I am becoming, if not skeptical of technology, skeptical of how we use it and for what reasons. Specific usage of AI in scientific research makes sense if, for instance, you're looking for alternatives to concrete to avoid the CO2 footprint. But the commercial proliferation concerns me greatly.
AI, like science before it, is being born in a really nasty neighborhood.
"One concern I have about AI doing our thinking for us is the implications for neurological development of kids. Most of us spent, probably, the first six years of our schooling learning things such as reading, multiplication tables, spelling, fractions, long division, basic grammar. There were a lot of reps and drills. If we fast forward through all that because we have AI and don't really think kids need to be proficient at that any more, then we end up without the muscle memory and development of the brain in those areas."
Yes.
Increasingly, no one sees the value in any of those old muscles. Some people who are "neurodivergent" claim to be the new ubermensch of the future because they are wired to click with the cyber world. Elon Musk and his pronatalist followers seem to be leading that parade, impregnating every woman they see to create more "Aspie" children who prefer being with a computer in a room by themselves to dealing with people.
They see those of us with analog brains as being like that guy in "I Am Legend," dead ends in human evolution.
The humanities have been sidelined for decades except as grist for the entertainment mill, where plot and character development are secondary to special effects.
Our human heritage and connection with the past is especially irrelevant when AI allows us to alter and fragment reality itself. Possibly this explains why evangelicals, who are nimble at molding Scripture to suit the Powers That Be, are the only denomination gaining ground. Catholics and mainline Protestants trying to preserve the whole message, even the bits that make us uncomfy, are hemorrhaging followers. It's not about loving your neighbor any more. It's about prosperity and abundance, aka money.
But this is all screaming into the wind by an old lady on her way out. The Boy is coming for lunch, and I gotta start the ziti.
Actually, even the evangelicals are losing ground, with the exodus led by young adult women. Echoes of the RCC. When you lose the young women, you usually lose the future family too.
The data above from Statista have convinced me that AI both in its present forms and in its future potential is being greatly overrated.
I am not afraid of AI taking over many jobs or becoming more intelligent than humans. The more likely scenario is just another stock market bust, like the housing bust, with many people, mostly average and poor people, getting hurt.
Plans are already afoot for AI to take over jobs.
My colleagues in the tutoring center are slated to be replaced by an AI program called SCOUT that provides feedback on student essays.
AI "companions" for the elderly are in the works in nursing homes.
AI wait staff have been rolled out in a few restaurants.
AI analyzed my bloodwork at my oncology appt last week and spit out a new battery of tests that the nurse brought in. I haven't seen the oncologist for more than two years even though I am sicker, just a frazzled nurse who orders more tests and can't give me any conclusive info.
AI has already killed jobs for graphic artists and likely will affect ad copy writers.
Whether AI takeover happens in a big way will depend on whether it can perform cheaper and better than people. Right now it seems pretty glitchy, as your stats indicate.
But the self-serve lines at banks, gas stations, and grocery stores should signal how eager business is to eliminate jobs wherever possible.
I get how AI could take over graphic art jobs and ad copy writers. But I don't get how it could take over physical care jobs in nursing homes. Even just "emotional support" jobs. To a degree it could go over lab results, but not actually drawing samples.
It is interesting that our Walmart Superstore went all self checkout for a while. But now it has brought back checkers for some of the lines.
I agree with Jack that it could drive a stock market bubble bust.
Here ya go: https://elliq.com/pages/caregivers
And AI pets for the geezers, too: https://www.mindcarestore.com/Joy-For-All-Pets-Companion-Alzheimer-doll-therapy-p/mc-0604.htm
Talked to a friend who moonlights in an adult day care. The robot cats and dogs broke her heart. Families get these emotional pacifiers for Grandma because they can't be bothered to show up for real.
They had a real therapy cat at the assisted living where my mother-in-law was. He was that rare creature, a people-loving cat. He belonged to one of the staff members, and wasn't at the home full time. My mil loved seeing him, since she was a cat person. I don't think she would have connected with a robo cat!
I can see the program in your first link maybe enabling someone to live in their own home longer, since it was basically a meds reminder, and probably could tell if someone fell. But if it came to someone actually needing physical care, it wouldn't fill the need.
It's probably similar to the blood pressure monitor program we are in through Medicare. Which is helpful. But the bi-monthly nurse call-in isn't. I often blow off answering my phone for that since I think it's nosy and an intrusion.
Huh. I guess I'm either not being clear or everybody thinks my dystopian hellscape predictions are just dumb. Cuz all the cyber crap has been a boon to mankind so far.
Of course AI cannot change an adult diaper or operate a Hoyer. Not yet, anyway.
How about this: My uncle's nursing home had real therapy dogs with real volunteers. However, scheduling visits, getting permissions, noting allergic patients, moving patients to the dog visiting area--all that is extra staff time.
So why bother when you can plug in a non-allergenic to go pet? (As I was on my way out of the hospital this morning I noticed a whole row of robo cats lined up in the gift shop window.)
Or why bother to chat for a few minutes with an elderly shut in when you can plug them into a maintenance system?
The attitude undergirding these gizmos is this: Human effort is wasted on a bunch of dying old people.
Please just bring me the Black Capsule.
to go pet = robo pet
Small pets, dogs and cats, were allowed at the assisted living place where we spent the summer of 2024. Quite a few residents had dogs, but I never saw a cat. Maybe they were confined to the unit. They had a special dog walking (pooping) area on the grounds, cleanup bags provided. The place had apartments, not just rooms, ranging from studios with a small kitchen and private bathroom to 2 beds, 2 baths, full kitchen, about 1150 sq ft. Most of the singles had studios. About 8 couples were living together. Only two couples (including us) were married. The others were divorced or widowed but found companions for the rest of the journey and split the rent. I’m not sure about the locked memory care wing - it may have just been rooms, not apartments. Some residents were “independent” living, with three meals, laundry service, housekeeping service, transportation to medical appointments and shopping, and the various “activities”, ranging from virtual bowling to field trips to museums. Care was assessed by a point system. My husband had care, but I did not. I was “independent”. Some who came to dining room meals and activities had escorts to walk with them downstairs or take them in a wheelchair. One man’s escort stayed with him at the table and fed him. Tray service was another expense if they wanted to stay in their rooms. My husband could access the dining room himself without an escort if I wasn’t around, but that was only a week or so after I came back to MD to get everything ready. Usually I was with him. He was paralyzed, but fairly strong, and with use of his arms and hands, more mobile than many of the very frail residents. Most were in their 90s and late 80s. My husband also had bed baths, transfers in and out of bed and wheelchair, clothing changes, colostomy and Foley bag emptying. Many people had “medication management” - the “med tech” dispensed the meds to individuals according to their schedules.
They had a pharmacy option to get prescription meds for more money. No need to go out to pick them up. They also arranged for Dr or NP visits, a dentist, and a podiatrist to come - all private pay. Of course the local pharmacies delivered meds too for free, so I don’t know if anyone used the pharmacy. Everyone was given a lanyard with a call button for help. Locations were tracked online so if someone fell in the postage stamp garden they would know not to go to the unit. A number of people hired outside caregivers also to do? I estimate that we saw about 1/3 to 1/2 of the residents in the dining room and at activities. There were about 20 regulars at the activities. Most stayed in their rooms and had meals delivered. They had local musicians or music students in about once every week or two. That always drew the biggest crowd - about 30. There was fairly basic on-site PT and OT that was Medicare covered or private pay. Not included with the rent. There was a unisex hair salon on-site available a couple of times a week. Holidays had special meals. This was a “high end” assisted living place. It was attractive and well maintained. The caregivers were good, but they are not medically trained. If medical care is needed people are either forced to leave or sometimes can bring in medical people at their own expense. This was all very expensive. I figured our retirement would run out fairly quickly if we didn’t get home.
I hope to stay in our own home until the end, even if we have to sell the house and move to a condo, and have our own caregivers as needed. But finding GOOD caregivers is a real challenge. I don’t think AI can handle serious assisted living needs, and definitely not good medical care.
The memory care people sometimes had stuffed animals. We rarely saw them though. One woman who came to mass with her caregiver had a child’s baby carriage with three very well loved fabric cats, falling apart, that she stroked throughout mass. There were usually about ten people. The memory care woman didn’t come often. I was the priest’s assistant, passing out the hymnals and writing the responses on a whiteboard. I was one of only two there who didn’t use a walker or wheelchair to get to the activities room where mass was held. The priest himself was a borderline memory care person. He was very nice, but sometimes forgot where he was during the mass and repeated parts of it. I never saw a robot pet.
Americans and Brits warehouse their elderly. Soul killing even in high-end places like the one we were in for 3 months. Not so true of Europeans, especially not those on the Mediterranean. Asians almost never warehouse their elderly. I don’t know about Northern Europeans in places like Eastern Europe, Germany, Holland, and Scandinavia.
Anne, I think your experience at a high-end assisted living facility is a good example of my post on Disney as a model of the upper-class economy:
https://newgathering.blogspot.com/2025/09/disney-model-of-upper-class-economy.html
For people who cannot afford upper class amenities, the economic model is giving people poor service for the money that we are paying.
It all started with shipping service jobs overseas. For example, if you want online help for your Verizon phone, you basically get someone who understands and speaks English poorly. Like AI, the overseas workers are trained to give standard responses to English cue words. So, you spend a lot of time going through basic exercises with the hope your foreign assistant will find and solve the problem. When we need Verizon help, we go to our local store. We may not get a very high-tech person, but at least we can troubleshoot with someone whom we can understand.
Honda now uses thinly disguised AI answer persons. I avoid them by calling the local dealer directly to talk to someone who hopefully has serviced my car before, can call up my service record, and decide what is needed.
So, we are all engaged with poorly functioning bureaucracies, whether human or AI.
I wish we would quit speaking of "warehousing" people. It just throws guilt on family members who couldn't care for their loved one at home. For instance, us. We live in a two bedroom, one bathroom house built in the 1950s. It isn't handicapped accessible for someone in a wheelchair. Not to mention we both worked full time at the time my mother in law needed care. So she was in assisted living. It wasn't as nice as the one Anne was talking about, but it was decent, and the staff were lovely. It was a 20 mile drive but we both went over several times a week.
It's funny that it is usually a woman who is expected to provide care, including in those Asian and Mediterranean countries that don't warehouse their elderly.
If I get to the point where I can't live at home, the last thing I want is for my daughters-in-law to have to provide care.