We’re all becoming familiar with the weird results when AI ‘hallucinates’ and provides totally inaccurate responses to prompts. AI will even generate plausible references and links to related academic papers that look, on the surface, perfectly acceptable. I recently had a very frustrating experience chasing up a chapter in an academic text that I knew to be a published work, only to find that the referenced chapter did not exist and had been created by AI to, I imagine, convince me that its response was based on real-world academic research. It was a very good lesson in how far you can trust AI output and how necessary it is to double- and triple-check responses to complex prompts. But there is another problem that is equally disconcerting, and it stems from our own cognitive responses and expectations rather than from AI’s troubling flaws.
The speed and apparent fluency of AI make it easy to forget that it doesn’t “know” anything in a human sense. Yet, according to a new study from Aalto University in Finland, this reliance on AI may be doing more than making us efficient: it may be quietly eroding our ability to judge how well we’re actually performing.
In two large-scale experiments, researchers found that when people use AI tools such as ChatGPT, they tend to overestimate their own performance and that this overconfidence is strongest among those who consider themselves the most “AI literate.” The familiar Dunning–Kruger effect (the tendency for people with low ability to overrate themselves) disappears. Instead, AI flips the psychology: experts become the most overconfident of all.
The results suggest that the more comfortable we become with AI, the less capable we may be of recognising when we’re wrong.
A recent study published in Computers in Human Behavior, titled “AI makes you smarter but none the wiser: The disconnect between performance and metacognition” (Welsch et al., 2025), explored how using large language models like ChatGPT influences people’s ability to assess their own performance accurately.
Led by Professor Robin Welsch and doctoral researcher Daniela da Silva Fernandes from Aalto University, the research team gathered around 500 participants to tackle challenging logical reasoning problems taken from the U.S. Law School Admission Test (LSAT). Half of the participants received assistance from ChatGPT, while the other half worked independently.
Interestingly, the AI users scored about three points higher than those who went it alone. However, the most striking takeaway was that they overestimated their performance by nearly four points. In simple terms, while AI did help improve their results, it also led them to believe they had performed even better than they actually did (Welsch et al., 2025).
Moreover, those who were more familiar with AI, meaning they understood how ChatGPT operates, had experience crafting prompts, and considered themselves confident users, exhibited the lowest metacognitive accuracy. This group struggled the most when it came to recognizing their mistakes.
As Professor Welsch puts it, “In the realm of AI, the Dunning-Kruger Effect seems to disappear. What’s particularly surprising is that individuals with greater AI literacy tend to be more overconfident. We would typically expect those with the deepest understanding of AI to be better at evaluating their own performance, but that wasn’t the case here” (Hudson, 2025).
The Dunning–Kruger effect describes a paradox of human self-perception first identified by psychologists David Dunning and Justin Kruger in 1999: people who perform poorly on a task often believe they’ve done well, while high performers tend to underestimate their abilities. The Aalto study suggests that AI reverses this pattern.
Instead of narrowing the gap between competence and confidence, AI widens it in the opposite direction. Skilled or experienced users, familiar with AI’s potential, appear to transfer that confidence to their own performance. They interpret AI’s fluent, authoritative output as evidence that they themselves are thinking more accurately.
In psychological terms, this is a problem of metacognition (our ability to evaluate our own thinking). As Welsch’s team notes, “optimising human-AI interaction requires users to reflect on their performance critically,” yet current AI systems “do not foster metacognitive awareness” (Welsch et al., 2025).
This isn’t simple vanity; it’s cognitive design. Tools like ChatGPT produce text that reads as confident and coherent, regardless of factual accuracy. When we collaborate with such systems, our brain’s trust mechanisms, evolved for human conversation, interpret confidence as competence.

A striking aspect of the Aalto experiments was how participants interacted with AI. Most issued a single prompt per question, accepted the first answer, and moved on. Very few refined or cross-checked the AI’s reasoning.
“People just thought the AI would solve things for them,” Welsch noted. “Usually there was one single interaction, which means users blindly trusted the system. It’s what we call cognitive offloading, when all the processing is done by AI” (Hudson, 2025).
Cognitive offloading is not new; humans have long externalised mental labour, from writing shopping lists to relying on calculators. The difference is that AI systems simulate reasoning. They do not merely store or compute; they argue, justify, and explain, and we are neurologically primed to believe explanations that sound plausible.
When the machine’s output appears thoughtful, we stop thinking as deeply ourselves. As Fernandes observes, “Current AI tools are not enough. They are not fostering metacognition, and we are not learning from our mistakes. We need platforms that encourage reflection” (Hudson, 2025).
The Aalto study’s paradox is that AI makes people perform better while simultaneously dulling their self-awareness. This is a potent mix for workplaces that prize productivity metrics over reflection.
If employees produce more output with AI assistance, they may equate that with improved skill or understanding. Over time, such systems could produce what cognitive scientists call deskilling: as we rely on machines to handle reasoning tasks, we lose both competence and confidence in our unaided abilities.
This isn’t just a theoretical concern. In professional settings, from legal reasoning to financial analysis or medical diagnostics, overconfidence in AI-assisted outcomes can lead to systemic risk. The human operator’s capacity to detect AI errors becomes weaker exactly when those errors are easiest to overlook.
As Welsch’s team notes, “AI levels cognitive and metacognitive performance in human–AI interaction, with consequences for accurate self-monitoring and overreliance” (Welsch et al., 2025).
The Aalto findings align with a growing body of cognitive psychology research on the “illusion of explanatory depth”: the phenomenon where people believe they understand complex systems (such as how a toilet works or how inflation rises) far better than they actually do (Rozenblit & Keil, 2002).
AI deepens this illusion. Because language models can articulate step-by-step reasoning, users feel as if they’ve followed and grasped the logic. In truth, they’ve only read a plausible explanation.
This can be especially problematic in training contexts where learners are encouraged to “use AI to explore” or “get quick feedback.” Without structured prompts for reflection, for example asking learners to explain why an AI answer might be wrong, the process becomes passive consumption rather than active learning.
For educators, instructional designers, and learning platform developers, the Aalto study carries direct implications. It suggests that AI literacy, as it is currently taught, may not be enough. Knowing how to craft prompts or interpret model outputs does not ensure that users can assess when those outputs are wrong.
Training programs that integrate AI need to explicitly teach metacognitive strategies such as double-checking reasoning, comparing alternative outputs, and identifying uncertainty cues.
In practice, this might include:
Reflective prompting: requiring learners to ask the AI why it gave a particular answer, and then explain that reasoning back in their own words.
Confidence calibration exercises: having users predict how accurate their AI-assisted responses are, then review the discrepancies (a simple scoring sketch follows this list).
Error tracing: analysing instances where the AI’s confident answer was wrong, and identifying why the human accepted it.
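To make the confidence calibration exercise concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than material from the Aalto study: the Attempt record, the calibration_gap function, and the made-up session exist only to show the arithmetic of comparing a learner’s predicted accuracy with their graded accuracy.

    from dataclasses import dataclass

    @dataclass
    class Attempt:
        """One AI-assisted answer: the learner's forecast versus the graded outcome."""
        predicted_correct: bool  # the learner's own prediction, made before seeing feedback
        actually_correct: bool   # the graded result

    def calibration_gap(attempts: list[Attempt]) -> float:
        """Return predicted accuracy minus actual accuracy.

        Positive means overconfidence (the pattern the Aalto study reports),
        negative means underconfidence, zero means perfect calibration.
        """
        if not attempts:
            return 0.0
        predicted = sum(a.predicted_correct for a in attempts) / len(attempts)
        actual = sum(a.actually_correct for a in attempts) / len(attempts)
        return predicted - actual

    # Made-up session: the learner expects four of five answers to be right but gets two.
    session = [
        Attempt(True, True),
        Attempt(True, False),
        Attempt(True, False),
        Attempt(True, True),
        Attempt(False, False),
    ]
    print(f"Calibration gap: {calibration_gap(session):+.2f}")  # prints +0.40, i.e. overconfident

A positive gap gives the learner something specific to reflect on: not “the AI was wrong” but “my sense of how well I was doing was wrong.”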
These methods echo Fernandes’s suggestion that “AI could ask users if they can explain their reasoning further.” This, she says, “would force the user to engage more deeply, face their illusion of knowledge, and promote critical thinking” (Hudson, 2025).
The next generation of AI-driven learning and workplace systems will need to move beyond performance assistance toward metacognitive support.
This means creating interfaces that don’t just answer questions but also prompt reflection. For example:
When a user accepts an AI’s answer, the system might ask, “What evidence supports this conclusion?” (a sketch of such a checkpoint follows this list).
When the AI provides reasoning, it could display confidence scores or alternative interpretations.
In collaborative settings, AI could generate devil’s-advocate responses to encourage critical dialogue.
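Here is a minimal sketch of what the first of these checkpoints could look like in code. The ask_model function is a hypothetical stand-in for whatever chat API a platform actually uses; the prompts and the returned fields are assumptions meant to show the shape of the interaction, not a real product’s interface.

    def ask_model(prompt: str) -> str:
        """Hypothetical placeholder for the platform's chat API client."""
        raise NotImplementedError("Wire this up to your own model client.")

    def answer_with_reflection(question: str) -> dict:
        """Return the model's answer together with a reflective checkpoint.

        Instead of handing back a bare answer, the system also asks the model
        for its supporting evidence and a devil's-advocate objection, then
        requires the user to state their own reasoning before accepting it.
        """
        answer = ask_model(question)
        evidence = ask_model(
            f"Question: {question}\nAnswer given: {answer}\n"
            "List the evidence supporting this answer and note anything uncertain."
        )
        objection = ask_model(
            f"Play devil's advocate: give the strongest reason the answer "
            f"'{answer}' to the question '{question}' could be wrong."
        )
        return {
            "answer": answer,
            "evidence": evidence,
            "devils_advocate": objection,
            # The checkpoint shown before the user can mark the answer as accepted.
            "user_prompt": "In your own words, why do you accept or reject this answer?",
        }

The specific prompts matter less than the design choice: accepting an answer is gated on an explicit act of reflection rather than a single click.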
Such design features would align AI with the pedagogical principle of productive struggle: the idea that learning requires effortful engagement, not frictionless completion.
Welsch’s findings imply that ease may be the enemy of insight. As AI smooths the path between question and answer, it removes the very friction that signals uncertainty and thus suppresses the feedback loops essential to learning.
The Aalto research also resonates beyond individual cognition. It mirrors a broader cultural shift in how professionals relate to knowledge itself.
In the pre-AI era, expertise was often defined by knowing, that is, by holding information or mastering procedures. Today, it increasingly involves knowing how to question machine-generated knowledge. The new literacy is not technical; it is epistemic.
This shift places responsibility on educators, team leaders, and organisations to re-value uncertainty. Rather than equating confidence with competence, training programs must help learners become comfortable admitting when they don’t know and verifying when they think they do.
As cognitive scientist Steven Sloman has argued, understanding is inherently social; we rely on others’ minds to fill our gaps (Sloman & Fernbach, 2017). AI extends that collective cognition but without the social checks that keep human groups accountable. The Aalto study’s warning is clear: without deliberate reflection, AI collaboration becomes a feedback loop of misplaced certainty.
In professional environments increasingly shaped by automation and data-driven decision support, the metacognitive gap identified by Welsch and colleagues has real implications for organisational learning and governance.
For leaders, it means recognising that efficiency gains may mask declining critical capacity. A report written faster with AI may look polished but reflect less original reasoning or verification.
For trainers and instructional designers, it means embedding reflective checkpoints into every AI-supported learning path, not as an optional “ethics” add-on but as a core competency.
For individuals, it means cultivating habits of scepticism: running multiple prompts, testing alternative formulations, or explicitly asking, “How could this be wrong?”
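As a rough illustration of the “multiple prompts” habit, the sketch below asks the same question in several formulations and flags disagreement. The ask parameter is a hypothetical callable standing in for any model client, and comparing answers as normalised text is deliberately crude; the point is only to build in a pause when the outputs diverge.

    def cross_check(question: str, reformulations: list[str], ask) -> dict:
        """Ask the same question several ways and flag disagreement.

        `ask` is whatever callable sends a prompt to the model (hypothetical).
        Answers are compared as normalised text, which is crude but enough to
        trigger a 'How could this be wrong?' pause when they diverge.
        """
        answers = {p: ask(p).strip().lower() for p in [question, *reformulations]}
        agreed = len(set(answers.values())) == 1
        return {
            "answers": answers,
            "agreed": agreed,
            "follow_up": None if agreed else "The answers diverge: how could each one be wrong?",
        }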
The paradox of the Aalto study is that while AI can make us “smarter”, improving task performance, it doesn’t automatically make us wiser. Wisdom requires awareness of the limits of one’s knowledge.
The phrase “artificial intelligence” tempts us to treat machines as partners in thought. But the Aalto findings remind us that the real challenge of AI is not what it can do, but what it stops us from doing, namely, reflecting on our own reasoning.
In the coming decade, as AI systems become standard across education, healthcare, finance, and creative industries, the key skill will not be producing outputs but maintaining metacognitive vigilance. The capacity to pause and ask, “Do I really understand this?” may become the most valuable human trait in an age of automated confidence.
For trainers, educators, and professionals designing AI-enhanced learning, this means a shift in emphasis: from tools that provide answers to systems that nurture awareness, reflection, and humility.
Because intelligence without reflection, human or artificial, is just another form of ignorance.
Hudson, S. (2025, October 28). When using AI, users fall for the Dunning-Kruger trap in reverse. Neuroscience News. https://neurosciencenews.com/ai-dunning-kruger-trap-29869/
Rozenblit, L., & Keil, F. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26(5), 521–562. https://doi.org/10.1207/s15516709cog2605_1
Sloman, S. A., & Fernbach, P. (2017). The Knowledge Illusion: Why We Never Think Alone. Penguin Books.
Welsch, R., da Silva Fernandes, D., et al. (2025). AI makes you smarter but none the wiser: The disconnect between performance and metacognition. Computers in Human Behavior, 150, 108779. https://doi.org/10.1016/j.chb.2025.108779

