May 17, 2026

Why AI Still Fails at Common Sense

Fluent language is not understanding

Modern large language models can draft reports, summarise legal documents, write code, translate texts, answer questions and coordinate multi-step tasks with impressive speed. This fluency creates a dangerous illusion: because the answer sounds human, users often assume that the system understands like a human.

It does not. AI models do not experience the world. They do not have bodies, social judgement, practical memory or lived contact with institutions. They do not know what it means to stand in a queue, misunderstand a public form, miss a deadline or distinguish between a general rule and an exception that applies to a real person. They calculate patterns in language and data. They can be useful, sometimes extremely useful, but they can also make errors that would look absurd to a person.

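This is why the same model can produce a polished policy memo and fail at a puzzle that a person solves without effort. On the ARC-AGI-3 interactive reasoning problems, the ARC Prize Foundation reports humans scoring 100% and advanced AI systems scoring below 1%. Linguistic fluency is not judgement. Statistical association is not causal understanding. Human-like wording is not proof of human-like reasoning.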

This distinction matters especially in public administration. If a model gives a wrong answer about a private hobby, the cost may be low. If it gives a wrong answer about a tax obligation, a benefit, a visa, a permit, a public procurement rule or a health service, the cost can fall on a citizen, a public servant or a vulnerable group.

AI agents as button-pushing explorers

A useful way to understand these systems is to think of them as button-pushing explorers. They try an action, observe the result and adjust the next action. In environments with clear feedback, this can produce powerful behaviour. In environments that require deep context, long-term consistency, institutional knowledge or moral judgement, the same mechanism becomes fragile.

The risk increases when AI systems become agents. An agent does not only answer a question. It can call tools, read files, fill forms, trigger workflows, generate letters, classify cases or recommend actions. A mistake no longer remains inside a paragraph. It can enter an administrative process.

In the public sector, this is not a minor technical issue. A confident but ungrounded answer can mislead a citizen. A summary of an outdated legal rule can misguide a civil servant. A chatbot that answers without showing its sources can transfer uncertainty from the model into the state. The more official the interface looks, the greater the risk of misplaced trust.

That is why public AI must not be designed as a magic box that speaks with authority. It must be designed as a controlled information system that retrieves, cites, logs, limits and escalates.

Why RAG is essential for public AI

Retrieval-Augmented Generation, or RAG, is one of the most important safeguards for using generative AI in public services. In a RAG system, the model does not answer only from its internal training. It first retrieves relevant material from trusted sources: laws, official guidance, administrative procedures, public service pages, case manuals, procurement rules, knowledge bases or validated datasets. It then generates an answer grounded in that retrieved material.
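
The retrieve-then-generate order can be sketched in a few lines. This is a minimal illustration, not a production design: it assumes simple keyword overlap in place of the semantic search a real system would use, it stops at building the grounded prompt rather than calling a model, and the corpus entries, source identifiers and function names are all invented for the example.

```python
from dataclasses import dataclass

# Words too common to signal relevance (illustrative list, not exhaustive).
STOPWORDS = {"a", "an", "the", "to", "of", "do", "i", "how", "is", "by", "with", "for"}


@dataclass(frozen=True)
class Document:
    source: str  # hypothetical identifier of an official page or legal text
    text: str


# Placeholder corpus, invented for illustration only.
CORPUS = [
    Document("licence-page", "Renew a driving licence by submitting the renewal form."),
    Document("benefit-page", "Family benefit applications require proof of residence."),
    Document("permit-page", "A business permit is issued by the municipality."),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text, strip punctuation and drop stopwords."""
    words = (w.strip(".,;:?!()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def retrieve(question: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Return up to k documents sharing at least one keyword with the question."""
    query = tokenize(question)
    scored = [(len(query & tokenize(doc.text)), doc) for doc in corpus]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:k]]


def build_prompt(question: str, passages: list[Document]) -> str:
    """Assemble a prompt that restricts the model to the retrieved passages."""
    context = "\n".join(
        f"[{i}] ({doc.source}) {doc.text}" for i, doc in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered passages and cite them by number.\n"
        f"{context}\n"
        f"Question: {question}"
    )


question = "How do I renew a driving licence?"
print(build_prompt(question, retrieve(question, CORPUS)))
```

The point of the design is that the model only ever sees text that the administration has chosen to publish, and every passage carries its source identifier into the answer.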

This changes the role of the model. The model is no longer treated as an oracle. It becomes an interface over a controlled body of knowledge. A citizen assistant should not “remember” what applies to a driving licence, a family benefit or a business permit. It should retrieve the current procedure, the relevant official page, the required documents, the legal basis and the date of validity. A tool for civil servants should not summarise legislation from memory. It should point to the article, paragraph, source and version used.

RAG does not eliminate hallucinations. It reduces them. It does so by anchoring answers in verifiable documents, making the source visible, creating an audit trail and allowing errors to be diagnosed. If an answer is wrong, the administration can examine whether the retrieval failed, the source was outdated, the user question was ambiguous or the model generated an unsupported conclusion.
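
The diagnostic questions above can be turned into a first-pass triage over the audit trail. The sketch below is an assumption about how such a log record might look: the field names, the one-year staleness threshold and the category labels are hypothetical. It automates only the checks a log can answer mechanically. An ambiguous question, for instance, still has to be recognised by a person, which is why the fallback is human review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AnswerRecord:
    """One audit-trail entry per generated answer (hypothetical schema)."""
    question: str
    retrieved: dict[str, date]  # source id -> date the source was last reviewed
    cited: list[str]            # source ids the answer actually cites
    answered_on: date


def diagnose(record: AnswerRecord, max_age_days: int = 365) -> str:
    """Suggest where to look first when an answer is reported as wrong."""
    if not record.retrieved:
        return "retrieval_failed"
    stale = [
        source
        for source, reviewed in record.retrieved.items()
        if (record.answered_on - reviewed).days > max_age_days
    ]
    if stale:
        return "source_outdated"
    if not record.cited or any(c not in record.retrieved for c in record.cited):
        return "unsupported_conclusion"
    # Nothing mechanical is wrong: the question may be ambiguous or the
    # error may be subtle, so a person has to look at it.
    return "needs_human_review"


record = AnswerRecord(
    question="Which documents do I need?",
    retrieved={"law-1": date(2024, 1, 1)},
    cited=["law-1"],
    answered_on=date(2026, 5, 17),
)
print(diagnose(record))
```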

This is especially important for open and democratic digital government. Public AI should be inspectable. The datasets, prompts, retrieval rules, evaluation methods and error reports should be documented. Where public money funds the system, the code, models or at least the interfaces and evaluation assets should be reusable under open terms whenever possible.

European examples

France offers a strong example through Albert API, developed for public administrations. Its significance is not only that it gives administrations access to generative AI models. It also offers RAG as a service, allowing public bodies to build AI tools grounded in their own documentary bases. This is the right direction: public-sector AI should be connected to official knowledge, hosted in a controlled environment and designed to reduce dependency on closed platforms.

The United Kingdom has tested GOV.UK Chat as a conversational RAG system built on GOV.UK content. The lesson is twofold. First, RAG can make government information easier to find by allowing citizens to ask questions in natural language. Second, public trust must be handled carefully. Users may overtrust a chatbot because it appears under a government brand. The system therefore needs warnings, links back to source pages and continuous evaluation.

Estonia’s Bürokratt shows another important principle. A national virtual assistant can help citizens find public service information, but when it cannot resolve the issue, it should route the user to a human customer service representative. This is not a weakness. It is good design. A public AI system should know when to stop.
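
Knowing when to stop can be expressed as an explicit rule rather than left to the model. The following sketch is not a description of how Bürokratt works internally. It assumes a hypothetical retrieval step that returns source identifiers with relevance scores between 0 and 1, and an arbitrary threshold below which the assistant hands over to a person instead of answering.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    kind: str  # "answer" or "handoff"
    text: str


def answer_or_escalate(
    passages: list[tuple[str, float]], min_score: float = 0.5
) -> Reply:
    """Answer only when at least one retrieved source is relevant enough."""
    supported = [(source, score) for source, score in passages if score >= min_score]
    if not supported:
        return Reply(
            "handoff",
            "I could not find an official source for this. "
            "Connecting you to a customer service representative.",
        )
    sources = ", ".join(source for source, _ in supported)
    return Reply("answer", f"Draft answer grounded in: {sources}")


print(answer_or_escalate([("permit-page", 0.2)]).kind)
print(answer_or_escalate([("permit-page", 0.9)]).kind)
```

The threshold is a policy choice, not a technical constant. Setting it higher sends more citizens to a human and fewer to an uncertain answer.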

At EU level, the European Labour Authority’s ESSbot shows how RAG can support access to legal information in cross-border labour and social security contexts. The key point is not just automation. It is iterative improvement: user feedback is used to refine retrieval so that the system fetches more relevant legal information over time.
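
One simple way to picture feedback-driven retrieval is a re-ranker that nudges documents up or down according to user ratings. The ESSbot record does not describe its mechanism in this article, so the sketch below is purely an assumption: the class name, the additive boost and the step size are invented to illustrate the idea.

```python
from collections import defaultdict


class FeedbackReranker:
    """Adjust retrieval scores with accumulated user feedback (hypothetical)."""

    def __init__(self, step: float = 0.1) -> None:
        self.step = step
        self.boost: dict[str, float] = defaultdict(float)

    def record(self, source: str, helpful: bool) -> None:
        """Raise the boost for helpful sources, lower it for unhelpful ones."""
        self.boost[source] += self.step if helpful else -self.step

    def rerank(self, scored: list[tuple[str, float]]) -> list[tuple[str, float]]:
        """Sort by retrieval score plus feedback boost, highest first."""
        return sorted(
            scored,
            key=lambda pair: pair[1] + self.boost[pair[0]],
            reverse=True,
        )


reranker = FeedbackReranker()
results = [("regulation-a", 0.60), ("regulation-b", 0.55)]
reranker.record("regulation-b", helpful=True)
print([source for source, _ in reranker.rerank(results)])
```

In a real deployment the feedback itself would need review before it changes rankings, since ratings can be noisy or manipulated.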

The Greek lesson

For Greece, the policy direction should be clear. Any public AI system dealing with rights, obligations, benefits, permits, taxes, procurement, healthcare, education or administrative procedures should be RAG by default. It should retrieve from official registries, legal texts, service catalogues and validated public data. It should provide source references. It should log interactions. It should include human oversight. It should never issue final administrative decisions automatically.
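
These requirements can be enforced as a release gate that every output must pass before it reaches a citizen or a case file. The sketch below is one possible encoding under stated assumptions: the field names and blocker messages are hypothetical, and a real system would tie them to its own logging and case-management components.

```python
from dataclasses import dataclass, field


@dataclass
class DraftOutput:
    """A generated output awaiting release (hypothetical schema)."""
    text: str
    sources: list[str] = field(default_factory=list)
    logged: bool = False
    is_administrative_decision: bool = False
    approved_by_human: bool = False


def release_blockers(draft: DraftOutput) -> list[str]:
    """List every reason the draft must not be released yet."""
    blockers = []
    if not draft.sources:
        blockers.append("no source references")
    if not draft.logged:
        blockers.append("interaction not logged")
    if draft.is_administrative_decision and not draft.approved_by_human:
        blockers.append("decision lacks human approval")
    return blockers


draft = DraftOutput(
    text="Draft decision on a permit application.",
    sources=["permit-page"],
    logged=True,
    is_administrative_decision=True,
)
print(release_blockers(draft))
```

An empty list means the draft may go out. Anything else stops it, so a final administrative decision can never leave the system on the model's authority alone.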

The goal is not to make the state sound intelligent. The goal is to make it more reliable, more transparent and more accountable. AI can help public services become faster. Only careful architecture can make them trustworthy. In public administration, trust is not a user-experience feature. It is a democratic requirement.

Sources:

ARC Prize Foundation, ARC-AGI-3: A New Challenge for Frontier Agentic Systems: The paper documents the gap between human performance and the performance of advanced AI systems on interactive reasoning problems, with humans scoring 100% and AI systems scoring below 1%. It is directly relevant to the argument that fluency does not equal common sense: https://arxiv.org/abs/2603.24621

UK Government, AI Insights: RAG Systems: Explains why RAG systems reduce hallucinations by grounding answers in authoritative knowledge sources instead of relying only on the model’s fixed training data: https://www.gov.uk/government/publications/ai-insights/ai-insights-rag-systems-html

DINUM, Albert API: The official French Albert API platform is described as an interministerial generative AI platform for public administrations, providing access to models and RAG as a service. It is a suitable example of a European public AI infrastructure based on evidence-grounded retrieval: https://albert.sites.beta.gouv.fr/

Government Digital Service, Developing GOV.UK Chat: Our data science and AI engineering journey: Describes GOV.UK Chat as a RAG system that combines semantic search and generative AI to provide answers based on GOV.UK content, with an emphasis on evaluation and verification: https://insidegovuk.blog.gov.uk/2026/05/15/developing-gov-uk-chat-our-data-science-and-ai-engineering-journey/

European Labour Authority, ESSbot AI-powered chatbot public record: The ESSbot public record explicitly refers to the use of a RAG framework for more accurate retrieval of relevant legal information from external sources and to the use of feedback to improve the system: https://www.ela.europa.eu/sites/default/files/2026-02/ELA_Record_ESSbot_public_version.pdf