Data Fabrics: The Infrastructure for Useful and Trustworthy Local AI
Most discussions about artificial intelligence begin with models. Which model is stronger, faster, cheaper or more capable? For public administrations and private enterprises, however, the decisive question is different: what data does the model reason over, who governs that data, how is it connected to real workflows, and how can every answer be traced back to authoritative evidence? A local open-source AI system can be affordable, secure and sovereign. But it will not be useful if it operates on fragmented files, obsolete PDFs, isolated databases and undocumented institutional knowledge.
This is why the next critical layer of AI infrastructure is not the local large language model (LLM) alone. It is the data fabric. A data fabric is not simply a data warehouse, a data lake or a new central repository where everything is copied. It is an integration, semantics and governance layer that connects systems while preserving the meaning of the data. It links documents, registries, APIs, metadata, rules, workflows, policies and historical records. It does not merely move data. It carries context.
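What "carrying context" could mean in practice can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name FabricRecord, its fields and the system names in the usage note are hypothetical and do not describe any existing product or registry.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FabricRecord:
    """One item exposed through the fabric; the data itself stays in its source system."""
    source_system: str        # hypothetical system name, e.g. "legislation-db"
    record_id: str            # identifier inside the source system
    title: str
    valid_from: date          # date from which the content is authoritative
    access_roles: frozenset   # roles allowed to read this record
    links: tuple = ()         # ids of related records (laws, templates, ...)


def visible_to(records, role):
    """Return only the records that the given role is allowed to read."""
    return [r for r in records if role in r.access_roles]
```

A law readable by everyone and a welfare case file readable only by case workers would be two such records, with the case file linking back to the law through its links field. The point of the sketch is that provenance, validity and access rules travel with the record instead of being lost when data is copied.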
For public administration, this means that an AI assistant should not answer merely because it has absorbed Greek or administrative language during training. It should retrieve and reason over official sources: legislation, gazettes, circulars, procedural registries, public procurement platforms, transparency systems, tax systems, social security systems, health records where legally permitted, municipal systems, spatial datasets, civil protection data and statistical sources. Its role should not be to issue final administrative decisions. Its role should be to help civil servants and citizens identify the right procedure, detect inconsistencies, prepare applications, check deadlines, compare rules and document decisions.
A public-sector data fabric could begin with four core domains. The first is an administrative knowledge fabric, connecting laws, decrees, circulars, competences, procedures, document templates and organizational responsibilities. The second is a financial and procurement fabric, linking budgets, contracts, payments, public works, subsidies, audits and performance indicators. The third is a social-policy fabric, covering health, welfare, education, labour and social insurance services, always under strict privacy and access-control rules. The fourth is a spatial and environmental fabric, connecting geospatial data, land use, cadastral records, climate risks, flood and wildfire data, transport data, energy systems and infrastructure registries.
The same logic applies to private enterprises. A local open-source LLM does not create value because it can chat fluently. It creates value when it is connected to ERP, CRM, warehouse management, supply-chain systems, finance, invoicing, HR, helpdesk, e-commerce, POS, manufacturing, quality assurance, maintenance, contract management and internal policies. Only then can it support demand forecasting, stock control, supplier-risk analysis, customer-service summaries, anomaly detection, compliance checks and operational reporting. The enterprise does not need a detached chatbot. It needs a secure system that understands its own business context.
The key technical pattern is retrieval before generation. Retrieval-augmented generation (RAG) reduces hallucinations by forcing the model to work with the organization’s authoritative evidence before answering. In public services this should be a hard rule: no answer about a right, obligation, tax, permit, benefit, contract or deadline without retrieval from an institutional source. In private enterprises it means that any recommendation affecting money, risk, customers, compliance or operations must be grounded in real data, approved documents, policies and contracts.
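The hard rule can be made concrete with a minimal sketch. It assumes a toy keyword retriever and a caller-supplied generate function standing in for the local model; the names Passage, retrieve and answer are illustrative, and a real system would use a proper search index rather than word overlap.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str   # hypothetical identifier of the institutional source
    text: str


def _words(text):
    """Lower-cased word set; \\w+ also matches Greek letters in Python 3."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, corpus, min_overlap=2):
    """Toy retriever: keep passages sharing at least min_overlap words with the question."""
    q_words = _words(question)
    hits = []
    for passage in corpus:
        overlap = len(q_words & _words(passage.text))
        if overlap >= min_overlap:
            hits.append((overlap, passage))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in hits]


def answer(question, corpus, generate):
    """Retrieval before generation: refuse when no institutional passage is found."""
    passages = retrieve(question, corpus)
    if not passages:
        return {"answer": None, "sources": [], "status": "no_authoritative_source"}
    context = "\n".join(f"[{p.source_id}] {p.text}" for p in passages)
    return {
        "answer": generate(question, context),
        "sources": [p.source_id for p in passages],
        "status": "grounded",
    }
```

The design choice worth noting is that the refusal happens before the model is ever called, and that every grounded answer returns the identifiers of the passages it was built on, so a reviewer can trace it back to the source.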
This is also where GlossAPI becomes strategically important. Greek cannot remain a secondary language inside global English-first AI systems. Public AI for Greece needs high-quality, documented, reusable and openly licensed Greek datasets. GlossAPI provides a pipeline for transforming Greek texts into AI-ready datasets, with cleaning, normalization, documentation and open licensing. This makes it a crucial bridge between local open models and real Greek public knowledge. It can support RAG systems for legislation, public consultations, education, culture, administrative documentation and domain-specific Greek corpora.
The political meaning is clear. If AI in the public sector is built on closed cloud systems, proprietary interfaces and opaque private data pipelines, public administration will become more dependent on external vendors. If it is built on local open models, public-interest infrastructure, open APIs, documented datasets and governed data fabrics, it can become a tool for democratic modernization. The aim is not to automate authority. The aim is to strengthen human judgment, transparency and accountability.
A well-designed data fabric also changes how organizations think about AI procurement. Instead of buying a black-box assistant, the public sector and enterprises should specify open interfaces, auditable logs, model cards, dataset documentation, source traceability, role-based access control, human review and the ability to replace models without rebuilding applications. The model should be interchangeable. The knowledge layer should remain under the control of the organization.
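One way to read "the model should be interchangeable" is as a narrow interface between the application and the model. The sketch below assumes a hypothetical TextModel interface and two stand-in models; none of these names come from a real library, and a production audit log would record far more than this one does.

```python
from typing import Protocol


class TextModel(Protocol):
    """The only contract the application relies on (hypothetical interface)."""
    name: str

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for one local model."""
    name = "echo-model"

    def complete(self, prompt: str) -> str:
        return "echo: " + prompt


class UpperModel:
    """Stand-in for a replacement model."""
    name = "upper-model"

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class Assistant:
    """Application code: it owns the audit log and never depends on a specific model."""

    def __init__(self, model: TextModel):
        self.model = model
        self.audit_log = []

    def ask(self, user_role: str, prompt: str) -> str:
        reply = self.model.complete(prompt)
        # Record who asked what, and which model answered.
        self.audit_log.append({"role": user_role, "model": self.model.name, "prompt": prompt})
        return reply
```

Swapping the model is then a one-line change (assistant.model = UpperModel()), while the audit log and everything built around the assistant stay with the organization.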
This approach also supports digital sovereignty. Local open-source LLMs can run in national, regional, municipal or enterprise-controlled infrastructure. Sensitive data can remain on-premise or in sovereign cloud environments. Lower-risk use cases can share common infrastructure. High-risk domains such as health, justice, taxation, migration, welfare or employment require stronger controls, human oversight and independent evaluation. A data fabric allows this differentiation. Not every dataset must move to the same place, and not every AI task needs the same model.
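The differentiation described here could be expressed as a small routing policy. The sketch is an assumption-laden illustration: the list of high-risk domains is taken from the paragraph above, while the deployment labels, the Route class and the exact mapping are hypothetical choices that each organization would have to define for itself.

```python
from dataclasses import dataclass

# Domains named in the text as high-risk.
HIGH_RISK_DOMAINS = {"health", "justice", "taxation", "migration", "welfare", "employment"}


@dataclass(frozen=True)
class Route:
    deployment: str      # hypothetical label for where the task may run
    human_review: bool   # whether a person must review the output


def route_task(domain, contains_personal_data):
    """Pick infrastructure and oversight level for one AI task."""
    if domain in HIGH_RISK_DOMAINS:
        # Strongest controls: controlled infrastructure plus mandatory human review.
        return Route("on_premise", True)
    if contains_personal_data:
        # Sensitive but lower-risk: keep the data in a sovereign environment.
        return Route("sovereign_cloud", False)
    # Lower-risk tasks can share common infrastructure.
    return Route("shared_infrastructure", False)
```

Under these assumptions, a taxation query is always routed on-premise with human review, whereas a question about library opening hours can run on shared infrastructure.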
The real promise of AI is not faster text generation. It is the capacity to connect scattered institutional memory and make it usable under democratic, legal and operational constraints. Without a data fabric, AI produces fast answers of uncertain value. With one, it can produce grounded knowledge, better services, lower administrative burden and more accountable decisions.
Sources:
MIT Technology Review Insights, “AI needs a strong data fabric to deliver business value”: The article documents that the business value of AI depends on the quality, semantics and context of data, not only on the power of the model: https://www.technologyreview.com/2026/04/22/1135295/ai-needs-a-strong-data-fabric-to-deliver-business-value/
GlossAPI, “Greek Datasets”: GlossAPI is presented as a pipeline for processing Greek texts and transforming them into AI-ready datasets for large language models, with publicly available Greek corpora: https://glossapi.gr/
GlossAPI, “About”: The page documents that GlossAPI is an open-source Python library and technical infrastructure for creating, processing and publishing Greek AI-ready datasets under open licences: https://glossapi.gr/aboutus.html?lang=en
UK Government, “AI Insights: RAG Systems”: Official documentation explains why RAG systems reduce hallucinations by grounding answers in authoritative knowledge sources: https://www.gov.uk/government/publications/ai-insights/ai-insights-rag-systems-html
European Commission, “Common European Data Spaces”: The European Commission describes European data spaces as secure and trustworthy frameworks for data sharing among businesses, public administrations and citizens: https://digital-strategy.ec.europa.eu/en/policies/data-spaces
European Commission, “European Data Governance Act”: The Data Governance Act strengthens trust in data sharing, data availability and the removal of technical barriers to reuse: https://digital-strategy.ec.europa.eu/en/policies/data-governance-act
Allen Institute for AI, “OLMo”: OLMo documents the direction of fully open language models, with an open model and training-data pipeline for a research-oriented and publicly auditable AI ecosystem: https://allenai.org/olmo
ETH Zurich, EPFL, CSCS, “Apertus: a fully open, transparent, multilingual language model”: Apertus is a European example of a fully open, transparent and multilingual model, designed for sovereign and auditable AI use: https://ethz.ch/en/news-and-events/eth-news/news/2025/09/press-release-apertus-a-fully-open-transparent-multilingual-language-model.html