GRDD+: A Large-Scale Greek Dialectal Dataset for the Age of LLMs
Nov 11, 2025

LLMs meet Greek dialects. Modern Greek is far from monolithic. From Cretan and Cypriot Greek to Pontic, Heptanesian, Tsakonian and Griko, the Greek-speaking world is characterised by rich dialectal variation shaped by geography, contact, and history. Meanwhile, Large Language Models (LLMs) dominate contemporary NLP. Despite their impressive performance on standard language, LLMs struggle significantly with […]

Omnilingual ASR and the Future of Universal Speech Recognition
Nov 10, 2025

A Global Problem: Thousands of Languages Left Behind. In recent years, automatic speech recognition has reached impressive accuracy for well-resourced languages. Yet this progress has not been shared equally. The majority of the world’s languages remain unsupported, particularly those with limited written tradition, few digital resources or small speaker populations. This technological exclusion has real […]

AI In Greece, ambition without real digital sovereignty
Nov 10, 2025

A national vision built on foreign foundations. Greece’s Blueprint for AI Transformation presents an inspiring narrative: human dignity, participation, transparency and international cooperation are placed at the centre of a new national strategy. Six flagship programmes promise progress in public administration, education, research, innovation, cultural heritage and ethics. Yet one crucial choice is largely absent, […]

Academic Knowledge Corpus
Nov 9, 2025

Abstract. The present dataset constitutes a high-quality text corpus derived from Greek doctoral dissertations, accompanied by their respective metadata. It includes 55,423 records covering the period 1975–2025, representing the largest unified corpus of Greek academic writing constructed to date for Natural Language Processing (NLP) purposes. The collection and processing pipeline involved a multi-layered procedure comprising […]

Forging GEMs: A New Generation of Greek Language Models for Modern NLP
Oct 25, 2025

Natural Language Processing for Modern Greek has long faced structural challenges: limited high-quality datasets, fragmented modeling efforts, and a shortage of architectures capable of handling complex, domain-specific text. Legal documents, in particular, require models with exceptional contextual depth and linguistic precision. To address these gaps, Novelcore introduces the Greek Embedding Models (GEMs), a new family of […]

Why LLM Training Data Must Become a Global Public Good
Oct 13, 2025

AI Innovation Starts with Shared, Trusted Data. As large language models (LLMs) increasingly shape critical systems in governance, education, public health, and disaster response, the quality and governance of the data that train them have become central to public-interest innovation. Artificial intelligence is only as fair, transparent, and representative as the information it learns from. […]

Advancing Spontaneous Speech Recognition: Mozilla’s New Shared Task for Underrepresented Languages
Oct 2, 2025

Rethinking what speech recognition should be. Automatic speech recognition has made dramatic progress, yet most models are trained on curated, carefully read speech. Real conversations are far more unpredictable, filled with hesitations, corrections, shifts in tone and spontaneous expression. Mozilla is launching a bold effort to address this gap through a shared challenge built entirely […]

Toward “AI-Ready” Development Data: A New Era for Global Data Governance
Jul 25, 2025

A New Turning Point for Development Data. Amid rapid advances in artificial intelligence (AI), development data has reached a new pivotal stage: its evolution into AI-ready data, that is, data that both humans and AI systems can easily discover, understand, access, and use. This transformation stems from a new reality: users, from beginners to experts, […]

GlossAPI: Developing the Greek Data Set for Large Language Model Training
May 30, 2025

The rapid expansion of Large Language Models (LLMs) has created an unprecedented need for large-scale, high-quality, and linguistically coherent datasets. For the Greek language, rich in history, structure, and semantic nuance, this need is even more urgent due to its underrepresentation in mainstream AI development. In response, GlossAPI was launched in 2023 as an initiative […]

Building AI-Ready Data, A Human-Centered Framework for the Age of Large Models
May 21, 2025

Why data quality is ultimately about people. Discussions about Artificial Intelligence often focus on algorithms and impressive demonstrations, but the true foundation of AI is quieter and more fundamental: data. The quality of data is not a purely technical concern; it is a deeply human one. Poor data can lead to unfair decisions, hidden risks, […]

ALIA: Spain Releases Its AI Models Under Open Access
Apr 25, 2025

A Public AI Infrastructure for All Official Languages of Spain. The Spanish government has officially released its ALIA AI models to the public, marking a major milestone in open, public-interest artificial intelligence. ALIA is Spain’s public AI infrastructure designed to support Spanish and all co-official languages: Catalan, Basque, and Galician. Six years after its inception, […]

Pharos: The Greek AI Factory for Accelerating AI Innovation
Apr 2, 2025

A European AI Factory anchored in Greece. “Pharos: The Greek AI Factory for Accelerating AI Innovation” is an ambitious initiative that aims to establish an EU AI Factory in Greece, strategically leveraging the pre-exascale supercomputer Daedalus to accelerate AI-driven innovation. Its core mission is to democratize AI by providing accessible resources, tools, and expertise to […]
