A Public AI Infrastructure for All Official Languages of Spain
The Spanish government has officially released its ALIA AI models to the public, marking a major milestone in open, public-interest artificial intelligence. ALIA is Spain’s public AI infrastructure, designed to support Spanish and the co-official languages Catalan, Basque, and Galician. Six years after its inception, the project now embodies the principle of “Public Money, Public Code.”
The open release of these models strengthens Spain’s commitment to linguistic diversity, technological sovereignty, and transparent digital innovation.
A Public and Open AI Initiative
The ALIA project is led by the Secretariat of State for Digitalisation and Artificial Intelligence, with technical coordination by the Barcelona Supercomputing Centre - Centro Nacional de Supercomputación (BSC-CNS).
ALIA is a flagship component of the Artificial Intelligence Strategy 2024, which sets out the following objectives:
- developing open and transparent foundation and language models,
- ensuring ethical, trustworthy artificial intelligence,
- protecting fundamental rights, intellectual property, and personal data,
- and establishing a framework of best practices.
Spain intends to apply ALIA in two pilot projects:
- an internal Tax Agency chatbot to streamline administrative workflows and citizen services,
- and a primary care diagnostic tool leveraging advanced data analysis to detect heart failure.
The Released Models
At the “HispanIA 2040” event, Spain’s Prime Minister Pedro Sánchez announced the first batch of ALIA models, published on Hugging Face under the Apache License 2.0.
1. ALIA-40B
A large transformer-based decoder-only model trained from scratch on 9.37 trillion tokens of highly curated data. The corpus covers 35 European languages and code.
2. Salamandra-7B and Salamandra-2B
Decoder-only models trained from scratch on 12.875 trillion tokens, spanning 35 European languages and code.
- Salamandra-7B-Instruct and Salamandra-2B-Instruct: instruction-tuned variants trained on 276,000 instructions in English, Spanish, and Catalan.
All training scripts and configuration files are openly available on GitHub.
3. mRoBERTa
A multilingual encoder-only foundation model based on the RoBERTa architecture, pretrained from scratch on 12.8 TB of multilingual data. This corpus is significantly larger than those used for earlier encoder-only models such as XLM-RoBERTa-base and XLM-RoBERTa-large.
4. RoBERTa-ca
A Catalan-language foundation model derived from mRoBERTa through vocabulary adaptation.
It was then continually pretrained on 95 GB of high-quality Catalan-only text.
All models have been independently verified by the Spanish Artificial Intelligence Supervision Agency (AESIA).
Conclusion: A Landmark for Public AI in Europe
Through ALIA, Spain:
- invests in public-interest artificial intelligence,
- embraces and promotes linguistic diversity,
- advances European digital autonomy,
- and sets an international benchmark for open, state-level AI development.
ALIA demonstrates how governments can build transparent, open, multilingual AI infrastructures that serve both citizens and innovation.
—
Source of this article: interoperable-europe.ec.europa.eu