AN INITIATIVE BY
GFOSS
GlossAPI

AI-ready data

Pipeline for processing Greek texts and converting them into ready-to-use datasets for Large Language Models.

Dataset Timeline

Tracking the evolution of our datasets and total token ingestion

All datasets are released under Creative Commons licenses
APR 2026 | eellak-articles | 25.10MB | 8.5M Tokens | CC
APR 2026 | opengov-deliberations-v2 | 357.71MB | 111.4M Tokens | CC
APR 2026 | e-nautilia | 4.61MB | 2.7M Tokens | CC
APR 2026 | artos-zois | 12.20MB | 4.0M Tokens | CC
APR 2026 | amna-press | 1.48GB | 158.2M Tokens | CC
APR 2026 | ert-press | 36.4MB | 9.8M Tokens | CC
MAR 2026 | modern-greek-dictionary | 33MB | 4.9M Tokens | CC
MAR 2026 | istorima | 416.02MB | 138.9M Tokens | CC
JAN 2026 | openbook.gr | 251.63MB | 133M Tokens | CC
JAN 2026 | Greek PhD Theses Corpus | 7.06GB | 5.34B Tokens | CC
JUN 2025 | eurlex-greek-legislation | 2.21GB | 604M Tokens | CC
APR 2025 | ellinika_dedomena_europaikou_koinovouliou | 1.09GB | 273M Tokens | CC
APR 2025 | Apothetirio_Kallipos | 572MB | 196M Tokens | CC
MAR 2025 | Apothetirio_Pergamos | 2.25GB | 839M Tokens | CC
JAN 2025 | 1000_prwta_xronia_ellhnikhs | 104MB | 33M Tokens | CC
JAN 2025 | Ekklisiastika_Keimena | 16.7MB | 6.5M Tokens | CC
DEC 2024 | Wikisource_Greek_texts | 116.3MB | 38M Tokens | CC
DEC 2024 | klasikh_arx_ell_grammateia | 63.8MB | 20.4M Tokens | CC
DEC 2024 | Sxolika_vivlia | 31.0MB | 10.1M Tokens | CC
NOV 2024 | Ellinika_Keimena_Project_Gutenberg | 38.9MB | 12.3M Tokens | CC
NOV 2024 | 95k_deigma_ellinikis | 28.3MB | 2.94M Tokens | CC
NOV 2024 | dimodis_logotexnia | 384KB | 0.1M Tokens | CC

Growth Chart

Cumulative Token Volume

7,952,178,676
TOTAL TOKENS
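
As a rough illustration of how a cumulative token curve like the one in the growth chart could be derived, the sketch below accumulates the per-dataset token counts listed in the timeline, oldest release first. It is not the project's own code: the `RELEASES` list and the `cumulative_tokens` helper are hypothetical names, and the inputs are the rounded figures shown on this page, so the final running total only approximates the exact total above.

```python
from itertools import accumulate

# (dataset, tokens in millions) in chronological order, i.e. the timeline
# read from its last entry to its first. Values are the rounded figures
# shown on the page, not exact token counts.
RELEASES = [
    ("dimodis_logotexnia", 0.1),
    ("95k_deigma_ellinikis", 2.94),
    ("Ellinika_Keimena_Project_Gutenberg", 12.3),
    ("Sxolika_vivlia", 10.1),
    ("klasikh_arx_ell_grammateia", 20.4),
    ("Wikisource_Greek_texts", 38.0),
    ("Ekklisiastika_Keimena", 6.5),
    ("1000_prwta_xronia_ellhnikhs", 33.0),
    ("Apothetirio_Pergamos", 839.0),
    ("Apothetirio_Kallipos", 196.0),
    ("ellinika_dedomena_europaikou_koinovouliou", 273.0),
    ("eurlex-greek-legislation", 604.0),
    ("Greek PhD Theses Corpus", 5340.0),
    ("openbook.gr", 133.0),
    ("istorima", 138.9),
    ("modern-greek-dictionary", 4.9),
    ("ert-press", 9.8),
    ("amna-press", 158.2),
    ("artos-zois", 4.0),
    ("e-nautilia", 2.7),
    ("opengov-deliberations-v2", 111.4),
    ("eellak-articles", 8.5),
]


def cumulative_tokens(releases):
    """Return (dataset, running total in millions of tokens) pairs."""
    totals = accumulate(tokens for _, tokens in releases)
    return [(name, round(total, 2)) for (name, _), total in zip(releases, totals)]


if __name__ == "__main__":
    for name, total in cumulative_tokens(RELEASES):
        print(f"{name:45s} {total:10.2f}M")
```

Because each input is rounded (the largest corpus is given only to the nearest 10M tokens), the last running total lands a few million tokens below the exact figure reported on the page.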

We've got an entire team dedicated to this project

Want to collaborate or get involved? We love partnerships and new contributors.