PADIC: A Parallel Arabic DIalect Corpus

PADIC is composed of about 6400 sentences in dialects from both the Maghreb and the Middle East. Each dialect is aligned with Modern Standard Arabic (MSA). PADIC includes four dialects from the Maghreb (two from Algeria, one from Tunisia, and one from Morocco) and two dialects from the Middle East (Syria and Palestine). PADIC was built from scratch by members of the SMarT research team: Salima Harrat, Karima Meftouh and K. Smaïli. The Tunisian translation was done by Salma Jamoussi, the Moroccan by Samia Haddouchi, the Palestinian by Motaz Saad and the Syrian by Charif Alchieekh Haydar.

Any use of PADIC shall include the following acknowledgement: “Programme material SMarT” and shall cite the following article:

K. Meftouh, S. Harrat, M. Abbas, S. Jamoussi, and K. Smaïli, "Machine Translation Experiments on PADIC: A Parallel Arabic DIalect Corpus", PACLIC29, Shanghai, 2015

Note that the Moroccan dialect was not yet available when this paper was written. As soon as a paper covering all the dialects is published, we will announce it on the SMarT website, and we kindly ask you to cite the new article.

A new version of PADIC, in which the Arabic dialects and MSA are aligned with French and English, will be available in the coming months.

