The SMarT team is one of the Loria teams and was founded in December 2013.
The name SMarT can be read in two ways:
Speech Modelisation and Text
Statistical Machine Translation
Our objective, therefore, is to build statistical models of written and spoken language and apply them to machine translation and speech recognition.
The main objective of the SMarT team is to develop language representation models for machine translation and speech recognition systems.
This modeling involves the use of mathematical methods to identify, extract and propose associations between two or more languages for translation and speech recognition.
Languages are studied through monolingual, parallel or comparable corpora for low-resourced languages (Arabic dialects), Arabic, French and English.
For the moment, we are restricted to these languages, but others could be considered, since the modeling is based on statistics rather than on language-specific linguistic study.
Keywords: statistical language modeling, machine translation, study of low-resourced languages, modeling of system quality estimation, evolutionary algorithms, speech translation, mining of comparable corpora.
Our Group Members
Kamel Smaïli (Professor, Université de Lorraine)
Delphine HUBERT Université de Lorraine
Martine KUHLMANN CNRS
Salima Harrat (Associate Professor, Maître de conférences)
Chiraz Latiri (Lecturer-Researcher, Université de Tunis)
Karima Meftouh (Associate Professor, Maître de conférences Classe B, Université Badji Mokhtar, Annaba)
Ameur Douib Université de Lorraine
Karima Abidi Université de Lorraine
Amine Menacer Université de Lorraine
Fadi Ghawanmeh (co-supervision with the University of Oslo)
The objective of TRAM is to show the feasibility of automatically accompanying Arab vocal improvisation. The idea is to propose an automatic instrumental response to an Arab singer performing a Mawwal (or Istikhbar). The originality of the project lies in using a Machine Translation (MT) approach to study the accompaniment of Arab vocal improvisation. This approach treats the interaction between the singer and the instrumentalist as a question and an answer: the vocal sentence is the question and the instrumental response is the answer. Machine translation requires a parallel corpus composed of a source and a target language. The training process then associates each phrase of the source sentence with its corresponding phrase in the target language. To carry out this project, we propose a consortium of experts in music, in machine translation and, more generally, in machine learning. The project requires collecting data, which will be a considerable resource for researchers and will be made freely available to our research community. This bootstrapping project will probably help us apply to H2020 in the near future.
AMIS is an original project concerning the second call: Human Language Understanding, Grounding Language Learning.
PADIC: A Parallel Arabic DIalect Corpus
PADIC is composed of about 6400 sentences of dialects from both the Maghreb and the Middle East. Each dialect has been aligned with Modern Standard Arabic (MSA). PADIC includes four dialects from the Maghreb (two from Algeria, one from Tunisia and one from Morocco) and two dialects from the Middle East (Syria and Palestine). PADIC has been built from scratch by the members of SMarT research: Salima Harrat, Karima Meftouh and K. Smaïli. The Tunisian translation was done by Salma Jamoussi, the Moroccan by Samia Haddouchi, the Palestinian by Motaz Saad and the Syrian by Charif Alchieekh Haydar.
Any use of PADIC shall include the following acknowledgement: “Programme material SMarT”, and shall reference the following articles:
K. Meftouh, S. Harrat, M. Abbas, S. Jamoussi, and K. Smaïli, “Machine Translation Experiments on PADIC: A Parallel Arabic DIalect Corpus”, PACLIC29, Shanghai, 2015
K. Meftouh, S. Harrat and K. Smaïli, “PADIC: extension and new experiments”, 7th International Conference on Advanced Technologies (ICAT), Antalya, Turkey, April 2018
Note that in this paper, the Moroccan dialect was not yet available. As soon as the paper covering all the dialects is published, we will announce it on the SMarT website, and we kindly ask you to reference the new article.
A new version of PADIC will be available in the coming months, in which the Arabic dialects and MSA will be aligned with French and English.
- A. Douib, D. Langlois and K. Smaïli, “A Translation Evaluation Function based on Neural Network”, Schedae Informaticae Journal, 2017
- K. Abidi and K. Smaïli, “How to match bilingual Tweets?”, Sixth International Conference on Natural Language Processing (NLP 2017), Volume Editor(s): David Wyld et al., 2017
- M.A. Menacer, O. Mella, D. Fohr, D. Jouvet, D. Langlois and K. Smaïli, “An enhanced automatic speech recognition system for Arabic”, EACL – The Third Arabic Natural Language Processing Workshop, 2017
- S. Harrat, K. Meftouh, M. Abbas and K. Smaïli, “An Algerian dialect: Study and Resources”, International Journal of Advanced Computer Science and Applications, Vol. 7, Issue 2, 2016
- A. Ben Romdhane, S. Jamoussi, A. Ben Hamadou and K. Smaïli, “Phrase-Based Language Model in Statistical Machine Translation”, International Journal of Computational Linguistics and Applications, 2016
- F. Bahja, J. Di Martino, E. Ibn Elhaj, D. Aboutajdine, “A corroborative study on improving pitch determination by time–frequency cepstrum decomposition using wavelets”, SpringerPlus, SpringerOpen, 2016
- S. Jaffali, S. Jamoussi, A. Ben Hamadou and K. Smaïli, “Grouping Like-Minded Users for Ratings’ Prediction”, Smart Innovation, Systems and Technologies, Volume 56 – Springer, 2016
- K. Abidi and K. Smaïli, “Measuring the comparability of multilingual corpora extracted from Twitter and others”, The Tenth International Conference on Natural Language Processing, Croatia, 2016, will be published in Springer LNCS/LNAI series
- M.A. Menacer, A. Boumerdas, C. Zakaria and K. Smaïli, “A new language model based on possibility theory”, Springer LNCS series, Lecture Notes in Computer Science, 2016
- A. Douib, D. Langlois and K. Smaïli, “Genetic-based decoder for statistical machine translation”, Springer LNCS series, Lecture Notes in Computer Science, 2016
- M. Leszczuk, J. Derkacz, M. Grega, A. Kozbial, and K. Smaïli, “Definition of Requirements for Accessing Multilingual Information and Opinions”, 10th International Conference on Multimedia and Network Information Systems – MISSI 2016, published in Advances Intelligent Systems and Computing, Springer
- C. Nasri, K. Smaïli and C. Latiri, “Statistical Machine Translation Improvements based on Phrase Selection”, Recent Advances in Natural Language Processing, Hissar, Bulgaria, 2015
- S. Harrat, K. Meftouh, M. Abbas, S. Jamoussi, M. Saad and K. Smaïli, “Cross-dialectal Arabic Processing”, 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING), Springer, 2015, ISBN 978-3-319-18110-3, Vol. 9041, pp. 620-632
- K. Meftouh, S. Harrat, M. Abbas, S. Jamoussi, and K. Smaïli, “Machine Translation Experiments on PADIC: A Parallel Arabic DIalect Corpus”, PACLIC29, Shanghai, 2015
- O. Lachhab, J. Di Martino, E. Ibn Elhaj, A. Hammouch, “A preliminary study on improving the recognition of esophageal speech using a hybrid system based on statistical voice conversion”, SpringerPlus, SpringerOpen, 2015
- M. Chami, M. Immassi, J. Di Martino, “An architectural comparison of signal reconstruction algorithms from short-time Fourier transform magnitude spectra”, International Journal of Speech Technology, Springer Verlag, 2015
- A. Ben Romdhane, S. Jamoussi, A. Ben Hamadou and K. Smaïli, “Phrase-based Language Modeling for Statistical Machine Translation”, International Workshop on Spoken Language Translation, Lake Tahoe, USA, December 2014
- S. Jaffali, S. Jamoussi, A. Ben Hamadou and K. Smaïli, “Clustering and Classification of Like-Minded People from their Tweets”, COOL-SNA Workshop on Connecting Online and Offline Social Network Analysis of the IEEE International Conference on Data Mining (ICDM’14), Shenzhen, China, 17-19 December 2014
- M. Saad, D. Langlois and K. Smaïli, “Cross-Lingual Semantic Similarity Measure for Comparable Articles”, 9th International Conference on Natural Language Processing – PolTAL 2014, Warsaw, Poland, 17-19 September 2014
- S. Harrat, M. Abbas, K. Meftouh and K. Smaïli, “Building Resources for Algerian Arabic Dialects”, 15th Annual Conference of the International Speech Communication Association (Interspeech), Singapore, 14-18 September 2014
- S. Harrat, M. Abbas, K. Meftouh and K. Smaïli, “Grapheme To Phoneme Conversion – An Arabic Dialect Case”, The 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU’14), St Petersburg, Russia, 2014
- M. Saad, D. Langlois and K. Smaïli, “Building and Modelling Multilingual Subjective Corpora”, The 9th edition of the Language Resources and Evaluation Conference, 26-31 May, Reykjavik, Iceland, 2014
- C. Nasri, K. Smaïli and C. Latiri, “Training Phrase-Based SMT without Explicit Word Alignment”, 15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING), Volume 8404 of the series Lecture Notes in Computer Science, 2014
- S. Harrat, M. Abbas, K. Meftouh and K. Smaïli, “Diacritics Restoration for Arabic Dialects”, 14th Annual Conference of the International Speech Communication Association – Interspeech, Lyon, France, 2013
- D. Jouvet, D. Langlois, “A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription”, Text, Speech and Dialogue, 2013
- M. Saad, D. Langlois and K. Smaïli, “Extracting Comparable Articles from Wikipedia and Measuring their Comparabilities”, Vth International Conference on Corpus Linguistics, Procedia, Social and Behavioral Sciences, Elsevier, 2013
- M. Saad, D. Langlois and K. Smaïli, “Comparing Multilingual Comparable Articles Based on Opinions”, BUCC, 6th Workshop on Building and Using Comparable Corpora, co-located with ACL 2013, Sofia, Bulgaria, 2013
- D. Langlois, K. Smaïli, “LORIA System for the WMT13 Quality Estimation Shared Task”, ACL Eighth Workshop on Statistical Machine Translation, Quality Estimation task, Sofia, Bulgaria, 2013
- F. Bahja, J. Di Martino, E. Ibn Elhaj, D. Aboutajdine, “An overview of the CATE algorithms for real-time pitch determination”, Signal, Image and Video Processing, Springer Verlag, 2013
- C. Nasri, K. Smaïli, C. Latiri and Y. Slimani, “A new method for learning Phrase Based Machine Translation with Multivariate Mutual Information”, The 8th International Conference on Natural Language Processing and Knowledge Engineering – NLP-KE’12, Huangshan, China, 2012
- K. Meftouh, N. Bouchemal, K. Smaïli, “A Study of a Non-Resourced Language: The Case of one of the Algerian Dialects”, The Third International Workshop on Spoken Languages Technologies for Under-resourced Languages, Cape Town, South Africa, May 2012
- D. Langlois, S. Raybaud, K. Smaïli, “LORIA System for the WMT12 Quality Estimation Shared Task”, The Seventh Workshop on Statistical Machine Translation, WMT12, Montreal, Canada, May 2012
- C. Nasri, K. Smaïli, C. Latiri, “Training Statistical Machine Translation with Multivariate Mutual Information”, 5th Language and Technology Conference, Poznan, Poland, November 2011
- S. Raybaud, D. Langlois and K. Smaïli, “Broadcast news speech-to-text translation experiments”, XIII MT SUMMIT, organised by the International Association for Machine Translation, Xiamen, China, Sept 2011
- C. Latiri, K. Smaïli, C. Lavecchia, C. Nasri and D. Langlois, “Phrase-based machine translation based on text mining and statistical language modeling techniques”, CICLING, Tokyo, Feb 2011
- S. Raybaud, D. Langlois and K. Smaïli, “This sentence is wrong. Detecting Errors in machine-translated sentences”, Machine Translation, Vol. 25, No. 1, pp. 1-34, 2011
- M. Abbas, K. Smaïli and D. Berkani, “Evaluation of Topic Identification Methods on Arabic Corpora”, Journal of Digital Information, Vol. 9, No. 5, Oct 2011
- C. Gillot, C. Cerisara, D. Langlois, JP. Haton, “Similar n-gram language model”, InterSpeech, 2010
- R. Jourani, D. Langlois, K. Smaïli, K. Daoudi, D. Aboutajdine, “Cleaning statistical language models”, 3rd International Conference on Information Systems and Economic Intelligence (SIIE’2010), Sousse, Tunisia, 2010
- C. Latiri, K. Smaïli, C. Lavecchia and D. Langlois, “Mining Monolingual and Bilingual Corpora”, Intelligent Data Analysis, Volume 14(6), pp. 663-682, November 2010
- K. Meftouh, M.T. Laskri and K. Smaïli, “Modeling Arabic Language Using Statistical methods”, The Arabian Journal of Science and Engineering, Vol. 35, Number 2C, Dec 2010
- F. Bahja, J. Di Martino, E. Ibn Elhaj, D. Aboutajdine, “An improvement of the eCATE algorithm for F0 detection”, 10th International Symposium on Communications and Information Technologies – ISCIT 2010, Oct 2010, Tokyo, Japan