Dialectal Arabic Datasets

About the data:

A datasets for four dialects, namely Egyptian (EGY), Levantine (LEV), Gulf (GLF), and Maghrebi (MGR) which are available at https://alt.qcri.org/resources/da_resources/. Each dataset consists of a sets of 350 manually segmented and POS tagged tweets.

Generate the splits

run the shell script generate_splits.sh with the dataset argument EGY, LEV, GLF or MGR (lower cased).

# call generate_splits.sh script with the right argument
generate_splits.sh lev

this will generate a directory "lev" contains five splits. each split with train, dev and test source and target file.

Reference:

Please cite [1] if you found the resources in this repository useful.

[1] Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki, Younes Samih, Randah Alharbi, Mohammed Attia, Walid Magdy and Laura Kallmeyer (2018) Multi-Dialect Arabic POS Tagging: A CRF Approach. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 7-12, 2018. Miyazaki, Japan.

@InProceedings{DARWISH18.562,
  author = {Kareem Darwish ,Hamdy Mubarak ,Ahmed Abdelali ,Mohamed Eldesouki ,Younes Samih ,Randah Alharbi ,Mohammed Attia ,Walid Magdy and Laura Kallmeyer},
  title = {Multi-Dialect Arabic POS Tagging: A CRF Approach},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {may},
  date = {7-12},
  location = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {979-10-95546-00-9},
  language = {english}
  }

Version:

Mon May 07, 2018.

Contacts:

Ahmed Abdelali < aabdelali @ hbku dot edu dot qa >
Kareem Darwish < kdarwish @ hbku dot edu dot qa >
Hamdy Mubarak < hmubarak @ hbku dot edu dot qa >

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
generate_splits.sh		generate_splits.sh
seg_plus_pos_egy.txt		seg_plus_pos_egy.txt
seg_plus_pos_glf.txt		seg_plus_pos_glf.txt
seg_plus_pos_lev.txt		seg_plus_pos_lev.txt
seg_plus_pos_mgr.txt		seg_plus_pos_mgr.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Dialectal Arabic Datasets

About the data:

Generate the splits

Reference:

Version:

Contacts:

About

Releases

Packages

Languages

qcri/dialectal_arabic_resources

Folders and files

Latest commit

History

Repository files navigation

Dialectal Arabic Datasets

About the data:

Generate the splits

Reference:

Version:

Contacts:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages