A bilingual goal-oriented agent that can converse in Spanish–English code-switching with human users. An accompanying dataset.

meryemmhamdi1/commonamigos

 
 

CommonAmigos

Ahn, E., Jimenez, C., Tsvetkov, Y., & Black, A. (2020). What Code-Switching Strategies are Effective in Dialogue Systems? Proceedings of the Society for Computation in Linguistics, 3(1), 308-318.

Email Emily at eahn [at] uw.edu with questions.

Data

The data/ folder contains 10 folders, each corresponding to one batch released on crowdsourcing platforms. Each folder contains the following 6 files:

  • chat.json: pure chats
  • surv.json: qualitative survey (the key to the questions is given in surv_questions_list.txt)
  • qual.tsv: qualitative survey in TSV form
  • lid.tsv: hand-annotated language ID (LID) of each token produced by the user.
    • {0 = SP, 1 = EN, 2 = neither}
    • NOTE: all tokens in these LID files have been lower-cased.
  • .html: visualization of chats and surveys
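
As a rough illustration of the LID tag scheme above, the following sketch counts how many tokens carry each label. It assumes, without confirmation from the files themselves, that each row of lid.tsv holds a token and its integer tag in two tab-separated columns. The function name count_lid_tags, its column parameters, and the sample rows are hypothetical, so adjust them to the real layout.

```python
import csv
import io
from collections import Counter

# Tag scheme stated in the README: 0 = SP, 1 = EN, 2 = neither.
LID_LABELS = {0: "SP", 1: "EN", 2: "neither"}


def count_lid_tags(tsv_file, token_col=0, tag_col=1):
    """Count language labels in an LID TSV stream.

    ASSUMPTION: each row holds a token and its integer tag in the given
    columns; the real lid.tsv layout may differ, so adjust the indices.
    """
    counts = Counter()
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) <= max(token_col, tag_col):
            continue  # skip blank or short rows
        try:
            tag = int(row[tag_col])
        except ValueError:
            continue  # skip header or non-numeric rows
        counts[LID_LABELS.get(tag, "unknown")] += 1
    return counts


# Made-up sample rows for illustration only (not taken from the dataset).
sample = io.StringIO("tengo\t0\nun\t0\nfriend\t1\n:)\t2\n")
sample_counts = count_lid_tags(sample)
print(dict(sample_counts))
```

To run this on real data, pass an open file handle for one of the lid.tsv files in place of the in-memory sample.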

Process and Visualize

To begin processing and exploring the data in Python, use the methods defined in processing_tools.py, mainly load_all_data("data/files_list_com.txt"). The file was written for Python 2.7 but should be compatible with Python 3.
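
A minimal sketch of calling the loader mentioned above, assuming the script is run from the repository root so that processing_tools.py and data/ are both reachable. The wrapper try_load is a hypothetical helper, not part of the repository. The structure of the value returned by load_all_data is not described here, so the sketch only inspects its type.

```python
import importlib


def try_load(list_path="data/files_list_com.txt"):
    """Call processing_tools.load_all_data if the module is importable.

    ASSUMPTION: the script runs from the repository root so that
    processing_tools.py and the data/ folder are both reachable.
    """
    try:
        tools = importlib.import_module("processing_tools")
    except ImportError:
        print("processing_tools.py not found; run from the repository root.")
        return None
    return tools.load_all_data(list_path)


data = try_load()
# The return structure is not documented here, so inspect it before use.
print(type(data).__name__)
```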

To visualize all data according to Agent Strategy, see the HTML files in viz_batches/. These chats duplicate the ones in data/*/*.html but are organized differently.

Miscellaneous Files

  • strategy_map.txt maps each Agent strategy ("style") named in the paper to the labels used in these files.

    • We originally used a different naming convention for these strategies.
    • For example, Insertional EN > SP (Spanish as the matrix language, with insertions of English, the embedded language) was "SP lex", under the broader strategy of "Content" code-switching.
  • data/bot_lid_tags.tsv and data/files_list_bot.txt are provided if you want to account for the words generated by the agent. The second file can be loaded with the processing tools.

Code for Full Dialogue System

Emily's Spanish-English fork of the original English-only MutualFriends task is still under construction. Please reach out if you plan to use it! We have collaborators working on a Hindi-English extension of this system!
