DmitriiK/Anki

Python modules for creating custom dictionaries and Anki decks for learning foreign languages

Objectives:

  • Given a list of words or a text in a specific language (the 'Source language'), prepare materials for memorizing the meanings of those words in a 'Target' language, including usage examples and media files for those examples. The final output is a set of Anki decks.

What technologies are being used?

  • Python morphological analyzer and lemmatizer for Turkish, used for lemmatization and frequency analysis: zeyrek
  • OpenAI models for translation and for preparing usage examples: OpenAI
  • LangChain for formatting input prompts and LLM output: https://www.langchain.com/
  • Microsoft Azure Text-to-Speech API for producing audio of the usage examples: MS Azure Text-to-Speech
  • genanki, a library for generating Anki decks: genanki
  • Anki applications (mobile, desktop and web): Anki

Data sources:

Currently, the pipeline executes the following sequence of steps:

  • create_frequency_list(cfg.INPUT_CORPUS_FILE, cfg.FREQ_LST_FILE_PATH): reads the corpus texts and creates a frequency list
  • lemmatize_frequency_list_io(cfg.FREQ_LST_FILE_PATH, cfg.FREQ_LST_LM_FILE_PATH)(): lemmatizes the words in the frequency list
  • group_by_lemma_io(ifp=cfg.FREQ_LST_LM_FILE_PATH, ofp=cfg.FREQ_LST_GR_FILE_PATH)(): groups the entries by lemma (the main grammatical form)
  • attach_frequencies_io(cfg.INPUT_WORDS_LIST_FILE, cfg.FREQ_LST_GR_FILE_PATH, cfg.WORDS_AND_FREQ_LIST_FILE)(): joins the frequency list to the input list of words
  • request_and_parse_by_chunks_io(inp=cfg.WORDS_AND_FREQ_LIST_FILE, outp = )(): calls OpenAI to translate the list of words and to prepare usage examples
  • generate_audio_batch_from_file(cfg.OUTPUT_FILE_NAME, cfg.DIR_AUDIO_FILES): calls the Text-to-Speech API to produce .mp3 files for the usage examples from the previous step
  • create-anki-deck.generate_deck(): creates the Anki deck for studying the translations and usage examples. Note: for the deck to include multimedia, the media files must be in the same directory from which the main Python file is launched.
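The frequency-related steps above (counting, grouping by lemma, and joining to the input word list) can be illustrated with a minimal in-memory sketch. It does not reproduce the repository's code: the function names, the toy corpus, and the `TOY_LEMMAS` mapping are assumptions made for illustration, and the toy lemmatizer merely stands in for a real one such as zeyrek.

```python
import re
from collections import Counter, defaultdict


def count_word_frequencies(text):
    """Count lowercase word tokens in a corpus string.

    Note: str.lower() is not Turkish-aware ("I" becomes "i", not "ı"),
    so a real pipeline may need locale-specific case folding.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


def group_by_lemma(frequencies, lemmatize):
    """Sum the counts of all word forms that share a lemma."""
    grouped = defaultdict(int)
    for word, count in frequencies.items():
        grouped[lemmatize(word)] += count
    return dict(grouped)


def attach_frequencies(words, lemma_frequencies):
    """Pair each input word with its lemma frequency (0 if unseen)."""
    return [(word, lemma_frequencies.get(word, 0)) for word in words]


# Hypothetical stand-in for a real lemmatizer: a tiny lookup table.
TOY_LEMMAS = {"evler": "ev", "evde": "ev", "kitaplar": "kitap"}


def toy_lemmatize(word):
    """Return the lemma from the toy table, or the word itself."""
    return TOY_LEMMAS.get(word, word)


corpus = "Evler evde kitaplar ev kitap kitap"
freqs = count_word_frequencies(corpus)
by_lemma = group_by_lemma(freqs, toy_lemmatize)
rows = attach_frequencies(["ev", "kitap", "okul"], by_lemma)
print(rows)
```

Passing the lemmatizer in as a parameter keeps the grouping logic independent of any particular morphological analyzer.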

The root executor for the sequence above is the launcher module.
The persistence_guy module contains decorators that read input data from files and write output data to files. The pipelines module chains these decorators together with the main functions.
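One way such I/O decorators could work is sketched below. This is a guess at the pattern, not the actual persistence_guy code: the decorator name `with_json_io`, the JSON file format, and the example function are all assumptions. The trailing `()` in the pipeline calls suggests that a decorated function first takes file paths and then returns a callable that does the work, which is the shape shown here.

```python
import json
import tempfile
from pathlib import Path


def with_json_io(func):
    """Wrap func(data) -> result so that it reads and writes JSON files.

    The wrapped version takes (ifp, ofp) and returns a zero-argument
    callable; calling it loads the input, runs func, saves the result.
    """
    def factory(ifp, ofp):
        def run():
            data = json.loads(Path(ifp).read_text(encoding="utf-8"))
            result = func(data)
            Path(ofp).write_text(
                json.dumps(result, ensure_ascii=False), encoding="utf-8"
            )
            return result
        return run
    return factory


def sort_by_frequency(freqs):
    """Pure function: order (word, count) pairs by descending count."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical file-backed variant, named after the repository's *_io style.
sort_by_frequency_io = with_json_io(sort_by_frequency)

with tempfile.TemporaryDirectory() as tmp:
    ifp = Path(tmp) / "freq.json"
    ofp = Path(tmp) / "sorted.json"
    ifp.write_text(json.dumps({"ev": 3, "okul": 1, "kitap": 5}), encoding="utf-8")
    result = sort_by_frequency_io(ifp, ofp)()  # configure, then execute
    saved = json.loads(ofp.read_text(encoding="utf-8"))
    print(saved)
```

Keeping the core function free of file handling makes it easy to test in memory, while the decorated variant can be chained into a file-based pipeline.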

Anki decks contain:

  • words in the source language (in my case, Turkish)
  • their translations into the target language (English)
  • usage examples of these words in both languages (with the root of the word underlined in the source language and the whole word underlined in the target language)
  • audio for the examples in the source language
  • frequency metrics for these words, computed over a corpus of texts

You can easily configure the code to produce similar decks for any pair of languages. The decks can be used in the Anki desktop, mobile, or web applications.
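As an illustration of how the contents of one card could be assembled, the sketch below builds the list of field values for a single note. The field order, function names, and sample data are assumptions rather than the repository's actual layout; the `[sound:...]` tag is the standard way Anki fields reference an audio file by name.

```python
import html


def underline(sentence, fragment):
    """Escape the sentence and underline the first occurrence of fragment."""
    escaped = html.escape(sentence)
    target = html.escape(fragment)
    return escaped.replace(target, f"<u>{target}</u>", 1)


def build_note_fields(word, translation, src_example, src_root,
                      tgt_example, tgt_word, audio_file, frequency):
    """Return the field values for one note (hypothetical field order)."""
    return [
        word,
        translation,
        underline(src_example, src_root),   # root underlined in the source
        underline(tgt_example, tgt_word),   # whole word underlined in target
        f"[sound:{audio_file}]",            # file name only, not a path
        str(frequency),
    ]


fields = build_note_fields(
    word="ev",
    translation="house",
    src_example="Evde kimse yok.",
    src_root="Ev",
    tgt_example="There is nobody in the house.",
    tgt_word="house",
    audio_file="ev_1.mp3",
    frequency=3,
)
print(fields)
```

A list like this could then be passed as the fields of a note when building the deck, provided the note model defines the same number of fields in the same order.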
