
Emmio is an experimental project on languages and learning. It provides learning and testing algorithms:

learning system based on spaced repetition,
lexicon (vocabulary) level checking,

and manages four kinds of artifacts:

dictionaries,
sentence translations,
frequency and word lists,
audio for words and sentences.

Installation

Requires Python 3.10.

pip install git+https://github.com/enzet/emmio

Get started

To start Emmio, run:

emmio

You may specify the data directory with the --data option and the username with the --user option. If they are not specified, the data directory defaults to ~/.emmio and the username defaults to the current system username.

Lexicon

> lexicon

The algorithm will offer you random words of the target language, chosen based on their frequency. For each word, decide whether:

you know at least one meaning of the word (press y or Enter),
you don't know any meaning of the word (press n),
or the word is often used as a proper name or doesn't exist at all (press -).

To finish, press q.

After that, the algorithm gives you a non-negative number called the rate, which roughly describes the size of your vocabulary. A rate of 0 means you know not a single word of the language, and infinity means you know absolutely every word in the frequency list. The rate is best used to track your language learning progress and to compare the vocabularies of different people measured with the same frequency list.

Rate	Level
near 3	Beginner, elementary
near 5	Intermediate, upper intermediate
near 7	Advanced, proficient
more than 9	Native
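The README does not state how the rate is computed. One formula with exactly the properties described above, assumed here purely for illustration, is the base-2 logarithm of the total number of answered words divided by the number of "don't know" answers. It assumes that words are sampled in proportion to their frequency and that "-" answers are excluded. Emmio's actual computation may differ, and compute_rate is a hypothetical name.

```python
import math


def compute_rate(known: int, unknown: int) -> float:
    """Hypothetical rate: log2(total answers / unknown answers).

    Skipped answers ("-") are assumed to be excluded from both counts.
    Returns 0.0 when every answered word is unknown and math.inf when
    every answered word is known.
    """
    total = known + unknown
    if total == 0:
        raise ValueError("at least one answered word is required")
    if unknown == 0:
        return math.inf
    return math.log2(total / unknown)


print(compute_rate(known=7, unknown=1))  # 3.0
```

Under this assumed formula, a rate of 3 would mean that about one sampled word in eight was unknown, and each additional point halves that share.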

Lexicon configuration:

"lexicon": {
    "<language>": "<frequency list id>",
    ...
}

language is a two-letter ISO 639-1 language code (e.g. en for English and ru for Russian).
frequency list id is the identifier of a full frequency list file. Important: Lexicon works only with full (not stripped) frequency lists.
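A configuration of this shape can be checked with a few lines of Python. This is a sketch, not part of Emmio: load_lexicon_config is a hypothetical helper, and the list ids in the test (en_full, ru_full) are made-up placeholders. The check cannot tell whether a list is full or stripped; that remains up to the user.

```python
import json


def load_lexicon_config(text: str) -> dict[str, str]:
    """Parse and sanity-check the "lexicon" section of a JSON config.

    Keys must look like two-letter ISO 639-1 codes (lowercase ASCII
    letters); values must be non-empty frequency list id strings.
    """
    config = json.loads(text)
    lexicon = config.get("lexicon", {})
    for language, list_id in lexicon.items():
        is_code = (
            len(language) == 2
            and language.isascii()
            and language.isalpha()
            and language.islower()
        )
        if not is_code:
            raise ValueError(f"not an ISO 639-1 code: {language!r}")
        if not isinstance(list_id, str) or not list_id:
            raise ValueError(f"bad frequency list id for {language}: {list_id!r}")
    return lexicon
```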

Wiktionary

The Wiktionary project contains frequency lists for various languages.

Data directory structure

The Emmio data directory is located in ~/.emmio by default. It contains all downloaded artifacts, their configuration files, and collected user data.

- dictionaries: single word translations.
- sentences: whole sentence translations.
- lists: frequency lists and simple word lists.
- users: user data.
  - <user name>
    - config.json: user configuration file.
    - learn: user learning process data.
    - lexicon: user lexicon checking data.
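The directory layout can be expressed as a path mapping with pathlib. Only the names come from this README; user_paths is a hypothetical helper, and because the README does not say whether learn and lexicon are files or directories, they are treated as plain paths.

```python
from pathlib import Path


def user_paths(data_dir: Path, user: str) -> dict[str, Path]:
    """Return the locations from the README's data directory layout.

    Hypothetical helper; it only builds paths and touches no files.
    """
    user_dir = data_dir / "users" / user
    return {
        "dictionaries": data_dir / "dictionaries",
        "sentences": data_dir / "sentences",
        "lists": data_dir / "lists",
        "config": user_dir / "config.json",
        "learn": user_dir / "learn",
        "lexicon": user_dir / "lexicon",
    }
```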

Dictionaries

Dictionaries provide definitions and translations for single words. They are configured in the file dictionaries/config.json.

Emmio supports:

dictionaries stored in JSON files,
English Wiktionary (through WiktionaryParser).
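The README does not document the format of JSON dictionary files. The sketch below assumes, purely for illustration, a flat JSON object that maps each word to a list of translations; Emmio's real files may be structured differently, and JsonDictionary is a hypothetical class.

```python
import json
from pathlib import Path


class JsonDictionary:
    """Word lookup backed by JSON (assumed format: word -> [translations])."""

    def __init__(self, data: dict[str, list[str]]) -> None:
        self._data = data

    @classmethod
    def from_file(cls, path: Path) -> "JsonDictionary":
        """Load the whole dictionary from a UTF-8 JSON file."""
        with path.open(encoding="utf-8") as file:
            return cls(json.load(file))

    def get(self, word: str) -> list[str]:
        """Return translations for the word, or an empty list if absent."""
        return self._data.get(word, [])
```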

Frequency lists and word lists

A frequency list is a relation between unique words and the number of their occurrences in a text or a corpus of texts. Some frequency lists are stripped, i.e. they contain only part of the words (e.g. the 6,500-lemma list based on the New Corpus for Ireland).
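A frequency list can be built from text with collections.Counter. In this sketch, tokenization is deliberately naive (lowercased runs of letters), and stripping is modeled as keeping only the N most frequent words. That is an assumption about how stripped lists are made; a lemma list like the Irish one also groups inflected forms, which this code does not do. Both function names are hypothetical.

```python
import re
from collections import Counter


def frequency_list(text: str) -> list[tuple[str, int]]:
    """Count word occurrences, most frequent first (ties keep text order)."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return Counter(words).most_common()


def strip_list(frequencies: list[tuple[str, int]], size: int) -> list[tuple[str, int]]:
    """Keep only the `size` most frequent words, like a stripped list."""
    return frequencies[:size]


full = frequency_list("The cat and the dog and the bird")
print(full)  # [('the', 3), ('and', 2), ('cat', 1), ('dog', 1), ('bird', 1)]
print(strip_list(full, 2))  # [('the', 3), ('and', 2)]
```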

FrequencyWords (Opensubtitles)

Hermit Dave's project FrequencyWords contains full and stripped frequency lists extracted from subtitles in the OpenSubtitles project.
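The following is a reader sketch for such lists. It assumes that each line holds a word followed by its occurrence count, separated by whitespace. Check this against the downloaded file before relying on it. parse_frequency_words is a hypothetical name.

```python
from collections.abc import Iterable, Iterator


def parse_frequency_words(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Yield (word, count) pairs from lines like "hello 42".

    Assumes one entry per line, count last, separated by whitespace;
    blank lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, count = line.rsplit(maxsplit=1)
        yield word, int(count)
```

Because the function accepts any iterable of lines, it can read a file opened with open(path, encoding="utf-8") lazily, without loading the whole list into memory.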
