This repo hosts a short literature review on efficient data structures for representing n-gram language models. From the abstract:
N-grams are used in a variety of tasks across fields such as Information Retrieval, Natural Language Processing, and Machine Learning. Handling large datasets of n-grams requires an efficient method of representing them, in terms of both memory usage and retrieval speed. In this literature review I undertake a brief survey of the current state of the art in efficient n-gram representations and language models, and then look at a trie-based method whose authors report it to be competitive with the state of the art in both time and space.
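To give a flavor of the trie idea the review examines: a trie stores n-grams by sharing common prefixes, so "the cat sat" and "the cat ran" share the path for "the cat". The sketch below is a minimal, illustrative pointer-based trie, not the compact representation from any surveyed paper (state-of-the-art implementations pack nodes into sorted arrays or hash tables to save memory).

```python
class NGramTrie:
    """Minimal trie mapping n-grams (tuples of words) to counts.
    Illustrative only: each node holds a Python dict, which is far
    less memory-efficient than the packed-array tries in the literature."""

    def __init__(self):
        self.children = {}  # word -> child NGramTrie
        self.count = 0      # occurrences of the n-gram ending at this node

    def insert(self, ngram):
        node = self
        for word in ngram:
            node = node.children.setdefault(word, NGramTrie())
        node.count += 1

    def lookup(self, ngram):
        node = self
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return 0
        return node.count


trie = NGramTrie()
trie.insert(("the", "cat", "sat"))
trie.insert(("the", "cat", "ran"))
trie.insert(("the", "cat", "sat"))
print(trie.lookup(("the", "cat", "sat")))  # 2
print(trie.lookup(("the", "dog", "sat")))  # 0
```

Prefix sharing is what makes tries attractive here: the cost of a lookup is linear in n regardless of corpus size, and shared prefixes are stored once.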