An unsupervised text simplification system.
NovoLS is a lexical text simplification system that I built as part of my dissertation project. It is largely inspired by LightLS, an older lexical text simplifier proposed by Prof. Dr. Goran Glavaš and Dr. Sanja Štajner in 2015. Both NovoLS and LightLS use GloVe word embeddings to find simplification candidates for complex words, and then rank those candidates on a number of different features. My thesis, in which I also developed a web front end for the system, can be found here.
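The paragraph above describes a two-stage pipeline: pull the nearest neighbours of a complex word from an embedding space, then rank those neighbours on several features. The sketch below illustrates that general idea with the standard library only. It is not NovoLS's actual code: the toy 3-dimensional vectors, the `frequency` feature and the helper names (`candidates`, `rank_by_average`) are hypothetical, and combining features by averaging per-feature ranks is an assumption about one reasonable way to merge them.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# Real GloVe vectors have 50-300 dimensions and a much larger vocabulary.
VECTORS = {
    "convoluted": [0.9, 0.1, 0.3],
    "complicated": [0.85, 0.15, 0.35],
    "confusing": [0.8, 0.2, 0.1],
    "tangled": [0.6, 0.5, 0.4],
    "banana": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidates(word, vectors, topn=3):
    """Return the `topn` words whose vectors are most similar to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:topn]]


def rank_by_average(cands, features):
    """Rank candidates under each feature (highest score gets rank 1),
    then order them by their mean rank across all features."""
    totals = {c: 0 for c in cands}
    for scores in features:
        ordered = sorted(cands, key=lambda c: scores[c], reverse=True)
        for rank, c in enumerate(ordered, start=1):
            totals[c] += rank
    return sorted(cands, key=lambda c: totals[c] / len(features))


if __name__ == "__main__":
    cands = candidates("convoluted", VECTORS)
    similarity = {c: cosine(VECTORS["convoluted"], VECTORS[c]) for c in cands}
    # Hypothetical corpus frequencies standing in for a "simplicity" feature.
    frequency = {"complicated": 40, "confusing": 30, "tangled": 90}
    print(cands)
    print(rank_by_average(cands, [similarity, frequency]))
```

Here "banana" is filtered out at the candidate stage, and the frequency feature pulls "tangled" ahead of "confusing" even though it is less similar, which is the kind of trade-off a multi-feature ranking is meant to make.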
- Wikipedia 2014 + Gigaword 5 GloVe Embeddings
Note: after downloading and extracting glove.6B.zip, we used the 300d embeddings; however, any of the packaged embeddings (50d, 100d, 200d or 300d) can be used.
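Each file in glove.6B.zip is plain text with one word per line, followed by its vector components separated by spaces. The internals of `gen_keyed_vectors.py` are not shown here; its name suggests it builds a gensim `KeyedVectors` model, but that is an assumption. As a minimal stdlib-only sketch of what reading the raw format involves, the hypothetical `load_glove` helper below parses such lines into a dictionary and checks that every vector has the same dimension.

```python
import io


def load_glove(lines, limit=None):
    """Parse GloVe text-format lines ("word v1 v2 ... vd") into a dict
    mapping each word to a list of floats. Stops after `limit` words if
    given, which is handy for quick experiments on the 400k-word files."""
    vectors = {}
    dim = None
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)  # first line fixes the expected dimension
        elif len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = values
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


if __name__ == "__main__":
    # Tiny in-memory stand-in for glove.6B.300d.txt (3 dims instead of 300).
    sample = io.StringIO("the 0.1 0.2 0.3\ncomplicated 0.4 0.5 0.6\n")
    vecs = load_glove(sample)
    print(len(vecs), vecs["complicated"])
    # For the real file (the path depends on where you extracted the zip):
    # with open("glove.6B.300d.txt", encoding="utf-8") as f:
    #     vecs = load_glove(f, limit=50000)
```

Because `load_glove` accepts any iterable of lines, the same function works on an open file or an in-memory sample, which keeps it easy to test.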
- Clone the repo
git clone https://github.com/Chrono4/NovoLS.git
- Run the generation script in resources/embeddings to generate the vector model
python gen_keyed_vectors.py <glove vector path>
- Run simplifier.py with a complex sentence as the argument
python simplifier.py "convoluted sentence to simplify"
- This returns the list of detected complex words, alongside their simplification candidates and rankings:
Results for 'convoluted' - [('complicated', 9), ('confusing', 6), ('tedious', 3), ('tangled', 0)]
Results for 'sentence' - [('prison', 0)]
Results for 'simplify' - [('simplified', 1), ('simpler', 2)]
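If you want to use the simplifier from another Python program rather than reading its console output by eye, one option is to run it as a subprocess and parse the printed lines. The sketch below assumes the output keeps exactly the `Results for '<word>' - [(candidate, score), ...]` format shown above and that `simplifier.py` is in the current directory; `parse_results` and `run_simplifier` are hypothetical helpers, not part of the repository.

```python
import ast
import re
import subprocess
import sys

# Matches lines like: Results for 'convoluted' - [('complicated', 9), ...]
RESULT_LINE = re.compile(r"^Results for '(?P<word>.+?)' - (?P<cands>\[.*\])$")


def parse_results(output):
    """Turn simplifier.py's printed output into {word: [(candidate, score), ...]}.
    Lines that don't match the expected format are ignored."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            # literal_eval safely parses the printed list of tuples.
            results[match.group("word")] = ast.literal_eval(match.group("cands"))
    return results


def run_simplifier(sentence):
    """Run simplifier.py (assumed to be in the current directory) and parse its output."""
    completed = subprocess.run(
        [sys.executable, "simplifier.py", sentence],
        capture_output=True, text=True, check=True,
    )
    return parse_results(completed.stdout)


if __name__ == "__main__":
    sample = (
        "Results for 'convoluted' - [('complicated', 9), ('confusing', 6), ('tedious', 3), ('tangled', 0)]\n"
        "Results for 'sentence' - [('prison', 0)]\n"
        "Results for 'simplify' - [('simplified', 1), ('simpler', 2)]\n"
    )
    print(parse_results(sample)["sentence"])
```

Using `ast.literal_eval` rather than `eval` means only Python literals are accepted, so unexpected output cannot execute code.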
See the open issues for a list of proposed features (and known issues).
If you have any questions or concerns, message me on LinkedIn or email me.
Shout out to Prof. Dr. Goran Glavaš for answering my questions about the project. My dissertation would not have been what it was without his help. For those interested, a minimal version of Prof. Glavaš and Dr. Štajner's LightLS system can be found here.