FUTURE is a search engine that improves on traditional keyword search by using machine learning techniques to encode words as vectors that capture their meaning, which lets it return more precise matches. Because the query alone is sufficient to retrieve meaningful data, it does not track users. The backend is written in Python using TensorFlow and PyTorch, and the frontend uses web technologies.
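As a rough illustration of matching by meaning rather than by keyword, the sketch below ranks words by the cosine similarity of their vectors. It is not FUTURE's code: the three-dimensional toy vectors and the names `TOY_VECTORS`, `cosine_similarity` and `nearest` are invented for this example, and real embeddings have far more dimensions.

```python
import math

# Hypothetical toy embeddings, made up for illustration only.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(query, vectors, k=2):
    """Return the k words whose vectors are closest to the query word's vector."""
    q = vectors[query]
    scored = [
        (cosine_similarity(q, vec), word)
        for word, vec in vectors.items()
        if word != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


print(nearest("car", TOY_VECTORS, k=1))
```

Here "automobile" ranks above "banana" for the query "car" even though the strings share no keyword, which is the property a vector-based search relies on.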
FUTURE IS DISTRIBUTED UNDER THE GNU GPL v3
To get FUTURE working, first install the appropriate TensorFlow and PyTorch packages for your system. After that, you only need to run the following commands, which have been tested on Arch Linux, openSUSE and Ubuntu:
chmod +x bootstrap.sh
./bootstrap.sh
The last command will not finish in a feasible amount of time, because it is building the index. However, it can be paused at any time with CTRL+C and resumed later. Aptly named shell scripts are provided to automate tasks.
Pause the crawler with CTRL+C, and execute:
./save_index.sh
Finally, start the server with the command below, and point your browser to 0.0.0.0:3000:
./future.py
Out of the box, FUTURE is designed as a web search engine, which means that running the provided ./bootstrap.sh script will only prepare it to search web pages. However, it is hackable down to the core: you can open indexer.py and tinker with it to save other types of data into the LMDB database, or refer to the Monad class in Monad.py and write the files that handle the creation of the database and the index yourself.
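If you want to experiment with storing your own records, the following is a minimal sketch of writing key/value pairs with the LMDB Python bindings. It does not reproduce the layout used by indexer.py: the JSON serialization, the key scheme, and the helper names `encode_record`, `decode_record` and `write_records` are all assumptions made for illustration.

```python
import json


def encode_record(doc_id, record):
    """Turn a record into the (key, value) byte pair that LMDB stores.

    Assumption: records are JSON-serializable dicts keyed by a string id.
    """
    key = doc_id.encode("utf-8")
    value = json.dumps(record, sort_keys=True).encode("utf-8")
    return key, value


def decode_record(value):
    """Inverse of the value half of encode_record."""
    return json.loads(value.decode("utf-8"))


def write_records(db_path, records, map_size=2**30):
    """Write a {doc_id: record} mapping into an LMDB environment at db_path."""
    # Third-party "lmdb" package, imported here so the helpers above
    # can be used and tested without LMDB installed.
    import lmdb

    env = lmdb.open(db_path, map_size=map_size)
    try:
        # A single write transaction; it commits when the block exits cleanly.
        with env.begin(write=True) as txn:
            for doc_id, record in records.items():
                key, value = encode_record(doc_id, record)
                txn.put(key, value)
    finally:
        env.close()
```

A call such as `write_records("my_index.lmdb", {"doc-1": {"title": "Hello"}})` would then store one record; the path and the record fields are placeholders.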
If you modify the data that is saved into the database, you may also need to change how it is served in an HTML template. For that, refer to lines 240-327 of future.py, where you can adapt the code that manages the database to whatever suits your needs.
For further modifications, feel free to fork the project, but bear in mind the terms of the GPL v3 license.
Below are listed all the projects upon which FUTURE rests.
Name | License |
---|---|
Tensorflow | Apache 2.0 |
Flask | BSD 3-Clause |
Flask_login | MIT |
Werkzeug | BSD 3-Clause |
Flask_scrypt | MIT |
Flask_Mail | BSD License |
MongoDB | Server Side Public License |
MongoDB Python bindings | Apache 2.0 |
SymSpell | MIT |
Polyglot | GPL v3 |
Beautifulsoup | BSD 2-Clause |
BSON Python bindings | Apache 2.0 |
NumPy | BSD 3-Clause |
GeoPy | MIT |
SciKit Learn | BSD 3-Clause |
Pandas | BSD 3-Clause |
PyTorch | BSD 3-Clause |
Gensim | LGPL 2.1 |
NLTK | Apache 2.0 |
Scrapy | BSD License |
H5PY | BSD 3-Clause |
LMDB | OpenLDAP |
LMDB Python bindings | OpenLDAP |
tldextract | BSD 3-Clause |
Python Imaging Library (PIL) | PIL License |
COCO API (Python bindings) | BSD 2-Clause |
WTForms | BSD 3-Clause |
Flask_wtf | BSD 3-Clause |
HNSWLib | Apache 2.0 |
JQuery | MIT |
JQuery UI | MIT |
Particles JS | MIT |
Simplebar | MIT |
Ionicons | MIT |
Source Sans Pro | OFL 1.1 |
GloVe | Apache 2.0 |
SPARQLWrapper | W3C License |
TextScrambler | BSD-like |
NMT with Attention | Apache 2.0 |
Transformer Chatbot | Apache 2.0 |
Image Captioning | MIT |