This repository contains the source code for the final project of the Modern Search Engines course at the University of Tübingen.
- Local set up for development
- Remote set up for deployment
- Crawler set up at local computer
- Frontend
- Quality check
## Local set up for development

- Tear down everything:

```shell
./scripts/teardown.sh docker-compose.yml
```
- Create output directories and initialize environment variables:

```shell
cp -rf example.env .env
cp -rf example.frontend.env frontend/.env
```
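The `example.env` files are plain `KEY=VALUE` files. If you need to read them from a script, a minimal parser sketch looks like this (an illustration only; it assumes no quoting or `export` prefixes, and the actual variable names are project-specific):

```python
def parse_env(text):
    """Tiny .env parser sketch: ignores blank lines, comments, and malformed lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```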
- Start the project locally:

```shell
./scripts/startup.sh docker-compose.yml
```
## Remote set up for deployment

- To start the project on the server, first create the external volumes:

```shell
docker volume create prod_tuesearch_database
docker volume create prod_tuesearch
```
- Change the passwords in `.env` and start the containers with:

```shell
./scripts/startup.sh prod.docker-compose.yml
```
- Analogously, tear down with:

```shell
./scripts/teardown.sh prod.docker-compose.yml
```
- And remove the external volumes (if needed) with:

```shell
docker volume rm prod_tuesearch_database
docker volume rm prod_tuesearch
```
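For the password step above, random credentials can be generated with Python's standard library (a sketch; the actual variable names inside `.env` are project-specific):

```python
import secrets

# Generate a URL-safe random password suitable for the production .env.
# 24 random bytes encode to a 32-character token.
password = secrets.token_urlsafe(24)
print(password)
```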
## Crawler set up at local computer

**Important note:** when stopping a crawler, stop it gracefully so it has time to unreserve its reserved jobs.
- Add the `.env` file from Discord to the root directory and start the crawler with:

```shell
docker-compose -f docker-compose.yml up loop_worker
```
- To start more than one crawler, run:

```shell
docker-compose -f docker-compose.yml up --build --scale loop_worker=2 loop_worker
```

Change the number `2` to the number of crawlers you want to start. Start with one crawler and increase the number gradually to see if everything works fine. Be polite to other websites and use at most 4 crawlers at the same time to avoid overloading the crawled websites.
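The politeness advice above can also be enforced inside a crawler with a per-host minimum delay. A minimal Python sketch (the delay value and the injectable `clock`/`sleep` parameters are assumptions for illustration, not part of this project's crawler):

```python
import time

MIN_DELAY = 1.0  # assumed minimum seconds between requests to the same host

_last_request = {}

def politeness_wait(host, clock=time.monotonic, sleep=time.sleep):
    """Block just long enough to keep MIN_DELAY between hits to the same host."""
    now = clock()
    last = _last_request.get(host)
    if last is not None and now - last < MIN_DELAY:
        sleep(MIN_DELAY - (now - last))
    _last_request[host] = clock()
```

The `clock` and `sleep` parameters are injectable only to make the sketch testable; a real crawler would call it as `politeness_wait(host)` before each fetch.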
## Frontend

- Start the mock-up server:

```shell
docker-compose -f docker-compose.yml up --build backend_mockup_server
```

and test the mock API at `localhost:4001/search?q=tubingen`.
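When scripting against the mock API, the query URL can be built with Python's standard library (a sketch; only the host, port, and query parameter come from the command above):

```python
from urllib.parse import urlencode

# Build the mock-API search URL with a properly encoded query string.
base = "http://localhost:4001/search"
url = f"{base}?{urlencode({'q': 'tubingen'})}"
print(url)  # → http://localhost:4001/search?q=tubingen
```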
- Install dependencies:

```shell
npm install
```
- Start the frontend:

```shell
npm run dev
```
- Open the browser at `https://localhost:5000/`.
## Quality check

Some regularly used SQL queries to check crawl quality:
- Test relevance ratio:

```sql
SELECT count(*) FROM `documents` WHERE relevant = 1;
SELECT count(*) FROM `documents` WHERE relevant = 0;
```
- Update priority list:

```sql
SELECT j.url, j.priority
FROM jobs AS j
JOIN documents AS d ON j.id = d.job_id
WHERE d.relevant = 1;
```
- Update block list:

```sql
SELECT j.url, j.priority
FROM jobs AS j
JOIN documents AS d ON j.id = d.job_id
WHERE d.relevant = 0;
```
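The two counts from the relevance-ratio queries combine into a single ratio; a quick sketch (the counts here are made-up numbers for illustration):

```python
# Relevance ratio from the two COUNT(*) results above (hypothetical values).
relevant, irrelevant = 1200, 300
ratio = relevant / (relevant + irrelevant)
print(f"relevance ratio: {ratio:.0%}")  # → relevance ratio: 80%
```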