This repository has been archived by the owner on Aug 15, 2023. It is now read-only.



TueSearch

This project contains the source code of the final project from the course Modern Search Engines at the University of Tübingen.


Local setup for development

  1. Tear down everything
./scripts/teardown.sh docker-compose.yml
  2. Create output directories and initialize environment variables
cp -rf example.env .env
cp -rf example.frontend.env frontend/.env
  3. Start the project locally
./scripts/startup.sh docker-compose.yml
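
The real variable names live in example.env and example.frontend.env; purely as an illustration of what such a file holds, an env file for a stack like this typically looks as follows (every name and value below is hypothetical, check example.env for the actual ones):

```
# Hypothetical illustration only -- the real variables are in example.env
MYSQL_ROOT_PASSWORD=change_me
MYSQL_DATABASE=tuesearch
BACKEND_PORT=4001
```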

Remote setup for deployment

  1. On the server, create the external volumes
docker volume create prod_tuesearch_database
docker volume create prod_tuesearch

Change passwords in .env and start the containers with

./scripts/startup.sh prod.docker-compose.yml

Analogously, tear down with

./scripts/teardown.sh prod.docker-compose.yml

and, if needed, remove the external volumes with

docker volume rm prod_tuesearch_database
docker volume rm prod_tuesearch

Crawler setup on a local computer

Important note: when stopping a crawler, stop it gracefully so it has time to release (unreserve) its reserved jobs.
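
The idea behind a graceful stop can be sketched in a few lines of POSIX shell (this is not the project's crawler code; `release_jobs` is a made-up stand-in for the real unreserve step). `docker-compose stop` sends SIGTERM first, so a worker that traps it gets a chance to clean up; `docker kill` or `kill -9` skips this entirely:

```shell
# Sketch: a worker that traps SIGTERM and releases its jobs before exiting.
sh -c '
  release_jobs() { echo "releasing reserved jobs"; }  # hypothetical cleanup step
  trap "release_jobs; exit 0" TERM INT
  echo "crawler running"
  kill -TERM $$                                       # simulate a graceful stop
'
```

The inner shell prints "crawler running", receives the signal, runs the cleanup, and exits cleanly instead of dying mid-job.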

  1. Add the .env file from Discord to the root directory and start the crawler with
docker-compose -f docker-compose.yml up loop_worker
  2. To start more than one crawler, run
docker-compose -f docker-compose.yml up --build --scale loop_worker=2 loop_worker

Change the number 2 to the number of crawlers you want to start. Start with a small number and increase it gradually to check that everything works.

Be polite to other websites: run at most 4 crawlers at the same time to avoid overloading the crawled sites.
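
If you want to hard-cap the crawler count rather than rely on the `--scale` flag, recent Compose versions also honor a replica count in the compose file itself. A hypothetical sketch (the `loop_worker` service name comes from the commands above; whether your compose file already has a `deploy` stanza is an assumption):

```yaml
# Hypothetical sketch: cap loop_worker at 4 replicas in docker-compose.yml
services:
  loop_worker:
    deploy:
      replicas: 4
```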

Frontend

  1. Start the mock-up server
docker-compose -f docker-compose.yml up --build backend_mockup_server

and test the mock API at localhost:4001/search?q=tubingen

  2. Install the frontend dependencies
npm install
  3. Start the frontend
npm run dev
  4. Open the browser at https://localhost:5000/
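
The frontend finds the API through frontend/.env (copied from example.frontend.env in the local setup). As an illustration only, pointing it at the mock server above might look like this (the variable name is hypothetical; check example.frontend.env for the real one):

```
# Hypothetical illustration -- the real variable name is in example.frontend.env
API_URL=http://localhost:4001
```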

Quality check

Some regularly used SQL queries to check quality:

  • Check the relevance ratio:
SELECT count(*) FROM `documents` WHERE relevant = 1;
SELECT count(*) FROM `documents` WHERE relevant = 0;
  • Update the priority list (URLs of relevant documents):
SELECT j.url, j.priority FROM `jobs` AS j JOIN `documents` AS d ON j.id = d.job_id WHERE d.relevant = 1;
  • Update the block list (URLs of irrelevant documents):
SELECT j.url, j.priority FROM `jobs` AS j JOIN `documents` AS d ON j.id = d.job_id WHERE d.relevant = 0;
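
Turning the two counts from the first query pair into one ratio is easy to do by hand or with a tiny shell helper (a sketch; `relevance_ratio` is a made-up name, and awk does the floating-point division since plain sh only has integer arithmetic):

```shell
# relevance_ratio RELEVANT_COUNT IRRELEVANT_COUNT -> relevant / total
relevance_ratio() {
  awk -v r="$1" -v n="$2" 'BEGIN { printf "%.2f\n", r / (r + n) }'
}

relevance_ratio 75 25   # prints 0.75
```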

Team Members