Skip to content
/ myMDB Public

A simple CRUD application implementing a keyword search algorithm for querying from a database built using C++ in-house templates

Notifications You must be signed in to change notification settings

henrylao/myMDB

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Context Indexing Movie Search Engine

This program was written for DeAnza's CS301 Data Structures class in 2019 as a final project. Constraints required utilization of an in-house implementation of template data structures used in the development of the application. The application is a CLI interface for interacting with a database of movies using the database's search engine.

Data Structures Used

  • Singly Linked List
  • Stack
  • HashTable (Quadratic Probing)
  • Binary Search Tree

How It Works

Create

(1) Build the list of movies

  1. load each row denoting a movie entry and its attributes
  2. create the movie entry and add into the list collection

(2) Create the hashtabe of movies

  1. Use movie ID attribute as the key
  2. Preprocess the ID cleaning out any unwanted characters
  3. Add the movie into the database using the preprocessed key and the movie as the value

(3) Create the search engine

  • Using the list of movies, create a map of bsts.
    • NOTE Index of bst was maintained lexically.
  • BST entries containing keywords associated with a movie such that that the first character of each tokenized keyword pertaining to a movie signifies a key in the dictionary
    • NOTE: tokenized words are the movie attributes such as the movie name, genre, year released, runtime, and id
      • more attributes could have included the cast as well as production -- at the bare minimum there are atleast 5 attributes resulting in 5 keywords
      • this was the largest computational cost when spinning up the application since it uses a nested loop resulting in an average O(n^2) insertion time

Start Up Statistics (Small Dataset)

Process Time
List of Movies Generation Time 0.069353s
Movie Load From List to Table Time 0.0349641s
Tabled BST Search Engine Time 2.73182s

Movie Table Statistics

Metrics Values
Occupancy 507
Capacity 1277
Load Factor 39.7%
Number of Collisions 166

Tabled BST Search Engine Statistics

Metrics Values
Occupancy 41
Capacity 521
Load Factor 7.869%
Number of Collisions 0

Read

Reading movies based on relevance of a user's keyword search

  1. Preprocess the keyword(s) search entry into tokenized list
  2. Loop LV 1: For each starting character of a tokenized user entry get the associated BST at O(1) and search for the associated keyword BST node O(logn)
    • NOTE user keyword searches are usually short, this would have many issues if there were too many keywords
  3. Loop LV 2: For each keyword-node containing a list of movies, create a tally value of 1 for each movie associated keyword MovieID incrementing the tally for each existing movieID O(n^2)
  4. Sort movies based upon their tallied keyword relevance ie "how many keyword hits did each movie have?" and return list of movies sorted by relevance measured by the set of tokenized keywords of a user search entry

References:

About

A simple CRUD application implementing a keyword search algorithm for querying from a database built using C++ in-house templates

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages