Searcher is a study project that uses the Vector Space Model (VSM) to search documents and report how similar each document is to a query.
// Build the table and index six sample documents.
val documentTable = new DocumentTable()
documentTable.pushText("Hello World")
documentTable.pushText("Hello Guys")
documentTable.pushText("Hello Main")
documentTable.pushText("Hello Man Guys World")
documentTable.pushText("Hello hello")
documentTable.pushText("Hello guys guys hello hello guys hello")
// Register the query, then print each document's similarity to it.
documentTable.pushQuery("Hello Guys")
documentTable.result().show()
Id Similarity
5 0.8164965809277259
1 0.6666666666666667
6 0.8082903768654761
2 1.0000000000000002
3 0.6666666666666667
4 0.8660254037844387
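Scores like the ones listed above are typically computed in a VSM as the cosine of the angle between the query vector and each document vector. The following is a minimal sketch of that idea, not Searcher's actual code: `CosineSketch`, `termCounts`, and `cosine` are hypothetical names, and the vectors here are assumed to hold raw term counts. Searcher's real term weighting is not described in this README, so this sketch is not expected to reproduce the exact values shown above.

```scala
object CosineSketch {
  // Assumption: raw term counts over lowercase, whitespace-separated tokens.
  def termCounts(text: String): Map[String, Double] =
    text.toLowerCase(java.util.Locale.ROOT)
      .split("\\s+")
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (term, occurrences) => term -> occurrences.length.toDouble }

  // Cosine of the angle between two sparse term vectors.
  def cosine(a: Map[String, Double], b: Map[String, Double]): Double = {
    // Only terms present in both vectors contribute to the dot product.
    val dot = a.iterator.map { case (term, w) => w * b.getOrElse(term, 0.0) }.sum
    def norm(v: Map[String, Double]): Double =
      math.sqrt(v.values.map(w => w * w).sum)
    val denom = norm(a) * norm(b)
    // Define the similarity as 0 when either vector is empty.
    if (denom == 0.0) 0.0 else dot / denom
  }
}
```

With these definitions, `CosineSketch.cosine(CosineSketch.termCounts("Hello World"), CosineSketch.termCounts("Hello Guys"))` compares two documents that share one of their two terms; an identical document and query score close to 1, and texts with no terms in common score 0.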
In the first step, we normalize the input by removing accents and punctuation and converting all letters to lowercase. Immediately after that, we index the sentences in an inverted index.
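The two steps just described can be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's implementation: `IndexSketch`, `normalize`, and `buildIndex` are hypothetical names, accents are removed through Unicode NFD decomposition, tokens are split on whitespace, and document ids are assumed to start at 1 in insertion order.

```scala
import java.text.Normalizer
import java.util.Locale
import scala.collection.mutable

object IndexSketch {
  // Strip accents, drop punctuation, lowercase, and split into terms.
  def normalize(text: String): Seq[String] =
    Normalizer.normalize(text, Normalizer.Form.NFD)
      .replaceAll("\\p{M}", "")                 // remove combining accent marks
      .replaceAll("[^\\p{L}\\p{Nd}\\s]", " ")   // replace punctuation with spaces
      .toLowerCase(Locale.ROOT)
      .split("\\s+")
      .filter(_.nonEmpty)
      .toSeq

  // Inverted index: term -> (document id -> occurrences of the term in it).
  def buildIndex(docs: Seq[String]): Map[String, Map[Int, Int]] = {
    val index = mutable.Map.empty[String, mutable.Map[Int, Int]]
    for ((text, i) <- docs.zipWithIndex; term <- normalize(text)) {
      val docId = i + 1 // assumption: ids follow insertion order, starting at 1
      val postings = index.getOrElseUpdate(term, mutable.Map.empty[Int, Int])
      postings(docId) = postings.getOrElse(docId, 0) + 1
    }
    index.map { case (term, postings) => term -> postings.toMap }.toMap
  }
}
```

Storing per-document counts in the postings keeps the term frequencies available for whatever weighting is applied later; looking up a term such as `"guys"` in the resulting map returns only the documents that contain it, so a query never has to scan documents with no matching terms.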