Scaling Up Nearest Neighbor Search: How Dataset Size and Dimensionality Affect KNN Variants
Updated Jun 14, 2024 - Jupyter Notebook
Implementation of the algorithms presented in the course Analysis of Massive Datasets (Analiza velikih skupova podataka, AVSP)
Explores Jaccard distance, Min-Hashing, and LSH for user similarity in a movie rating dataset. Tasks include dataset preprocessing, exact Jaccard similarity computation, Min-Hash signatures, and an LSH implementation. Results and observations are documented in code, output files, and a report.
A semantic search indexing system designed to efficiently retrieve top matching results from a database of 20 million documents. Given the embedding of a search query, it quickly identifies and returns the most relevant documents
The assignment comprises two main tasks: implementing LSH to identify similar businesses based on user ratings and developing various collaborative filtering recommendation systems to predict user ratings for businesses.
Locality Sensitive Hashing, fuzzy-hash, min-hash, simhash, aHash, pHash, dHash. Image similarity and text similarity based on hash values
Finding similar documents using LSH with MapReduce on multi-node Spark Cluster
A Robust Library in C# for Similarity Estimation
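A content-based related-product recommendation implementation built on fasttext + faiss, with nginx+uwsgi+flask / gunicorn+uvicorn+fastapi providing the API query interface, plus an added Spark implementation using Ansj+Word2vec+LSH+Phoenix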
Homeworks for Advanced Data Mining and Language Technology (DMT) at La Sapienza University of Rome
MDLE First Assignment - Implements the A-Priori algorithm to find the most frequent itemsets of conditions across a large set of patients and to extract association rules between those conditions, and applies LSH to identify similar news articles in a dataset.
Locality Sensitive Hashing in Rust with Python bindings
This repo aims to implement a modular engine for Locality-Sensitive Hashing (LSH).
Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome
TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation
Unnatural Language Processing
Search your object with hash
Lab assignments for the course ID2222-Data Mining at KTH
Coursera's Natural Language Processing specialization