dgryski/interesting-papers
This is a list of papers I would like to implement, or would like to have an implementation of. This list is likely to change as my interests change, including deletions. Do not expect this list to remain static.

These are in no particular order:

- SONIK: Efficient In-situ All Item Rank Generation using Bit Operations - https://arxiv.org/abs/1605.06992
- CAMP: A Cost Adaptive Multi-Queue Eviction Policy for Key-Value Stores - https://dblab.usc.edu/users/papers/CAMPTR.pdf
- SimString: A fast and simple algorithm for approximate string matching/retrieval - https://www.chokkan.org/software/simstring/
- Simpira: cryptographic permutations designed to be fast on modern 64-bit processors, yet provide a comfortable security margin against all currently-known attacks - https://mouha.be/simpira/
- Autoscaling Bloom Filter: Controlling Trade-off Between True and False Positives - https://arxiv.org/abs/1705.03934
- Adaptive Cuckoo-Filters - https://arxiv.org/abs/1704.06818
- Continuous Top-k Queries over Real-Time Web Streams - https://arxiv.org/abs/1610.06500
- A practical index for approximate dictionary matching with few mismatches - https://arxiv.org/abs/1501.04948
- Robust benchmarking in noisy environments - https://arxiv.org/abs/1608.04295
- Fast intersection of sorted lists with SSE:
  - https://highlyscalable.wordpress.com/2012/06/05/fast-intersection-sorted-lists-sse/
  - Also, https://arxiv.org/abs/1401.6399
- PAD: Performance Anomaly Detection in Multi-Server Distributed Systems - https://www.microsoft.com/en-us/research/wp-content/uploads/2014/06/PAD-Performance-Anomaly-Detection-in-Multi-Server-Distributed-Systems.pdf
- Detecting Abnormal Machine Characteristics in Cloud Infrastructures - https://ti.arc.nasa.gov/publications/4268/download/
- PerfAugur: Robust Diagnostics for Performance Anomalies in Cloud Services - https://www.microsoft.com/en-us/research/publication/perfaugur-robust-diagnostics-for-performance-anomalies-in-cloud-services/
- Statistical Techniques for Online Anomaly Detection in Data Centers - https://www.hpl.hp.com/techreports/2011/HPL-2011-8.pdf
- Fast table-driven base64 encoding/decoding:
  - https://github.com/powturbo/TurboBase64/blob/master/turbob64d.c
- Assembly versions of hash functions / cryptographic algorithms:
  - t1ha (Go version: https://github.com/dgryski/go-t1ha)
  - rc5 / rc6 (Go versions: https://github.com/dgryski/go-rc5 / https://github.com/dgryski/go-rc6)
- In-memory data layout for Netflix's Hollow:
  - https://hollow.how/advanced-topics/#in-memory-data-layout
- Omnisearch Index Formats - https://blog.twitter.com/2016/omnisearch-index-formats
- NORX8 and NORX16: Authenticated Encryption for Low-End Systems - https://eprint.iacr.org/2015/1154
- LightMAC: A MAC Mode for Lightweight Block Ciphers:
  - https://eprint.iacr.org/2016/190.pdf
- Fast Deterministic Selection (adaptive QuickSelect) - https://arxiv.org/abs/1606.00484
- A Bloom filter based semi-index on q-grams - https://arxiv.org/abs/1507.02989
- Faster Population Counts using AVX2 Instructions - https://arxiv.org/abs/1611.07612
- Quasi-Succinct Indices (compressed inverted indexes):
  - https://vigna.di.unimi.it/ftp/papers/QuasiSuccinctIndices.pdf
- Efficient Summing over Sliding Windows (stream statistics) - https://arxiv.org/pdf/1604.02450v1.pdf
- A Novel Technique for Long-Term Anomaly Detection in the Cloud - https://www.usenix.org/system/files/conference/hotcloud14/hotcloud14-vallis.pdf
  - Twitter's anomaly detection algorithm
  - related: https://www.ebaytechblog.com/2015/08/19/statistical-anomaly-detection/
  - related: https://nerds.airbnb.com/anomaly-detection/
- TinySet - An Access Efficient Self Adjusting Bloom Filter Construction - https://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2015/CS/CS-2015-03.pdf
- Detecting Change in Data Streams:
  - https://cs.uwaterloo.ca/~shai/vldb04.pdf
- Hierarchical Delta Debugging:
  - https://blog.acolyer.org/2015/11/17/hierarchical-delta-debugging/
  - (to go with https://github.com/dgryski/go-ddmin)
- FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space - https://cs.fit.edu/~pkc/papers/tdm04.pdf
  - many implementations to use as a base, for example https://github.com/slaypni/fastdtw/blob/master/fastdtw.py
- Mining frequent items in the time fading model - https://arxiv.org/pdf/1601.03892v1.pdf
- Hierarchical Agglomerative Clustering:
  - https://nlp.stanford.edu/IR-book/html/htmledition/hierarchical-agglomerative-clustering-1.html
  - needed for https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/rebucket-icse2012.pdf
  - preliminary implementation of rebucket: https://github.com/dgryski/go-rebucket
- Balanced Allocation: Patience is not a Virtue (FirstDiff load balancing):
  - https://arxiv.org/abs/1602.08298
- Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream - https://www.cs.ubc.ca/~xujian/paper/quant.pdf
- The Eternal Sunshine of the Sketch Data Structure - https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.2889&rep=rep1&type=pdf
- Copysets and Chainsets: A Better Way to Replicate - https://hackingdistributed.com/2014/02/14/chainsets/
- A Fast Algorithm for Approximate Quantiles in High Speed Data Streams - https://web.cs.ucla.edu/~weiwang/paper/SSDBM07_2.pdf
  - this algorithm has haunted me for ages; I could never get my code working
  - unresponsive authors, details missing from papers, etc.
  - there now appear to be more implementations that could be used as a base
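The "Fast intersection of sorted lists with SSE" entry speeds up the classic merge-style intersection. For comparison, a minimal scalar baseline in Go (the language of the dgryski/go-* repositories linked above) might look like this sketch. `intersectSorted` is a hypothetical name, and the sketch assumes both inputs are sorted ascending with no duplicates; it is the plain two-pointer merge, not the SIMD algorithm from the post or the paper.

```go
package main

import "fmt"

// intersectSorted returns the values present in both a and b.
// Both slices must be sorted ascending with no duplicates.
// This is the plain two-pointer merge, not the SSE algorithm.
func intersectSorted(a, b []uint32) []uint32 {
	// The result can be no longer than the shorter input.
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]uint32, 0, n)
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++ // a[i] cannot appear later in b; skip it
		case a[i] > b[j]:
			j++ // b[j] cannot appear later in a; skip it
		default:
			// Equal: emit once and advance both cursors.
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	a := []uint32{1, 3, 4, 7, 9, 12}
	b := []uint32{2, 3, 7, 8, 12, 15}
	fmt.Println(intersectSorted(a, b)) // [3 7 12]
}
```

Each comparison advances at least one cursor, so the loop runs at most len(a)+len(b) times; the SIMD variants aim to cut the per-comparison branch cost of exactly this loop.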
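Several entries (Autoscaling Bloom Filter, Adaptive Cuckoo-Filters, TinySet, the q-gram semi-index) start from or compete with the standard Bloom filter. A minimal sketch of that baseline follows. The type and method names are hypothetical, and deriving the k probe positions from one 64-bit FNV-1a hash via double hashing is an assumption of this sketch, not something taken from those papers.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a standard Bloom filter with m bits and k probes per key.
type bloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of probe positions per key
}

// newBloomFilter allocates a filter with m bits (m > 0) and k probes.
func newBloomFilter(m, k uint64) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// baseHashes splits one 64-bit FNV-1a hash into two halves (an assumption).
// Probe i uses h1 + i*h2 (Kirsch-Mitzenmacher double hashing).
func baseHashes(key []byte) (h1, h2 uint64) {
	h := fnv.New64a()
	h.Write(key) // hash.Hash writes never return an error
	sum := h.Sum64()
	return sum & 0xffffffff, (sum >> 32) | 1 // make h2 odd, hence non-zero
}

// Add sets the k bits for key.
func (f *bloomFilter) Add(key []byte) {
	h1, h2 := baseHashes(key)
	for i := uint64(0); i < f.k; i++ {
		pos := (h1 + i*h2) % f.m
		f.bits[pos/64] |= uint64(1) << (pos % 64)
	}
}

// Has returns false if key was definitely never added;
// true means "probably added" (false positives are possible).
func (f *bloomFilter) Has(key []byte) bool {
	h1, h2 := baseHashes(key)
	for i := uint64(0); i < f.k; i++ {
		pos := (h1 + i*h2) % f.m
		if f.bits[pos/64]&(uint64(1)<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloomFilter(1024, 4)
	f.Add([]byte("hello"))
	fmt.Println(f.Has([]byte("hello"))) // true
	fmt.Println(f.Has([]byte("world"))) // false unless all 4 probes collide
}
```

Add and Has compute identical probe positions, so an added key always reports true; only a false answer is definitive. The adaptive and autoscaling variants in the list change how bits are set or cleared, not this basic query path.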
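"Fast Deterministic Selection" refines how QuickSelect chooses its pivots. As a point of comparison, here is a sketch of plain randomized QuickSelect with Lomuto partitioning; `quickSelect` is a hypothetical name, and the random pivot rule is a stand-in, not the paper's deterministic adaptive pivot selection.

```go
package main

import (
	"fmt"
	"math/rand"
)

// quickSelect returns the k-th smallest element (0-based) of xs,
// reordering xs in place. Requires 0 <= k < len(xs).
// Expected O(n) time with random pivots, O(n^2) in the worst case.
func quickSelect(xs []int, k int) int {
	lo, hi := 0, len(xs)-1
	// Invariant: the answer's final position k lies in [lo, hi].
	for lo < hi {
		// Move a random pivot to the end, then Lomuto-partition.
		p := lo + rand.Intn(hi-lo+1)
		xs[p], xs[hi] = xs[hi], xs[p]
		pivot := xs[hi]
		store := lo
		for i := lo; i < hi; i++ {
			if xs[i] < pivot {
				xs[i], xs[store] = xs[store], xs[i]
				store++
			}
		}
		xs[store], xs[hi] = xs[hi], xs[store]
		// The pivot is now at its sorted position; keep only k's side.
		switch {
		case k < store:
			hi = store - 1
		case k > store:
			lo = store + 1
		default:
			return xs[store]
		}
	}
	return xs[lo]
}

func main() {
	xs := []int{9, 1, 8, 2, 7, 3, 6, 4, 5}
	fmt.Println(quickSelect(xs, 4)) // 5 (the median)
}
```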
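FastDTW approximates exact dynamic time warping in linear time. The exact quadratic recurrence it approximates, D(i,j) = cost(i,j) + min(D(i-1,j), D(i,j-1), D(i-1,j-1)), can be sketched as follows. Using the absolute difference as the local cost is an assumption (other distances are common), and `dtw` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
)

// dtw returns the exact DTW distance between x and y, using
// |x[i]-y[j]| as the local cost. O(len(x)*len(y)) time, two rows of memory.
func dtw(x, y []float64) float64 {
	n, m := len(x), len(y)
	inf := math.Inf(1)
	prev := make([]float64, m+1) // row i-1 of the DP table
	curr := make([]float64, m+1) // row i of the DP table
	// Row 0: D(0,0) = 0 (already zero), D(0,j) = +Inf for j > 0.
	for j := 1; j <= m; j++ {
		prev[j] = inf
	}
	for i := 1; i <= n; i++ {
		curr[0] = inf // D(i,0) = +Inf for i > 0
		for j := 1; j <= m; j++ {
			cost := math.Abs(x[i-1] - y[j-1])
			// Cheapest predecessor: step in x, step in y, or diagonal.
			curr[j] = cost + math.Min(prev[j], math.Min(curr[j-1], prev[j-1]))
		}
		prev, curr = curr, prev // reuse the old row's buffer
	}
	return prev[m]
}

func main() {
	// The repeated 2 in the second series is absorbed by warping.
	fmt.Println(dtw([]float64{1, 2, 3}, []float64{1, 2, 2, 3})) // 0
}
```

FastDTW runs this same recurrence, but only inside a narrow window projected from a coarsened version of the two series, which is where the linear bound comes from.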
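"Efficient Summing over Sliding Windows" is about estimating the sum of the most recent N stream items in less memory than the window itself. The exact baseline such work is measured against keeps every value in a ring buffer, roughly like this sketch; the `windowSum` type and its methods are hypothetical names, not from the paper.

```go
package main

import "fmt"

// windowSum keeps the exact sum of the most recent n values.
// It stores all n values (O(n) memory), which is the cost the
// approximate sliding-window algorithms try to avoid.
type windowSum struct {
	buf  []int64
	next int  // slot for the next write; holds the oldest value once full
	full bool // true after the first n values have been added
	sum  int64
}

// newWindowSum creates a window of size n (n > 0).
func newWindowSum(n int) *windowSum {
	return &windowSum{buf: make([]int64, n)}
}

// Add pushes v, evicting the oldest value when the window is full,
// and returns the current window sum.
func (w *windowSum) Add(v int64) int64 {
	if w.full {
		w.sum -= w.buf[w.next] // drop the value falling out of the window
	}
	w.buf[w.next] = v
	w.sum += v
	w.next++
	if w.next == len(w.buf) {
		w.next = 0
		w.full = true
	}
	return w.sum
}

func main() {
	w := newWindowSum(3)
	for _, v := range []int64{1, 2, 3, 4, 5} {
		fmt.Println(w.Add(v)) // 1, 3, 6, 9, 12
	}
}
```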
About
Interesting papers I'd like to implement (or at least have implementations of)