SuffixArray

Truncated suffix-array-based substring search index, written in C++ and exposed to Python through Cython. Typical query times range from microseconds to milliseconds, even on tens or hundreds of millions of items. More comprehensive benchmarks will follow as the package gets more fleshed out.

To install

make install

Document Indexer

from suffix_array import SuffixArray

docs = [
    "The quick brown fox jumps over the lazy dog",
    "I am going to the store to buy some milk",
    "Uhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
]

## Creates and builds the index from documents.
suffix_array = SuffixArray(documents=docs, max_suffix_length=32)

## Query the index.
## Returns the documents that contain the query substring in a list.
records = suffix_array.query_records("the quick brown fox")
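To illustrate the idea behind a truncated suffix array, here is a minimal pure-Python sketch. It is not the package's C++ implementation: the function names `build_truncated_suffix_array`, `find_occurrences`, and `_bound` are hypothetical, it indexes a single string rather than a list of documents, and rejecting queries longer than the truncation length is an assumption of this sketch (how the package handles such queries is not stated here).

```python
def build_truncated_suffix_array(text, max_suffix_length):
    """Return suffix start positions, sorted by each suffix's first
    max_suffix_length characters only (hypothetical helper)."""
    return sorted(range(len(text)), key=lambda i: text[i:i + max_suffix_length])


def _bound(text, suffix_array, query, strict):
    """Binary search over the sorted suffixes.

    strict=False: first index whose len(query)-character prefix is >= query.
    strict=True:  first index whose len(query)-character prefix is >  query.
    """
    lo, hi = 0, len(suffix_array)
    m = len(query)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        prefix = text[start:start + m]
        if prefix < query or (strict and prefix == query):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_occurrences(text, suffix_array, query, max_suffix_length):
    """Return the sorted start positions of query in text."""
    # Assumption of this sketch: suffixes are only ordered up to
    # max_suffix_length characters, so longer queries are rejected.
    if not query or len(query) > max_suffix_length:
        raise ValueError("query must have 1..max_suffix_length characters")
    first = _bound(text, suffix_array, query, strict=False)
    last = _bound(text, suffix_array, query, strict=True)
    return sorted(suffix_array[first:last])


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    sa = build_truncated_suffix_array(text, max_suffix_length=32)
    print(find_occurrences(text, sa, "the", max_suffix_length=32))
```

Truncating the comparison keeps sorting cost bounded per suffix, at the price that the ordering is only reliable for queries no longer than the truncation length.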

CSV Indexer

Indexes and memory-maps the file. Only the suffix arrays for the text are kept in memory (4 * N, where N is the number of text characters in the search column).

from suffix_array import SuffixArray

## Creates and builds the index from csv
CSV_FILE      = "company_data.csv"
SEARCH_COLUMN = "company_name"

suffix_array = SuffixArray(
    csv_file=CSV_FILE,
    search_column=SEARCH_COLUMN,
    max_suffix_length=32
)

## Query the index.
## Returns the documents that contain the query substring in a list.
## (Each record includes all columns, as a list of dictionaries in JSON-records format.)
records = suffix_array.query_records("netflix")
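The 4 * N memory figure can be estimated before building an index. The sketch below assumes the figure is in bytes, with one 4-byte entry per character of the search column; that interpretation is an assumption, not something stated above. The helper names `column_values` and `estimate_index_bytes` are hypothetical and are not part of the package.

```python
import csv


def column_values(file_obj, search_column):
    """Return the values of one column from an open CSV file handle."""
    reader = csv.DictReader(file_obj)
    return [row[search_column] for row in reader]


def estimate_index_bytes(values, bytes_per_entry=4):
    """Estimate in-memory suffix array size as bytes_per_entry * N,
    where N is the total number of characters across all values.

    Assumption: one fixed-width entry per text character.
    """
    num_chars = sum(len(value) for value in values)
    return bytes_per_entry * num_chars


if __name__ == "__main__":
    # File and column names reuse the example above.
    with open("company_data.csv", newline="", encoding="utf-8") as f:
        names = column_values(f, "company_name")
    print(f"approx. {estimate_index_bytes(names) / 1e6:.1f} MB")
```

Under this assumption, a search column holding 100 million characters would need roughly 400 MB for the suffix array.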

