Skip to content

capjamesg/nanosearch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

nanosearch

Nanosearch is an in-memory search engine designed for small (< 10,000 URL) websites.

With Nanosearch, you can build a search engine in a few lines of code.

Nanosearch supports the BM25 and TF/IDF algorithms.

Nanosearch also computes a link graph and uses the number of inlinks to a page as a ranking factor. This is useful for ranking results for queries where there are multiple relevant pages by keyword.

Installation

pip install nanosearch

Quickstart

Build a Search Engine from a Sitemap

from nanosearch import NanoSearchBM25

engine = NanoSearchBM25().from_sitemap(
    "https://jamesg.blog/sitemap.xml",
    title_transforms=[lambda x: x.split("|")[0]]
)
results = engine.search("coffee")

print(results)

Build a Search Engine from a List of URLs

from nanosearch import NanoSearchBM25

urls = [
    "https://jamesg.blog/",
    "https://jamesg.blog/coffee",
]

engine = NanoSearchBM25().from_urls(urls)
results = engine.search("coffee")

print(results)

Save an Index to Disk

You can save an index to disk and load it later with:

engine.to_nanosearch_json("index.json")

engine = NanoSearchBM25().from_nanosearch_json("index.json")

Supported Algorithms

Nanosearch supports the following search algorithms:

  • TF/IDF
  • BM25

License

This project is licensed under an MIT license.