- v0.0.1: First version of tab search plugin
- v0.0.2: Initial version with cleanup popup UI
- v0.0.3: Refactored code for ease of testing and extensibility (in progress)
- v0.0.4: TBD
- Ensure that your virtual env is set up and activated (assuming you are at the base of the repo)
  - `virtualenv venv`
  - `source venv/bin/activate`
  - `pip install -r serv/requirements.txt`
- `export PYTHONPATH=$PYTHONPATH:$(pwd)`
- Ensure that you have fetched all files
  - Create the `serv/res` folder if it is not there, and get `meta_train_*.pkl` and `embed_train_*.pkl` inside `res`
    - `git lfs pull`
  - Please download the encoder model from https://drive.google.com/file/d/1JLTYMaCtY4pkl4oeygXVnk_GxJpOWxKH/view?usp=drive_link and put it in the server folder.
- Ensure that all files (`model_file`, the `ort_format` file, etc.) are available.
  - You can get all of them by running `git lfs pull`
- Now run the program
  - Run `python serv/test_algo.py` from the base path; this should give you 90% accuracy
  - This takes about 10-12 hrs if you don't have embed_data and meta_file or you are reindexing; otherwise it takes about 1 min
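Because a missing embed or meta file turns a one-minute run into a multi-hour reindex, it may be worth checking for the resource files before starting. The following is only a minimal sketch: the helper name `missing_resources` is hypothetical (it is not part of this repo), and the glob patterns are assumed from the setup step above. If your files use the hyphenated names instead (for example `meta-train-v02.pkl`), pass those patterns explicitly.

```python
from pathlib import Path

# Assumed patterns, taken from the setup step; not read from the repo's code.
DEFAULT_PATTERNS = ("meta_train_*.pkl", "embed_train_*.pkl")


def missing_resources(res_dir, patterns=DEFAULT_PATTERNS):
    """Return the glob patterns that match no file inside res_dir."""
    res_path = Path(res_dir)
    if not res_path.is_dir():
        # A missing folder means every pattern is unmatched.
        return list(patterns)
    return [p for p in patterns if not any(res_path.glob(p))]


if __name__ == "__main__":
    # Run from the base of the repo so that serv/res resolves correctly.
    missing = missing_resources("serv/res")
    if missing:
        print("Missing resource files:", ", ".join(missing))
    else:
        print("All resource files found.")
```

If anything is reported missing, `git lfs pull` is the first thing to try before running `serv/test_algo.py`.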
- Install the plugin in Chrome in developer mode. To do this, see below
  - How to install unpacked: https://developer.chrome.com/docs/extensions/mv3/getstarted/development-basics/#load-unpacked
  - Do the above for the folder `client/plugin`
  - Now enable the plugin for all sites: go to manage extensions, select semrider, and then under site access choose all sites
  - Now your plugin is running
- Running the server
  - Activate the virtual environment as mentioned in step 1 of Basic
  - Now run the Flask server with `python serv/server.py`
  - If the program runs successfully, you should see the text of any site you visit displayed in the output of the previous step
  - Now if you search in the semrider plugin, it should return results
- data/eval-100-samples.csv : 100 URLs from 10 categories, with phrases to match categories such as llm-blog, tech blog, etc. Used to check accuracy across categories
- data/confsbl-hn-url-gt-100.csv : confusable data from the 15k YC news set, used to confuse the 100 samples above
- res/meta-train-v02.pkl : metadata for the top 1k of confsbl + the 100 evals
- res/embed-train-v02.pkl : embedding data for the top 1k of confsbl + the 100 evals
- You can use the embed-train/meta-train files as prod as well: just make a copy of each and name the copies embed-prod-v02.pkl and meta-prod-v02.pkl
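The copy step for the prod files could be scripted roughly as follows. This is a sketch under assumptions: the function name `copy_train_to_prod` is hypothetical, and the default location `serv/res` is assumed from the setup instructions rather than confirmed by the repo's code.

```python
import shutil
from pathlib import Path


def copy_train_to_prod(res_dir="serv/res", version="v02"):
    """Copy the embed-train/meta-train pickles to their -prod- names."""
    res_path = Path(res_dir)
    copied = []
    for kind in ("embed", "meta"):
        src = res_path / f"{kind}-train-{version}.pkl"
        dst = res_path / f"{kind}-prod-{version}.pkl"
        if not src.is_file():
            raise FileNotFoundError(f"expected training file not found: {src}")
        # Copy rather than rename, so the training files stay in place.
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Calling `copy_train_to_prod()` from the base of the repo would leave `embed-prod-v02.pkl` and `meta-prod-v02.pkl` next to the training files.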