A project that uses the Etsy API to fetch a set of Etsy stores, then applies a simple analysis to measure their similarity and group similar stores.
There are several scripts in this directory. The first is getShops.py, which should be run as:
$ python getShops.py 5000 shopListingData.json
The second script trains the vector scaler (multiplier) by going through the data. The training features are defined in the script itself and are currently set to:
typesOfVectors = ["tags", "category_path", "materials"]
To use this script, give it the listing data output by the previous script, plus two output files: one for the feature scaler (multiplier) and one for the feature vectors of all stores (so they don't have to be calculated twice):
$ python trainShops.py shopListingData.json multipliers.json vectors.json
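As a rough illustration of what this step involves, here is a minimal sketch of building per-store feature vectors and a scaler. It is not the code in trainShops.py. It assumes each listing is a dict whose "tags", "category_path", and "materials" values are lists of strings, and the scaling rule (one divided by the mean vector norm of each feature type) is a hypothetical choice made only for this example. The function names build_shop_vectors and train_multipliers are also invented here.

```python
import math
from collections import Counter

typesOfVectors = ["tags", "category_path", "materials"]

def build_shop_vectors(listings):
    """Count how often each term appears, per feature type, across one store's listings."""
    vectors = {t: Counter() for t in typesOfVectors}
    for listing in listings:
        for t in typesOfVectors:
            # Missing or null fields are treated as empty lists (assumption).
            vectors[t].update(term.lower() for term in (listing.get(t) or []))
    return vectors

def train_multipliers(all_shop_vectors):
    """Hypothetical scaler: 1 / (mean Euclidean norm) for each feature type."""
    multipliers = {}
    for t in typesOfVectors:
        norms = [math.sqrt(sum(c * c for c in shop[t].values()))
                 for shop in all_shop_vectors.values()]
        mean_norm = sum(norms) / len(norms) if norms else 0.0
        # Fall back to 1.0 when a feature type never occurs.
        multipliers[t] = 1.0 / mean_norm if mean_norm > 0 else 1.0
    return multipliers
```

Scaling each feature type separately keeps one type with many terms (for example, tags) from dominating the distance between stores.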
The final script takes the multipliers and vectors as input and either selects a random store and outputs five similar stores or, given the name of a store in the data set, finds the five stores most similar to it. Usage (random store):
$ python findSimilarShops.py multipliers.json vectors.json
Or (with known store name):
$ python findSimilarShops.py multipliers.json vectors.json VintageIngenue
In either case, output is written to standard output (and can be redirected to a file with >) in the form:
OriginalStore: SimilarStore1, SimilarStore2, SimilarStore3, SimilarStore4, SimilarStore5
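The lookup can be sketched roughly as follows. This is an illustration, not the code in findSimilarShops.py. It assumes vectors are stored as {store: {feature type: {term: count}}}, that similarity is a scaled Euclidean distance, and that the helper names distance, find_similar, and format_line are hypothetical.

```python
import math
import random

def distance(vec_a, vec_b, multipliers):
    """Scaled Euclidean distance over sparse {type: {term: count}} vectors (assumed metric)."""
    total = 0.0
    for t, scale in multipliers.items():
        a, b = vec_a.get(t, {}), vec_b.get(t, {})
        for term in set(a) | set(b):
            diff = scale * (a.get(term, 0) - b.get(term, 0))
            total += diff * diff
    return math.sqrt(total)

def find_similar(vectors, multipliers, shop=None, n=5):
    """Return (shop, n nearest other stores); pick a random store if none is named."""
    if shop is None:
        shop = random.choice(sorted(vectors))
    others = [name for name in vectors if name != shop]
    others.sort(key=lambda name: distance(vectors[shop], vectors[name], multipliers))
    return shop, others[:n]

def format_line(shop, similar):
    """Format one result as 'OriginalStore: SimilarStore1, SimilarStore2, ...'."""
    return "{}: {}".format(shop, ", ".join(similar))
```

Looking up a store name that is not in the vectors file raises a KeyError in this sketch, which matches the requirement that the store be in the data set.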
Another version of this script loops over all the stores and writes the similar stores for each one to an output file. To run it, use:
$ python findSimilarShopsAllShops.py multipliers.json vectors.json similarShops.dat
I have included my shopListingData.json, multipliers.json, vectors.json, and similarShops.dat in this repository.
I have also included a script that calls the API to get store-specific information for the stores in the shopListingData.json file and writes it to an output file (shopData.json in the example below). It can be called as follows:
$ python getShopsInfo.py shopListingData.json shopData.json
I have also written a routine that reduces the apparent vector distance between two shops if the comparison shop is more popular (has more favorers). The maximum bonus is about 10% of the vector distance, with most shops getting almost no bonus at all. To run that script, use:
$ python findSimilarShopsAllShopsByPopularity.py multipliers.json vectors.json storeInfo.json similarShopsByPopularity.dat
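One way such a bonus could work is sketched below. The actual formula is in findSimilarShopsAllShopsByPopularity.py and may differ. This version assumes the bonus grows linearly with the comparison shop's favorer count relative to the most-favorited shop, capped at 10%. The function name adjusted_distance and its parameters are hypothetical.

```python
def adjusted_distance(dist, favorers, max_favorers, max_bonus=0.10):
    """Shrink dist by up to max_bonus, in proportion to the comparison shop's favorers.

    Assumed rule: bonus = max_bonus * favorers / max_favorers.
    """
    if max_favorers <= 0:
        return dist
    # Clamp to [0, max_favorers] so the bonus never exceeds max_bonus.
    share = min(max(favorers, 0), max_favorers) / float(max_favorers)
    return dist * (1.0 - max_bonus * share)
```

Under this rule, a shop with 10 favorers in a data set whose most popular shop has 1000 gets a 0.1% reduction, which is consistent with most shops receiving almost no bonus when favorer counts are heavily skewed.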
I have also included storeInfo.json and similarShopsByPopularity.dat in this repository.