Releases: etlundquist/rankfm
v0.2.5: Should Be Stable
Added
- working PyPI and GitHub `pip` installs on both OSX and Linux
- wrapped the external Mersenne Twister C library to generate better random numbers for BPR/WARP training
- added a `MANIFEST.in` to include all C source and headers in the `sdist` archive
Changed
- changed the logic in `setup.py` to favor building extensions from the generated C source rather than re-cythonizing the `.pyx` files - this is best practice according to the Cython docs
- removed Cython as a formal dependency, as the generated C code will be included in the package `sdist` from now on
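The "prefer generated C, re-cythonize only as a fallback" pattern described above can be sketched roughly as follows. Note this is an illustrative sketch, not rankfm's actual `setup.py`; the module and file paths here are hypothetical placeholders:

```python
# Hypothetical setup.py sketch: build from the pre-generated C source
# shipped in the sdist, re-cythonizing only if Cython happens to be installed.
from setuptools import setup, Extension

try:
    from Cython.Build import cythonize
    have_cython = True
except ImportError:
    have_cython = False

# the .c file is included in the sdist via MANIFEST.in, so end users
# never need Cython installed; paths below are placeholders
ext = ".pyx" if have_cython else ".c"
extensions = [Extension("rankfm._rankfm", ["rankfm/_rankfm" + ext])]

if have_cython:
    extensions = cythonize(extensions)

setup(name="rankfm", ext_modules=extensions)
```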
v0.2.3: Working OSX/Linux PyPI Distributions
Changed
- needed to instruct Python to compile the generated `.c` file instead of the `.pyx` file, as the latter doesn't get added to the `sdist`
- build tested and working now on both OSX and Linux
v0.2.2: Struggling with PyPI
no changes, just syncing things up.
v0.2.0: Full-Blown Cython
Added
- Cython back-end for `_fit()`, `_predict()`, `_recommend()`
- the Cython `_fit()` function is 5X-10X faster than the original Numba version, and `predict()`/`recommend()` are about the same speed
Changed
- split `regularization` into two parameters: `alpha` to control the L2 regularization for user/item indicators, and `beta` to control the regularization for user-features/item-features. In testing, user-features/item-features tended to produce exploding gradients and overwhelm the utility scores unless more strongly regularized, especially with fairly dense side features. Typically `beta` should be set fairly high (e.g. 0.1) to avoid numerical instability.
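The effect of the split penalties can be illustrated with a toy SGD step. This is a sketch assuming the common update form `w <- w + lr * (gradient - penalty * w)`; the function and variable names are illustrative, not rankfm's actual training kernel:

```python
# Toy SGD step with split L2 regularization: `alpha` penalizes the
# user/item indicator weights lightly, `beta` penalizes the side-feature
# weights more heavily. Illustrative sketch only, not rankfm internals.

def sgd_step(weights, grads, penalty, lr=0.1):
    """Apply one SGD update with an L2 penalty to a list of weights."""
    return [w + lr * (g - penalty * w) for w, g in zip(weights, grads)]

item_weights = [0.5, -0.3]     # indicator weights: light penalty (alpha)
feature_weights = [2.0, -1.5]  # side-feature weights: heavy penalty (beta)

item_weights = sgd_step(item_weights, [0.1, 0.1], penalty=0.01)   # alpha
feature_weights = sgd_step(feature_weights, [0.1, 0.1], penalty=0.1)  # beta
```

With the larger penalty, large feature weights get pulled back toward zero on every step, which is exactly the damping effect needed when dense side features start to dominate the utility scores.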
v0.1.3: Speed-Ups & Bug Fixes
Changed
- pulled the string `loss` param out of the private Numba internals and into the public `fit()` function
- changed `_init_interactions` to extend rather than replace the `user_items` dictionary item sets
- added conditional logic to skip expensive user-feature/item-feature dot products if user and/or item features were not provided in the call to `fit()` - this reduces training time by over 50% if just using the base interaction matrix (no additional user/item features)
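The skip-when-absent optimization can be sketched like this. The function name and score form are assumptions for illustration, not rankfm's internal code:

```python
# Illustrative sketch: compute the pairwise utility score, but skip the
# feature dot products entirely when no side features were supplied.

def utility(user_vec, item_vec, user_feat_vecs=None, item_feat_vecs=None):
    """User/item factor dot product, plus side-feature contributions
    only when features were actually provided."""
    score = sum(u * i for u, i in zip(user_vec, item_vec))
    if user_feat_vecs:  # skip this work when no user features exist
        score += sum(sum(f * i for f, i in zip(fv, item_vec))
                     for fv in user_feat_vecs)
    if item_feat_vecs:  # likewise for item features
        score += sum(sum(f * u for f, u in zip(fv, user_vec))
                     for fv in item_feat_vecs)
    return score
```

Because the feature branches never execute in the base-interactions case, the per-sample cost drops to a single dot product, which is where the reported 50%+ training-time savings comes from.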
Fixed
- bug where `similar_users()`, `similar_items()` were performing validation checks on the zero-based index (wrong) instead of the original ID values (correct) - this was causing a bunch of bogus assertion errors saying that the item_id wasn't in the training set
v0.1.2: Adding WARP Loss
Added
- WARP loss - while slower to train, this yields slightly better performance on dense interaction data and much better performance on highly sparse interaction data relative to BPR
- new hyperparameters `loss` and `max_samples`
- re-wrote the Numba `_fit()` function to elegantly (IMHO) handle both BPR and WARP loss
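The WARP sampling scheme that `max_samples` caps can be sketched as follows. This is a rough illustration of the standard WARP idea (draw negatives until one violates the ranking margin, then weight the update by an estimated rank); the function name, margin, and score inputs are illustrative, not rankfm's API:

```python
import math
import random

# Rough WARP sketch: sample negative items until one violates the margin,
# up to `max_samples` draws; the number of draws needed gives a rank
# estimate that scales the size of the weight update.

def warp_sample(pos_score, neg_scores, max_samples, margin=1.0, rng=random):
    """Return (violating_negative_index, rank_weight), or (None, 0.0)
    if no margin-violating negative is found within the sample budget."""
    n_items = len(neg_scores)
    for n_draws in range(1, max_samples + 1):
        j = rng.randrange(n_items)
        if neg_scores[j] > pos_score - margin:    # margin violated -> update
            est_rank = (n_items - 1) // n_draws   # fewer draws => worse rank
            return j, math.log(est_rank + 1.0)
    return None, 0.0  # positive already well-ranked; skip the update
```

The sampling loop is why WARP trains slower than BPR (which draws exactly one negative per positive), but it focuses updates on the hardest violations, which is where the gains on sparse interaction data come from.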
v0.1.1: Improvements and Bug Fixes
Added
- added support for sample weights - you can now pass importance weights in addition to interactions
- automatically determine the input data class (np.ndarray vs. pd.dataframe/pd.series)
- assert/ensure that all model weights are finite after each training epoch to fail fast for exploding weights
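The fail-fast finiteness check is simple to sketch; the helper name below is hypothetical, but the idea matches the item above:

```python
import math

# Illustrative sketch of the fail-fast check: after each training epoch,
# verify every model weight is finite so exploding weights surface
# immediately rather than as NaN predictions much later.

def assert_finite(weights):
    assert all(math.isfinite(w) for w in weights), \
        "model weights have diverged - try stronger regularization"
```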
Fixed
- bug where pd.dataframe interactions with columns not named `[user_id, item_id]` were not getting loaded/indexed correctly - fixed by using the new input class determination utility
Changed
- more efficient loops for updating item feature and user/item feature factor weights - this cuts training time by around 30% with no auxiliary features, and by 50%+ in the presence of auxiliary features
v0.1.0: Initial Release
Added
- core package functionality
- example notebook: `quickstart.ipynb`
- source distribution and package wheel
- basic test suite
- CircleCI build, lint, test CI workflows