Releases: etlundquist/rankfm
v0.2.5: Should Be Stable
Added
- working PyPI and GitHub `pip` installs on both OSX and Linux
- wrapped the external Mersenne Twister C library to generate better random numbers for BPR/WARP training
- added a `MANIFEST.in` to include all C source and headers in the `sdist` archive
Changed
- changed the logic in `setup.py` to favor building extensions from the generated C source rather than re-cythonizing the `.pyx` files - this is best practice according to the Cython docs
- removed Cython as a formal dependency, as the generated C code will be included in the package `sdist` from now on
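The "prefer generated C, re-cythonize only as a fallback" pattern described above can be sketched roughly as follows. Note this is an illustrative sketch, not rankfm's actual `setup.py`; the module and file paths here are hypothetical placeholders:

```python
# Hypothetical setup.py sketch: build from the pre-generated C source
# shipped in the sdist, re-cythonizing only if Cython happens to be installed.
from setuptools import setup, Extension

try:
    from Cython.Build import cythonize
    have_cython = True
except ImportError:
    have_cython = False

# the .c file is included in the sdist via MANIFEST.in, so end users
# never need Cython installed; paths below are placeholders
ext = ".pyx" if have_cython else ".c"
extensions = [Extension("rankfm._rankfm", ["rankfm/_rankfm" + ext])]

if have_cython:
    extensions = cythonize(extensions)

setup(name="rankfm", ext_modules=extensions)
```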
v0.2.3: Working OSX/Linux PyPI Distributions
Changed
- needed to instruct Python to compile the generated `.c` file instead of the `.pyx` file, as the latter doesn't get added to the `sdist`
- build tested and working now on both OSX and Linux
v0.2.2: Struggling with PyPI
no changes, just syncing things up.
v0.2.0: Full-Blown Cython
Added
- Cython back-end for `_fit()`, `_predict()`, `_recommend()`
- the Cython `_fit()` function is 5X-10X faster than the original Numba version, and `predict()`/`recommend()` are about the same speed
Changed
- split `regularization` into two parameters: `alpha` to control the L2 regularization for user/item indicators, and `beta` to control the regularization for user-features/item-features. In testing, user-features/item-features tended to produce exploding gradients and overwhelm the utility scores unless more strongly regularized, especially with fairly dense side features. Typically `beta` should be set fairly high (e.g. 0.1) to avoid numerical instability.
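The effect of the split penalties can be illustrated with a toy SGD step. This is a sketch assuming the common update form `w <- w + lr * (gradient - penalty * w)`; the function and variable names are illustrative, not rankfm's actual training kernel:

```python
# Toy SGD step with split L2 regularization: `alpha` penalizes the
# user/item indicator weights lightly, `beta` penalizes the side-feature
# weights more heavily. Illustrative sketch only, not rankfm internals.

def sgd_step(weights, grads, penalty, lr=0.1):
    """Apply one SGD update with an L2 penalty to a list of weights."""
    return [w + lr * (g - penalty * w) for w, g in zip(weights, grads)]

item_weights = [0.5, -0.3]     # indicator weights: light penalty (alpha)
feature_weights = [2.0, -1.5]  # side-feature weights: heavy penalty (beta)

item_weights = sgd_step(item_weights, [0.1, 0.1], penalty=0.01)   # alpha
feature_weights = sgd_step(feature_weights, [0.1, 0.1], penalty=0.1)  # beta
```

With the larger penalty, large feature weights get pulled back toward zero on every step, which is exactly the damping effect needed when dense side features start to dominate the utility scores.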
v0.1.3: Speed-Ups & Bug Fixes
Changed
- pulled the string `loss` param out of the private Numba internals and into the public `fit()` function
- changed `_init_interactions` to extend rather than replace the `user_items` dictionary item sets
- added conditional logic to skip expensive user-feature/item-feature dot products if user and/or item features were not provided in the call to `fit()` - this reduces training time by over 50% if just using the base interaction matrix (no additional user/item features)
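The skip-when-absent optimization can be sketched like this. The function name and score form are assumptions for illustration, not rankfm's internal code:

```python
# Illustrative sketch: compute the pairwise utility score, but skip the
# feature dot products entirely when no side features were supplied.

def utility(user_vec, item_vec, user_feat_vecs=None, item_feat_vecs=None):
    """User/item factor dot product, plus side-feature contributions
    only when features were actually provided."""
    score = sum(u * i for u, i in zip(user_vec, item_vec))
    if user_feat_vecs:  # skip this work when no user features exist
        score += sum(sum(f * i for f, i in zip(fv, item_vec))
                     for fv in user_feat_vecs)
    if item_feat_vecs:  # likewise for item features
        score += sum(sum(f * u for f, u in zip(fv, user_vec))
                     for fv in item_feat_vecs)
    return score
```

Because the feature branches never execute in the base-interactions case, the per-sample cost drops to a single dot product, which is where the reported 50%+ training-time savings comes from.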
Fixed
- bug where `similar_users()`, `similar_items()` were performing validation checks on the zero-based index (wrong) instead of the original ID values (correct) - this was causing a bunch of bogus assertion errors saying that the item_id wasn't in the training set
v0.1.2: Adding WARP Loss
Added
- WARP loss - while slower to train, this yields slightly better performance on dense interaction data and much better performance on highly sparse interaction data relative to BPR
- new hyperparameters `loss` and `max_samples`
- re-wrote the Numba `_fit()` function to elegantly (IMHO) handle both BPR and WARP loss
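The WARP sampling scheme that `max_samples` caps can be sketched as follows. This is a rough illustration of the standard WARP idea (draw negatives until one violates the ranking margin, then weight the update by an estimated rank); the function name, margin, and score inputs are illustrative, not rankfm's API:

```python
import math
import random

# Rough WARP sketch: sample negative items until one violates the margin,
# up to `max_samples` draws; the number of draws needed gives a rank
# estimate that scales the size of the weight update.

def warp_sample(pos_score, neg_scores, max_samples, margin=1.0, rng=random):
    """Return (violating_negative_index, rank_weight), or (None, 0.0)
    if no margin-violating negative is found within the sample budget."""
    n_items = len(neg_scores)
    for n_draws in range(1, max_samples + 1):
        j = rng.randrange(n_items)
        if neg_scores[j] > pos_score - margin:    # margin violated -> update
            est_rank = (n_items - 1) // n_draws   # fewer draws => worse rank
            return j, math.log(est_rank + 1.0)
    return None, 0.0  # positive already well-ranked; skip the update
```

The sampling loop is why WARP trains slower than BPR (which draws exactly one negative per positive), but it focuses updates on the hardest violations, which is where the gains on sparse interaction data come from.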
v0.1.1: Improvements and Bug Fixes
Added
- added support for sample weights - you can now pass importance weights in addition to interactions
- automatically determine the input data class (np.ndarray vs. pd.dataframe/pd.series)
- assert/ensure that all model weights are finite after each training epoch to fail fast for exploding weights
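The fail-fast finiteness check is simple to sketch; the helper name below is hypothetical, but the idea matches the item above:

```python
import math

# Illustrative sketch of the fail-fast check: after each training epoch,
# verify every model weight is finite so exploding weights surface
# immediately rather than as NaN predictions much later.

def assert_finite(weights):
    assert all(math.isfinite(w) for w in weights), \
        "model weights have diverged - try stronger regularization"
```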
Fixed
- bug where pd.dataframe interactions with columns not named `[user_id, item_id]` were not getting loaded/indexed correctly - fixed by using the new input class determination utility
Changed
- more efficient loops for updating item feature and user/item feature factor weights - this cuts training time by around 30% with no auxiliary features, and by 50%+ in the presence of auxiliary features
v0.1.0: Initial Release
Added
- core package functionality
- example notebook: `quickstart.ipynb`
- source distribution and package wheel
- basic test suite
- CircleCI build, lint, test CI workflows