Hybrid model have lower Precision@K compare to pure CF #486
Comments
I got similar results to yours. While AUC is similar between the pure and hybrid models, precision@10 and recall@10 show much better performance on pure CF than hybrid CF (up to +50%!). In my case, I'm only using a few item_features gathered from the catalogue (category, item gender, item color, etc.). The strange thing is that when I use the item representation to calculate item similarities, the hybrid model easily outperforms the pure CF one in that task (the similarities make more sense and produce a higher CTR % when deployed online). Given the latter result, I was also expecting much higher performance on the user suggestion task.
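For reference, the ranking metrics discussed in this thread can be sketched in a few lines (LightFM ships its own implementation in `lightfm.evaluation.precision_at_k`; the ranking and held-out sets below are made-up illustrative values):

```python
import numpy as np

def precision_at_k(ranked_items, relevant_items, k=10):
    """Fraction of the top-k recommended items that are in the held-out positives."""
    topk = ranked_items[:k]
    return len(set(topk) & set(relevant_items)) / k

ranked = [4, 7, 1, 9, 3, 8, 2, 6, 5, 0]   # model's ranking for one user
held_out = {7, 3, 11}                      # that user's test-set positives

print(precision_at_k(ranked, held_out, k=10))  # -> 0.2 (2 of the top 10 are relevant)
```

Averaging this quantity over all test users gives the precision@10 figures being compared here.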
@FrancescoI may I ask what you use to evaluate the performance of the item similarities model?
@kientt15vinid, we've A/B tested both models online (a carousel of "similar items" to the main one on the item page), using CTR % (click-through rate: number of clicks / number of impressions) as the primary KPI.
@FrancescoI In that case, CTR might be driven by the desire to explore different variants of an item. I suggest using a pure content-based model as a baseline. If the hybrid model outperforms both pure CF and pure content-based, then it is something worth noting.
@kientt15vinid, yeah, CB was the baseline even before switching to CF! By switching from CB to CF we achieved a tremendous uplift in CTR (up to +47%). I hadn't mentioned it because I was focused on the pure vs hybrid CF comparison :)
We got similar results in our initial experiments: adding item metadata (and keeping the identity matrix from pure CF) leads to worse MRR and P@10 than a pure CF model. We'll keep investigating and report back. It would also be interesting to hear back from others, @FrancescoI, @kientt15vinid
The culprit here might be that the embeddings for all features of a given item are simply summed to get the final item embedding: the model does not seem to be great at learning which features are important and which are not. In a more flexible formulation you may want to concatenate the embeddings of different features to get your embedding vector. This should make it more straightforward for the model to simply discard some features. Experimenting with different weights for different types of features might give you a lever to optimize this. |
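The difference between the two formulations can be sketched with plain numpy (the embedding values here are random placeholders, not learned weights):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, dim = 4, 8

# one learned embedding row per item feature (hypothetical values)
feature_emb = rng.normal(size=(n_features, dim))

# LightFM-style: the item embedding is the SUM of its feature embeddings,
# so a noisy feature perturbs every dimension of the final vector
item_sum = feature_emb.sum(axis=0)        # shape: (dim,)

# alternative formulation: CONCATENATE the per-feature embeddings, so each
# feature keeps its own slice and a downstream layer can learn to discard it
item_concat = feature_emb.reshape(-1)     # shape: (n_features * dim,)

print(item_sum.shape, item_concat.shape)
```

In the summed form, down-weighting one feature requires shrinking its whole embedding; in the concatenated form, the weights reading each slice can be learned (or set) independently, which is the "different weights for different types of features" lever mentioned above.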
@maciejkula, could you expand on this? So far in my experiments, I've used the norm of the feature vector in the embedded space as a signal that the model is actually learning something useful from that feature. For instance, item gender (in the fashion industry) is arguably the most important item metadata: it has the greatest vector norm, and it really helps separate mixed-gender items in the final product embedding. Conversely, when a new feature's vector norm is small, its marginal contribution to the product embedding is negligible and the feature may be dropped. Does that make sense to you?
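The norm-based importance check described above is easy to reproduce: in LightFM, `model.item_embeddings` has one row per column of the item-feature matrix, so each feature's L2 norm can be read off directly (the matrix and feature names below are made-up stand-ins for a fitted model):

```python
import numpy as np

# stand-in for a fitted model's `item_embeddings`: one row per item feature
feature_names = ["gender", "category", "color", "rare_tag"]
emb = np.array([
    [2.00, -1.50, 0.50],   # "gender": large norm -> strong influence
    [0.80,  0.30, -0.40],
    [0.20, -0.10, 0.10],
    [0.01,  0.00, 0.02],   # "rare_tag": near-zero norm -> negligible
])

norms = np.linalg.norm(emb, axis=1)
ranked = sorted(zip(feature_names, norms), key=lambda t: -t[1])
for name, n in ranked:
    print(f"{name}: {n:.3f}")
```

Features that end up at the bottom of this ranking contribute almost nothing to the summed item embedding and are candidates for removal.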
I'm still struggling to find robust evidence, since during my experiments model performance was really sensitive to small changes in the hyperparameters, and pure CF vs CF+metadata needed really different configurations to reach their highest performance. In the end we decided to keep using the metadata: while we haven't proved it to be the best model among all the configurations we tested, we found it to be the model that produces the most reliable item similarities over time.
@FrancescoI the norm sounds like a very good approximation to feature importance. My hope was that this would work reliably, but I think in practice it's not always the case. It's possible that L2 regularization is really important to make sure that rare features are pushed to zero norm so as to not introduce noise. |
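The regularization effect described here (in LightFM, the L2 penalty is set via the `item_alpha` and `user_alpha` constructor parameters) can be illustrated with a toy SGD loop: weight decay acts on every feature at every step, but only features present in an interaction receive a gradient update, so rarely-seen features are pulled toward zero norm. The update magnitudes below are arbitrary stand-ins, not LightFM's actual gradients:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, steps, lr, l2 = 8, 1000, 0.05, 0.1

common = rng.normal(size=dim)  # feature present in most interactions
rare = rng.normal(size=dim)    # feature present in ~1% of interactions

for step in range(steps):
    # L2 weight decay applies to every feature at every step...
    common *= (1 - lr * l2)
    rare *= (1 - lr * l2)
    # ...but only observed features get a gradient signal pushing them
    # away from zero (stubbed here as a small fixed update)
    common += lr * 0.5
    if step % 100 == 0:  # the rare feature is updated far less often
        rare += lr * 0.5

# the rare feature's norm collapses toward zero; the common one does not
print(np.linalg.norm(common) > 10 * np.linalg.norm(rare))
```

This is why a stronger `item_alpha` can act as the automatic feature-pruning mechanism suggested above: features without enough supporting signal decay to negligible norm.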
@FrancescoI could you elaborate on how you compute item similarity from the CF + metadata model? I would be interested in doing the same, maybe you can give me some pointers? |
Every item embedding is the sum of its own embedding plus its metadata embeddings. There's a method on the LightFM model object that automatically performs this calculation. Then you just need to pick an appropriate distance measure to retrieve the nearest items to each item: if I'm not mistaken, the package documentation also has a nice example for this use case.
@FrancescoI I'm trying to find the method you're referring to that sums up the metadata embeddings. Would you be able to point me to the function you mean? It would really help me out. Thanks
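For what it's worth, LightFM exposes `model.get_item_representations(features=item_features)`, which returns `(item_biases, item_embeddings)` with each item embedding already summed over that item's features; this appears to be the calculation being described. The sum itself is just a sparse matrix product, sketched here with toy values (the feature matrix and embeddings are illustrative, not a fitted model):

```python
import numpy as np

# binary item-feature matrix (items x features), including each item's
# identity feature, as LightFM builds it by default
item_features = np.array([
    [1, 0, 1, 0],   # item 0: own identity + "red"
    [0, 1, 1, 0],   # item 1: own identity + "red"
    [0, 0, 0, 1],   # item 2: "blue" only (illustrative)
])
# learned feature embeddings (features x dim), as in `model.item_embeddings`
feature_emb = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.5, 0.5],
    [-1.0, 1.0],
])

# each item embedding = sum of the embeddings of its features
item_emb = item_features @ feature_emb

# cosine similarity between all item pairs
unit = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
sim = unit @ unit.T

print(np.round(sim, 2))  # items 0 and 1 (shared "red") come out highly similar
```

Ranking each row of `sim` then gives the "similar items" list used in the carousel experiment described earlier in the thread.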
Hi Maciej,
I'm testing LightFM for a recommendation system in grocery e-commerce (everything you could buy in a convenience store). I've compared the LightFM hybrid model to pure collaborative filtering (also LightFM, just without user and item features) and got lower precision@10. I've read your paper, and it seems to indicate that the hybrid model should outperform the pure CF model, but my experiment shows the opposite result.
Here are the descriptions of my approach:
Dataset
Interaction matrix: 9915 x 17199; 98% sparsity (or ~2% density)
User features matrix: 9915 x 9930
Item features matrix: 17199 x 21007
Implementation:
lightfm.data.Dataset.build_interactions()
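The interaction matrix described above (9915 users x 17199 items at ~2% density) corresponds to a sparse matrix like the following, sketched here with scipy and randomly sampled interactions in place of the real transaction log:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n_users, n_items = 9915, 17199
density = 0.02  # ~2% density, i.e. 98% sparsity, as stated above

# sample (user, item) pairs at the stated density; real data would come
# from the purchase history instead
n_interactions = int(n_users * n_items * density)
rows = rng.integers(0, n_users, size=n_interactions)
cols = rng.integers(0, n_items, size=n_interactions)
data = np.ones(n_interactions, dtype=np.float32)

# duplicate (user, item) pairs are summed during COO -> CSR conversion
interactions = sparse.coo_matrix(
    (data, (rows, cols)), shape=(n_users, n_items)
).tocsr()

print(interactions.shape, f"{interactions.nnz / (n_users * n_items):.1%}")
```

`lightfm.data.Dataset.build_interactions()` produces a COO matrix of the same shape directly from `(user, item)` tuples, which is what LightFM's `fit()` expects.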
Would appreciate any advice on this. Thank you!