
Hybrid model has lower Precision@K compared to pure CF #486

Open
kientt15vinid opened this issue Aug 20, 2019 · 13 comments

@kientt15vinid

Hi Maciej,

I'm testing LightFM for a recommendation system for e-commerce grocery products (everything you could buy in a convenience store). I've compared the LightFM hybrid model against pure collaborative filtering (also LightFM, just without user and item features) and got a lower precision@10 for the hybrid. I've read your paper, which seems to suggest that a hybrid model should outperform a pure CF model, but my experiment shows the opposite.

Here is a description of my approach:

Dataset

  • Interaction matrix: 9915 x 17199; 98% sparsity (~2% density)

    • Purchase data of 9915 users across 17199 items.
    • All users have at least 1 transaction during the sample period.
  • User features matrix: 9915 x 9930

    • Includes an identity matrix and 15 additional features covering age, gender, geographic region, etc.
  • Item features matrix: 17199 x 21007

    • Includes an identity matrix and 3808 features based on brand name, categories, and product descriptions (see the construction sketch after this list).
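
For context, a minimal toy-scale sketch of how matrices with this identity-plus-metadata layout are typically built with lightfm.data.Dataset; all ids and feature names below are illustrative placeholders, not the actual data:

from lightfm.data import Dataset

# Dataset adds an identity feature per user/item by default, which is
# why the feature matrices above are wider than the declared metadata.
dataset = Dataset()
dataset.fit(users=['u1', 'u2'],
            items=['i1', 'i2', 'i3'],
            user_features=['age:30+', 'gender:f'],
            item_features=['brand:acme', 'cat:snacks'])

interactions, weights = dataset.build_interactions(
    [('u1', 'i1'), ('u2', 'i3')])

user_features = dataset.build_user_features(
    [('u1', ['age:30+']), ('u2', ['gender:f'])])   # 2 x (2 + 2)
item_features = dataset.build_item_features(
    [('i1', ['brand:acme']), ('i3', ['brand:acme', 'cat:snacks'])])  # 3 x (3 + 2)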

Implementation:

  1. Train-test data was split by timestamp (because a user might re-purchase an item); the interaction matrix was then built using lightfm.data.Dataset.build_interactions().
  2. Training and evaluation:

from lightfm import LightFM
from lightfm.evaluation import precision_at_k

model = LightFM(loss='warp',
                no_components=80,
                item_alpha=1e-7,
                learning_rate=0.02,
                max_sampled=50)

  • Hybrid model

# fit() trains the instance in place and returns it, so the hybrid
# model is evaluated before the same instance is re-fit for pure CF
model_hybrid = model.fit(train,
                         item_features=item_features,
                         user_features=user_features,
                         epochs=80,
                         num_threads=4)

test_precision = precision_at_k(model_hybrid, test,
                                item_features=item_features,
                                user_features=user_features,
                                num_threads=4, k=10).mean()

  • Pure CF model

model_simple = model.fit(train,
                         epochs=80,
                         num_threads=4)

test_precision = precision_at_k(model_simple, test,
                                num_threads=4, k=10).mean()
  3. Results: hybrid precision@10 = 0.057814, pure-CF precision@10 = 0.070189. I've tried several things to increase the hybrid model's test precision: using a weight matrix for training, optimizing hyperparameters with grid search, normalizing the user and item features, and weighting the user and item features with TF-IDF. So far, pure CF always outperforms the hybrid model.
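
For reference, the "weight matrix for training" variant above maps to LightFM's sample_weight argument; a minimal sketch using the weights matrix returned by build_interactions:

# `weights` is returned alongside the interactions by
# Dataset.build_interactions and scales each observation's loss
model.fit(train,
          sample_weight=weights,
          item_features=item_features,
          user_features=user_features,
          epochs=80,
          num_threads=4)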

Would appreciate any advice on this. Thank you!

@FrancescoI

I got similar results to yours.

While AUC gives similar results for the pure and hybrid models, precision@10 and recall@10 show much better performance for pure CF than for hybrid CF (up to +50%!).

In my case, I'm using just a few item_features gathered from the catalogue (category, item gender, item color, etc.).

The strange thing is that when I use the item representations to calculate item similarities, the hybrid model easily outperforms the pure CF one at this task (the similarities make more sense and produce a higher CTR when deployed online).

Given the latter result, I was expecting much higher performance on the user-recommendation task as well.

@kientt15vinid
Author

@FrancescoI may I ask what you use to evaluate the performance of the item-similarity model?

@FrancescoI

@kientt15vinid, we A/B tested both models online (a carousel of "similar items" next to the main one on the item page), using CTR (click-through rate: number of clicks / number of impressions) as the primary KPI.

@kientt15vinid
Author

@FrancescoI In that case, the CTR might be driven by users' desire to explore different variants of an item. I suggest using a pure content-based model as a baseline. If the hybrid model outperforms both pure CF and pure content-based, then that would be something worth noting.

@FrancescoI

@kientt15vinid, yeah, CB was the baseline even before switching to CF!

By switching from CB to CF we achieved a tremendous uplift in CTR (up to +47%). I hadn't mentioned it because I was focused on the pure vs. hybrid CF comparison :)

@SimonCW
Collaborator

SimonCW commented Feb 26, 2020

We saw similar results in our initial experiments: adding item metadata (and keeping the identity matrix from pure CF) leads to worse MRR and P@10 than a pure CF model. We'll keep investigating and report back. It would also be interesting to hear back from the others @FrancescoI, @kientt15vinid

@maciejkula
Collaborator

The culprit here might be that the embeddings for all features of a given item are simply summed to get the final item embedding: the model does not seem to be great at learning which features are important and which are not.

In a more flexible formulation you may want to concatenate the embeddings of different features to get your embedding vector. This should make it more straightforward for the model to simply discard some features.

Experimenting with different weights for different types of features might give you a lever to optimize this.
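
To make that weighting lever concrete: LightFM computes an item's representation as the sum of its feature embeddings, weighted by the values stored in the item_features matrix, so metadata can be down-weighted relative to the identity features simply by scaling the matrix entries. A minimal toy sketch (the 0.3 factor is an illustrative assumption to tune, not a recommendation):

import numpy as np
from scipy.sparse import csr_matrix, hstack, identity

# Toy layout mirroring the identity-plus-metadata construction above:
# 3 items, 2 metadata features.
metadata = csr_matrix(np.array([[1, 0],
                                [0, 1],
                                [1, 1]], dtype=np.float64))

meta_weight = 0.3  # hypothetical down-weighting factor to tune
item_features = hstack([identity(3, format='csr'),
                        meta_weight * metadata]).tocsr()

# pass to model.fit(train, item_features=item_features, ...) as usual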

@FrancescoI

@maciejkula, could you expand on this?

So far in my experiments, I've been using the norm of a feature's vector in the embedding space as a signal that the model is actually learning something useful from that feature.

For instance, item gender (in the fashion industry) is arguably the most important piece of item metadata: it has the greatest vector norm and really helps separate mixed-gender items in the final product embedding.

Conversely, when a feature's vector norm is small, its marginal contribution to the product embedding is negligible, so the feature can be dropped.

Does it make sense to you?
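
A minimal sketch of this norm heuristic, assuming a fitted model and a feature_labels list (an assumed name) holding the feature names in the same order as the columns of item_features:

import numpy as np

# model.item_embeddings has one row per item feature
# (identity features first, then metadata, given the usual layout)
feature_norms = np.linalg.norm(model.item_embeddings, axis=1)

# feature_labels is an assumed list of feature names in matching order
ranked = sorted(zip(feature_labels, feature_norms),
                key=lambda pair: pair[1], reverse=True)
for name, norm in ranked[:10]:
    print(f"{name}: {norm:.3f}")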

@FrancescoI

We saw similar results in our initial experiments: adding item metadata (and keeping the identity matrix from pure CF) leads to worse MRR and P@10 than a pure CF model. We'll keep investigating and report back. It would also be interesting to hear back from the others @FrancescoI, @kientt15vinid

I'm still struggling to find robust evidence: in my experiments, model performance was very sensitive to small changes in the hyperparameters, and pure CF vs. CF+metadata needed quite different configurations to reach their best performance.

In the end we decided to keep using the metadata: while we haven't proved it to be the best model among all the configurations we tested, we found it the best model for producing reliable item similarities over time.

@maciejkula
Collaborator

@FrancescoI the norm sounds like a very good approximation of feature importance. My hope was that this would work reliably, but in practice I think it isn't always the case. It's possible that L2 regularization is really important for making sure that rare features are pushed to zero norm so as not to introduce noise.
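
In LightFM that penalty is exposed through the item_alpha and user_alpha parameters. A minimal sketch; the 1e-5 values are illustrative starting points for tuning, not recommendations from this thread:

from lightfm import LightFM

# item_alpha / user_alpha apply an L2 penalty to the item and user
# feature embeddings, pushing rarely useful feature vectors toward
# zero norm (the experiment above used a much weaker 1e-7)
model = LightFM(loss='warp',
                no_components=80,
                item_alpha=1e-5,
                user_alpha=1e-5)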

@riccardopinosio

@FrancescoI could you elaborate on how you compute item similarity from the CF + metadata model? I would be interested in doing the same; maybe you can give me some pointers?

@FrancescoI

@FrancescoI could you elaborate on how you compute item similarity from the CF + metadata model? I would be interested in doing the same; maybe you can give me some pointers?

Every item's embedding is the sum of its own (identity) embedding plus its metadata embeddings. There's a method on the LightFM object that performs this calculation automatically.

Then you just need to pick an appropriate distance measure to retrieve the nearest items to each other: if I'm not wrong, the package documentation has a nice example for this use case as well.
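
For readers hunting for that calculation: LightFM's get_item_representations(features=...) returns each item's bias and its summed feature embedding, and is presumably what is meant here (the thread doesn't name it). A minimal cosine-similarity sketch, assuming a fitted model and the item_features matrix from above:

import numpy as np

# get_item_representations returns (item_biases, item_embeddings);
# with item_features passed in, each row is the weighted sum of that
# item's feature embeddings
_, item_embeddings = model.get_item_representations(features=item_features)

# cosine similarity between all pairs of items
normalized = item_embeddings / np.linalg.norm(item_embeddings,
                                              axis=1, keepdims=True)
similarity = normalized @ normalized.T

# the five items most similar to item 0, excluding itself
top5 = np.argsort(-similarity[0])[1:6]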

@simongiles1

@FrancescoI could you elaborate on how you compute item similarity from the CF + metadata model? I would be interested in doing the same; maybe you can give me some pointers?

Every item's embedding is the sum of its own (identity) embedding plus its metadata embeddings. There's a method on the LightFM object that performs this calculation automatically.

Then you just need to pick an appropriate distance measure to retrieve the nearest items to each other: if I'm not wrong, the package documentation has a nice example for this use case as well.

@FrancescoI I'm trying to find the method you're referring to that sums up the metadata embeddings. Would you be able to point me to the function? It would really help me out. Thanks
