
Hybrid model has lower Precision@K compared to pure CF #486

Open
kientt15vinid opened this issue Aug 20, 2019 · 13 comments

@kientt15vinid

Hi Maciej,

I'm testing LightFM for a recommendation system for e-commerce grocery products (everything you could buy in a convenience store). I've compared the LightFM hybrid model against pure collaborative filtering (also LightFM, just without user and item features) and got a lower precision@10 for the hybrid. I've read your paper, which seems to suggest that a hybrid model should outperform a pure CF model, but my experiment shows the opposite.

Here is a description of my approach:

Dataset

  • Interaction matrix: 9915 x 17199; 98% sparsity (~2% density)

    • Purchase data of 9915 users across 17199 items.
    • All users have at least 1 transaction during the sample period.
  • User features matrix: 9915 x 9930

    • Includes an identity matrix and 15 additional features covering age, gender, geographic region, etc.
  • Item features matrix: 17199 x 21007

    • Includes an identity matrix and 3808 features based on brand name, categories, and product descriptions (see the construction sketch after this list).
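
For context, a minimal toy-scale sketch of how matrices with this identity-plus-metadata layout are typically built with lightfm.data.Dataset; all ids and feature names below are illustrative placeholders, not the actual data:

from lightfm.data import Dataset

# Dataset adds an identity feature per user/item by default, which is
# why the feature matrices above are wider than the declared metadata.
dataset = Dataset()
dataset.fit(users=['u1', 'u2'],
            items=['i1', 'i2', 'i3'],
            user_features=['age:30+', 'gender:f'],
            item_features=['brand:acme', 'cat:snacks'])

interactions, weights = dataset.build_interactions(
    [('u1', 'i1'), ('u2', 'i3')])

user_features = dataset.build_user_features(
    [('u1', ['age:30+']), ('u2', ['gender:f'])])   # 2 x (2 + 2)
item_features = dataset.build_item_features(
    [('i1', ['brand:acme']), ('i3', ['brand:acme', 'cat:snacks'])])  # 3 x (3 + 2)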

Implementation:

  1. Train-test data was split by timestamp (because a user might re-purchase an item); the interaction matrix was then built using lightfm.data.Dataset.build_interactions().
  2. Training and evaluation:

from lightfm import LightFM
from lightfm.evaluation import precision_at_k

model = LightFM(loss='warp',
                no_components=80,
                item_alpha=1e-7,
                learning_rate=0.02,
                max_sampled=50)

  • Hybrid model

# fit() trains the instance in place and returns it, so the hybrid
# model is evaluated before the same instance is re-fit for pure CF
model_hybrid = model.fit(train,
                         item_features=item_features,
                         user_features=user_features,
                         epochs=80,
                         num_threads=4)

test_precision = precision_at_k(model_hybrid, test,
                                item_features=item_features,
                                user_features=user_features,
                                num_threads=4, k=10).mean()

  • Pure CF model

model_simple = model.fit(train,
                         epochs=80,
                         num_threads=4)

test_precision = precision_at_k(model_simple, test,
                                num_threads=4, k=10).mean()
  3. Results: hybrid precision@10 = 0.057814, pure-CF precision@10 = 0.070189. I've tried several things to increase the hybrid model's test precision: using a weight matrix for training, optimizing hyperparameters with grid search, normalizing the user and item features, and weighting the user and item features with TF-IDF. So far, pure CF always outperforms the hybrid model.
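
For reference, the "weight matrix for training" variant above maps to LightFM's sample_weight argument; a minimal sketch using the weights matrix returned by build_interactions:

# `weights` is returned alongside the interactions by
# Dataset.build_interactions and scales each observation's loss
model.fit(train,
          sample_weight=weights,
          item_features=item_features,
          user_features=user_features,
          epochs=80,
          num_threads=4)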

Would appreciate any advice on this. Thank you!

@FrancescoI

I got similar results to yours.

While AUC gives similar results for the pure and hybrid models, precision@10 and recall@10 show much better performance for pure CF than for hybrid CF (up to +50%!).

In my case, I'm using just a few item_features gathered from the catalogue (category, item gender, item color, etc.).

The strange thing is that when I use the item representations to calculate item similarities, the hybrid model easily outperforms the pure CF one at this task (the similarities make more sense and produce a higher CTR when deployed online).

Given the latter result, I was expecting much higher performance on the user-recommendation task as well.

@kientt15vinid
Author

@FrancescoI may I ask what you use to evaluate the performance of the item-similarity model?

@FrancescoI

@kientt15vinid, we A/B tested both models online (a carousel of "similar items" next to the main one on the item page), using CTR (click-through rate: number of clicks / number of impressions) as the primary KPI.

@kientt15vinid
Author

@FrancescoI In that case, the CTR might be driven by users' desire to explore different variants of an item. I suggest using a pure content-based model as a baseline. If the hybrid model outperforms both pure CF and pure content-based, then that would be something worth noting.

@FrancescoI

@kientt15vinid, yeah, CB was the baseline even before switching to CF!

By switching from CB to CF we achieved a tremendous uplift in CTR (up to +47%). I hadn't mentioned it because I was focused on the pure vs. hybrid CF comparison :)

@SimonCW
Collaborator

SimonCW commented Feb 26, 2020

We saw similar results in our initial experiments: adding item metadata (and keeping the identity matrix from pure CF) leads to worse MRR and P@10 than a pure CF model. We'll keep investigating and report back. It would also be interesting to hear back from the others @FrancescoI, @kientt15vinid

@maciejkula
Collaborator

The culprit here might be that the embeddings for all features of a given item are simply summed to get the final item embedding: the model does not seem to be great at learning which features are important and which are not.

In a more flexible formulation you may want to concatenate the embeddings of different features to get your embedding vector. This should make it more straightforward for the model to simply discard some features.

Experimenting with different weights for different types of features might give you a lever to optimize this.
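
To make that weighting lever concrete: LightFM computes an item's representation as the sum of its feature embeddings, weighted by the values stored in the item_features matrix, so metadata can be down-weighted relative to the identity features simply by scaling the matrix entries. A minimal toy sketch (the 0.3 factor is an illustrative assumption to tune, not a recommendation):

import numpy as np
from scipy.sparse import csr_matrix, hstack, identity

# Toy layout mirroring the identity-plus-metadata construction above:
# 3 items, 2 metadata features.
metadata = csr_matrix(np.array([[1, 0],
                                [0, 1],
                                [1, 1]], dtype=np.float64))

meta_weight = 0.3  # hypothetical down-weighting factor to tune
item_features = hstack([identity(3, format='csr'),
                        meta_weight * metadata]).tocsr()

# pass to model.fit(train, item_features=item_features, ...) as usual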

@FrancescoI

@maciejkula, could you expand on this?

So far in my experiments, I've been using the norm of a feature's vector in the embedding space as a signal that the model is actually learning something useful from that feature.

For instance, item gender (in the fashion industry) is arguably the most important piece of item metadata: it has the greatest vector norm and really helps separate mixed-gender items in the final product embedding.

Conversely, when a feature's vector norm is small, its marginal contribution to the product embedding is negligible, so the feature can be dropped.

Does it make sense to you?
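
A minimal sketch of this norm heuristic, assuming a fitted model and a feature_labels list (an assumed name) holding the feature names in the same order as the columns of item_features:

import numpy as np

# model.item_embeddings has one row per item feature
# (identity features first, then metadata, given the usual layout)
feature_norms = np.linalg.norm(model.item_embeddings, axis=1)

# feature_labels is an assumed list of feature names in matching order
ranked = sorted(zip(feature_labels, feature_norms),
                key=lambda pair: pair[1], reverse=True)
for name, norm in ranked[:10]:
    print(f"{name}: {norm:.3f}")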

@FrancescoI

We saw similar results in our initial experiments: adding item metadata (and keeping the identity matrix from pure CF) leads to worse MRR and P@10 than a pure CF model. We'll keep investigating and report back. It would also be interesting to hear back from the others @FrancescoI, @kientt15vinid

I'm still struggling to find robust evidence: in my experiments, model performance was very sensitive to small changes in the hyperparameters, and pure CF vs. CF+metadata needed quite different configurations to reach their best performance.

In the end we decided to keep using the metadata: while we haven't proved it to be the best model among all the configurations we tested, we found it the best model for producing reliable item similarities over time.

@maciejkula
Collaborator

@FrancescoI the norm sounds like a very good approximation of feature importance. My hope was that this would work reliably, but in practice I think it isn't always the case. It's possible that L2 regularization is really important for making sure that rare features are pushed to zero norm so as not to introduce noise.
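
In LightFM that penalty is exposed through the item_alpha and user_alpha parameters. A minimal sketch; the 1e-5 values are illustrative starting points for tuning, not recommendations from this thread:

from lightfm import LightFM

# item_alpha / user_alpha apply an L2 penalty to the item and user
# feature embeddings, pushing rarely useful feature vectors toward
# zero norm (the experiment above used a much weaker 1e-7)
model = LightFM(loss='warp',
                no_components=80,
                item_alpha=1e-5,
                user_alpha=1e-5)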

@riccardopinosio

@FrancescoI could you elaborate on how you compute item similarity from the CF + metadata model? I would be interested in doing the same; maybe you can give me some pointers?

@FrancescoI

@FrancescoI could you elaborate on how you compute item similarity from the CF + metadata model? I would be interested in doing the same; maybe you can give me some pointers?

Every item's embedding is the sum of its own (identity) embedding plus its metadata embeddings. There's a method on the LightFM object that performs this calculation automatically.

Then you just need to pick an appropriate distance measure to retrieve the nearest items to each other: if I'm not wrong, the package documentation has a nice example for this use case as well.
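
For readers hunting for that calculation: LightFM's get_item_representations(features=...) returns each item's bias and its summed feature embedding, and is presumably what is meant here (the thread doesn't name it). A minimal cosine-similarity sketch, assuming a fitted model and the item_features matrix from above:

import numpy as np

# get_item_representations returns (item_biases, item_embeddings);
# with item_features passed in, each row is the weighted sum of that
# item's feature embeddings
_, item_embeddings = model.get_item_representations(features=item_features)

# cosine similarity between all pairs of items
normalized = item_embeddings / np.linalg.norm(item_embeddings,
                                              axis=1, keepdims=True)
similarity = normalized @ normalized.T

# the five items most similar to item 0, excluding itself
top5 = np.argsort(-similarity[0])[1:6]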

@simongiles1

@FrancescoI could you elaborate on how you compute item similarity from the CF + metadata model? I would be interested in doing the same; maybe you can give me some pointers?

Every item's embedding is the sum of its own (identity) embedding plus its metadata embeddings. There's a method on the LightFM object that performs this calculation automatically.

Then you just need to pick an appropriate distance measure to retrieve the nearest items to each other: if I'm not wrong, the package documentation has a nice example for this use case as well.

@FrancescoI I'm trying to find the method you're referring to that sums up the metadata embeddings. Would you be able to point me to the function? It would really help me out. Thanks
