🎶 data science minor final project // recommending music using word2vec

actuallyykatie/music_rec

Music Recommender

Data Science minor final project

Link to Shiny application zip. Includes:

  • app.R - application
  • w2v_90m - word2vec model trained on full dataset (3 files due to model size)
  • artistsChoice.csv - list of artists (for app's input)

Repository structure

  • models - word2vec models (only the one trained on 10M is included)
  • notebooks - notebooks
  • slides - presentations
  • other - code snippets and other stuff
  • app - shiny application 🚧 --> https://vk.cc/9v83UO

Data

Data: song playlists from Russian social networks, covering ~950K users and ~90M user-item pairs in total.
Example:

| user_id | song | artist |
| --- | --- | --- |
| 1 | Bohemian Rhapsody | Queen |
| 1 | The Immigrant Song | Led Zeppelin |
| 2 | Lady Marmalade | LaBelle |
| 2 | Non! Je Ne Regrette Rien | Edith Piaf |
| 2 | On était beau | Louane |
| 2 | Город | PRAVADA |
| ... | ... | ... |
| 968772 | Grand Piano | Nicki Minaj |
| 968772 | thank u, next | Ariana Grande |

Methods

Method: word2vec skip-gram
Idea: Each user's playlist is treated as a sentence whose words are artists. Artists that appear in the same playlists share a context, so the model learns similar vectors for them. The model takes one or more artists as input and recommends n artists.
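The playlist-as-sentence idea can be sketched in plain Python. This is an illustration only, not the project's preprocessing code: the helper names `build_sentences` and `skipgram_pairs`, the lowercasing step, and the window size of 2 are all assumptions made for the example.

```python
from collections import defaultdict

# Toy rows in the (user_id, song, artist) layout shown in the Data section.
rows = [
    (1, "Bohemian Rhapsody", "Queen"),
    (1, "The Immigrant Song", "Led Zeppelin"),
    (2, "Non! Je Ne Regrette Rien", "Edith Piaf"),
    (2, "On était beau", "Louane"),
    (2, "Город", "PRAVADA"),
]

def build_sentences(rows):
    """Group artists by user: one playlist becomes one 'sentence'."""
    playlists = defaultdict(list)
    for user_id, _song, artist in rows:
        # Normalizing case and whitespace is an assumption of this sketch.
        playlists[user_id].append(artist.strip().lower())
    return list(playlists.values())

def skipgram_pairs(sentence, window=2):
    """Yield (center, context) pairs, the training examples skip-gram learns from."""
    for i, center in enumerate(sentence):
        start = max(0, i - window)
        stop = min(len(sentence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                yield center, sentence[j]

sentences = build_sentences(rows)
pairs = [pair for s in sentences for pair in skipgram_pairs(s)]
print(sentences)
print(pairs[:2])
```

Each pair says "this artist was seen near that artist", which is the signal that pulls co-occurring artists toward each other in the embedding space.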

The final model was trained on the full dataset (approximately 90,000,000 user-item pairs, ~950,000 users). Training took 9 hours.

Examples

Case: something epic for someone who loves Game of Thrones

model_w2v.wv.most_similar('ramin djawadi', topn=10)

The model recommends other soundtrack composers. An interesting case: 'The Trail' from the 'the witcher 3 wild hunt' soundtrack is quite similar to the TV series' main theme.

[('hans zimmer', 0.774951577186584),
('ramin djawadi', 0.761010468006134),
('westworld', 0.747692465782166),
('the witcher 3 wild hunt', 0.722944676876068),
('daniel pemberton', 0.721352934837341),
('howard shore', 0.719944596290588),
('jeremy soule', 0.715254724025726),
('two steps from hell', 0.711450159549713),
('hans zimmer', 0.709886014461517),
('akihiro honda', 0.703315019607544)]
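The scores in such a list are cosine similarities between artist vectors. The sketch below mimics the ranking step on hypothetical 3-dimensional vectors (the real ones come from the trained model). The `most_similar` function here is a stand-in written for illustration, not gensim's implementation. It also shows how several input artists can be combined by averaging their unit vectors.

```python
import math

# Hypothetical embeddings, invented for this example only.
vectors = {
    "ramin djawadi": [0.9, 0.1, 0.0],
    "hans zimmer": [0.8, 0.2, 0.1],
    "howard shore": [0.7, 0.3, 0.0],
    "placebo": [0.0, 0.1, 0.9],
}

def unit(v):
    """Scale a vector to length 1 so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_similar(positive, topn=10):
    """Rank all other artists by cosine similarity to the mean of the inputs."""
    if isinstance(positive, str):
        positive = [positive]
    units = [unit(vectors[a]) for a in positive]
    query = unit([sum(col) / len(units) for col in zip(*units)])
    scores = [
        (artist, sum(q * x for q, x in zip(query, unit(vec))))
        for artist, vec in vectors.items()
        if artist not in positive  # the query artists are never recommended
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

recs = most_similar("ramin djawadi", topn=2)
print(recs)
```

With these toy vectors the two soundtrack composers rank above the unrelated artist, and results come back sorted by descending similarity.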

Case: someone who listens to placebo and radiohead and is not in a good mood

placebo_sim = [a[0].strip() for a in model_w2v.wv.most_similar(['placebo'], topn=15)]
placebo_sim
['iamx',
 'arctic monkeys',
 'blue october',
 'franz ferdinand',
 'radiohead',
 'the smiths',
 'the killers',
 'him',
 'the pretty reckless',
 '30 seconds to mars',
 'hypnogaja',
 'my chemical romance',
 'sea wolf',
 'she wants revenge',
 'stereophonics']

Which artist does not fit in this list?
Human prediction: 'my chemical romance' or 'franz ferdinand'
Model:

print(model_w2v.wv.doesnt_match(placebo_sim))
my chemical romance
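One common way to pick the odd one out, and to our understanding the approach gensim's `doesnt_match` takes, is to average the unit vectors of all candidates and return the candidate least similar to that average. The sketch below uses hypothetical 2-dimensional vectors invented for the example, and its `doesnt_match` is an illustrative re-implementation, not the library code.

```python
import math

# Hypothetical embeddings, for illustration only.
vectors = {
    "placebo": [1.0, 0.1],
    "radiohead": [0.9, 0.2],
    "iamx": [0.95, 0.0],
    "my chemical romance": [0.2, 1.0],
}

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def doesnt_match(artists):
    """Return the artist whose vector is least similar to the group's mean direction."""
    units = {a: unit(vectors[a]) for a in artists}
    dim = len(next(iter(units.values())))
    mean = unit([sum(u[i] for u in units.values()) / len(units) for i in range(dim)])
    return min(artists, key=lambda a: sum(m * x for m, x in zip(mean, units[a])))

odd_one = doesnt_match(list(vectors))
print(odd_one)
```

Because the result depends on the whole group's mean, the same artist can be the outlier in one list and fit well in another.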
