Music Recommender

Data Science minor final project

Link to Shiny application zip. Includes:

app.R - application
w2v_90m - word2vec model trained on full dataset (3 files due to model size)
artistsChoice.csv - list of artists (for app's input)

Repository structure

models - word2vec models; (includes only the one trained on 10M)
notebooks - notebooks
slides - presentations
other - code snippets and other stuff
app - ~~shiny application~~ 🚧 --> https://vk.cc/9v83UO

Data

Data: song playlists from russian SNs, including ~950K users and 90M user-item pairs in total.
Example:

user_id	song	artist
1	Bohemian Rhapsody	Queen
1	The Immigrant Song	Led Zeppelin
2	LaBelle	Lady Marmalade
2	Non! Je Ne Regrette Rien	Edith Piaf
2	On était beau	Louane
2	Город	PRAVADA
...	...	...
968772	Grand Piano	Nicki Minaj
968772	thank u, next	Ariana Grande

Methods

Method: word2vec skip-gram
Idea: Each user's playlist is represented as a sentence, and if artists appear in the same playlists, they are similar and belong to the same context. The model takes artists (at least one) as an input, and recommends n artists.

The final model was trained on full dataset: approximately 90 000 000 user-item, ~ 950 000 users, time: 9 hours.

Examples

Case: something epic for the one who loves Game of Thrones

model_w2v.wv.most_similar('ramin djawadi', topn=10)

Model recommends other authors of soundtracks. Interesting case: soundtrack to 'the witcher 3 wild hunt' - 'The Trail' is quite similar to the TV series main song.

[('drake',	0.6897857189178467),
('hans zimmer',	0.774951577186584),
('ramin djawadi',	0.761010468006134),
('westworld',	0.747692465782166),
('the witcher 3 wild hunt',	0.722944676876068),
('daniel pemberton',	0.721352934837341),
('howard shore',	0.719944596290588),
('jeremy soule',	0.715254724025726),
('two steps from hell',	0.711450159549713),
('hans zimmer',	0.709886014461517),
('akihiro honda',	0.703315019607544)]

Case: the one who listens to placebo and radiohead and is not in a good mood

placebo_sim = [a[0].strip() for a in model_w2v.wv.most_similar(['placebo'], topn=15)]
placebo_sim

['iamx',
 'arctic monkeys',
 'blue october',
 'franz ferdinand',
 'radiohead',
 'the smiths',
 'the killers',
 'him',
 'the pretty reckless',
 '30 seconds to mars',
 'hypnogaja',
 'my chemical romance',
 'sea wolf',
 'she wants revenge',
 'stereophonics']

And who does not match there?
human prediction: 'my chemical romance' or 'franz ferdinand'
model:

print(model_w2v.wv.doesnt_match(placebo_sim))

my chemical romance

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Music Recommender

Repository structure

Data

Methods

Examples

About

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
models		models
notebooks		notebooks
other		other
slides		slides
README.md		README.md

actuallyykatie/music_rec

Folders and files

Latest commit

History

Repository files navigation

Music Recommender

Repository structure

Data

Methods

Examples

About

Resources

Stars

Watchers

Forks

Languages