I started this project to collect and analyse information about Japanese pornstars. Javhoo.com contains the data that interests me; javhoo_actresses
extracts that data from downloaded HTML files and saves it to an SQLite database.
- Kaggle Dataset: japanese-pornstars-and-adult-videos (you can publish a new kernel!)
- Adds metadata of Japanese censored, uncensored, and VR porn videos to javhoo_actresses/db/javhooDB.db.
First, you need to fetch the HTML pages from javhoo.com/actresses using cURL. Currently there are 212 pages of Japanese pornstars on Javhoo.com, so you need to download all 212 pages. You can paste this command into your bash shell:
curl "https://www.javhoo.com/actresses/page/[1-212]" > javhoo_actresses_212pages.html
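If you want a rough sanity check that all 212 pages were fetched, you can count how many HTML documents ended up concatenated in the file. The snippet below is a sketch that assumes each downloaded page contains exactly one opening <html tag:

# Rough check: the 212 pages are concatenated into one file,
# so each page should contribute one opening <html tag.
with open('javhoo_actresses_212pages.html', encoding='utf-8', errors='ignore') as f:
    print(f.read().count('<html'))  # expect 212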
Now that you have the file javhoo_actresses_212pages.html, modify the configuration in javhoo_actresses.py. javhooDB.db is where the data extracted from the HTML pages will be stored; initially it should contain nothing.
jactress_dict = {
    'html_path': '/path/to/your/javhoo_actresses_212pages.html',
    'sqlite3db_path': '/path/to/your/javhooDB.db'
}
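For reference, the extraction step boils down to parsing the concatenated HTML and writing rows into the SQLite file named in the config. The sketch below is not the actual javhoo_actresses.py: it assumes BeautifulSoup is installed, a hypothetical actress table, and a guessed link pattern, so treat the selector and schema as illustrative only.

# Illustrative sketch only -- the real javhoo_actresses.py may differ.
# Assumes: pip install beautifulsoup4
import sqlite3
from bs4 import BeautifulSoup

jactress_dict = {
    'html_path': '/path/to/your/javhoo_actresses_212pages.html',
    'sqlite3db_path': '/path/to/your/javhooDB.db'
}

with open(jactress_dict['html_path'], encoding='utf-8', errors='ignore') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

conn = sqlite3.connect(jactress_dict['sqlite3db_path'])
conn.execute('CREATE TABLE IF NOT EXISTS actress (name TEXT, url TEXT)')

# Hypothetical selector: treat every link whose href contains '/star/'
# as an actress entry; adjust to the markup you actually see in the file.
for a in soup.find_all('a', href=True):
    if '/star/' in a['href']:
        conn.execute('INSERT INTO actress (name, url) VALUES (?, ?)',
                     (a.get_text(strip=True), a['href']))

conn.commit()
conn.close()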
After that, you can run the script:
python javhoo_actresses.py
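Once the script finishes, you can check that rows actually landed in javhooDB.db. The check below makes no assumptions about table names; it simply lists every table the script created and how many rows each contains:

# Quick sanity check of the populated database.
import sqlite3

conn = sqlite3.connect('/path/to/your/javhooDB.db')
cur = conn.cursor()

# List every table and its row count.
cur.execute("SELECT name FROM sqlite_master WHERE type='table'")
for (table,) in cur.fetchall():
    count = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    print(f'{table}: {count} rows')

conn.close()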