
# Web Traffic Forecasting

![Web Traffic visual](/Display images/Web traffic.jpg)

This repository analyses the web traffic of a website over a given time interval using time series analysis and forecasts future traffic. The website chosen for analysis is Wikipedia. The scripts are in Jupyter Notebook (.ipynb) format.

![DSLC](/Display images/DSLC.png) This analysis follows the flow of the Data Science Life Cycle (DSLC).

## Initial Setup

```bash
git clone https://github.com/San411/Time-Series-Analysis.git
cd Time-Series-Analysis
pip3 install -r requirements.txt
```

## Data Mining

The training dataset consists of daily Wikipedia pageviews for roughly 145k articles over the years 2015-2017.

1. Download the dataset provided by Google from Kaggle; this should not take long. Save the data as 'data_2015-17.csv' in the 'Datasets' folder.
2. For validation data, use the Wikipedia Pageview API to request daily pageviews for the same articles for 2018-19 (refer to the Pageview API documentation). The REST API returns JSON, which is stored in files for later access; a fetching sketch is given after the sample record below.
3. Create a folder named 'json_files' inside 'Datasets', then execute `data_mining.ipynb`. The files will take a long time to download.

A single record in the JSON data looks like this:

{"project": "en.wikipedia",
  "article": "!vote", 
  "granularity": "daily",
  "timestamp": "2018010100",
  "access": "all-access", 
  "agent": "all-agents",
  "views": 9}

Execute the two notebooks in the following order (a minimal exploration sketch follows the list):

1. `data_cleaning.ipynb`
2. `data_exploration.ipynb`
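To illustrate the exploration step, the sketch below loads the training CSV and plots one article's daily views, similar to the sample plot shown next. It assumes the Kaggle layout of one row per article (a 'Page' column plus one column per date); the file path matches the 'Datasets' folder described above, but the exact plotting code in `data_exploration.ipynb` may differ.

```python
# Minimal exploration sketch, assuming the Kaggle train CSV layout:
# one row per article ('Page' column) and one column per date.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("Datasets/data_2015-17.csv")

# Take the first article's row and turn its date columns into a time series.
series = df.drop(columns=["Page"]).iloc[0]
series.index = pd.to_datetime(series.index)

series.plot(figsize=(12, 4), title=df["Page"].iloc[0])
plt.ylabel("Daily pageviews")
plt.show()
```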

Daily view plot for a sample article:

![View Plot](/Display images/Daily Plot.png)