Python is a general-purpose programming language that is easy to learn.
Pandas is a Python library for doing data analysis. It's really fast and lets you do exploratory work incredibly quickly.
The goal of this cookbook is to give you some concrete examples for getting started with pandas.
I'm working with these datasets right now:
- 311 calls in New York
- Movie data
The cookbook comes with batteries (data) included, so you can try out all the examples right away.
You'll need an up-to-date version of IPython Notebook (>= 3.0) and pandas (>= 0.13) for this to work properly. It's set up to work with Python 2.7.
You need to install the packages used for data analysis. Unless you really know what you are doing, the easiest way is to install them all through the Anaconda distribution.
Installation page: Anaconda
- Download the version that suits your needs at Anaconda. I use the Python 2.7 version.
- Once it's downloaded, double-click the package and wait until it installs.
- Close and reopen all Terminal windows.
- To make sure it installed correctly:
  - open a terminal
  - type "python"
  - type "import pandas"
  - you should see **no errors**
- Done!
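If you'd rather check the versions than just watch for errors, here is a minimal sketch you could paste into the Python prompt. The helper names `version_tuple` and `check_package` are made up for this example; the minimum versions are the ones mentioned above, and the sketch assumes each package exposes a `__version__` attribute.

```python
import importlib
import re


def version_tuple(version):
    """Turn a string like '0.13.1' into (0, 13, 1) so versions can be compared."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def check_package(name, minimum):
    """Return True if `name` imports and its __version__ is at least `minimum`."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        print("%s is not installed" % name)
        return False
    # Assumption: the package exposes __version__; fall back to "0" if not.
    found = getattr(module, "__version__", "0")
    ok = version_tuple(found) >= version_tuple(minimum)
    print("%s %s (need >= %s): %s" % (name, found, minimum, "OK" if ok else "too old"))
    return ok


check_package("pandas", "0.13")
check_package("IPython", "3.0")
```

If either line reports "not installed" or "too old", reinstalling or updating through Anaconda is probably the quickest fix.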
Once you have pandas and IPython, you can get going!
I would highly recommend using git. Once git is installed, you can clone the material in this tutorial by using the git address shown below:
git clone https://github.com/GusSand/itp_talk_2016
Once you have cloned the git repo, you can change to the cookbook directory and run ipython using the following commands:
cd pandas-cookbook/cookbook
ipython notebook
A tab should open up in your browser at http://localhost:8888
- Chapter 0: Getting Ready
  Making sure you have the prereqs
- Chapter 1: Quick tour of the IPython Notebook
  Shows off IPython's awesome features like tab completion and magic functions.
- Chapter 2: 20ish minutes to Python
  Shows off IPython's awesome tab completion and magic functions.
- Chapter 3: Analyzing NYC 311 Open Data
  Reading data from a CSV, finding most common complaints and some plots
- Chapter 4: Analyzing MovieLens 1MM Dataset Data
  Analyzing the MovieLens 1 Million dataset to find data about movies with multiple tables.
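To give a taste of what Chapters 3 and 4 cover, here is a rough sketch of counting the most common complaints in a CSV and joining two tables. The data is a tiny made-up sample, not the real 311 or MovieLens files, and the column names (`Complaint Type`, `movie_id`, `title`, `rating`) are assumptions for illustration only.

```python
import io

import pandas as pd

# Made-up stand-in for the 311 CSV; the real file's columns may differ.
csv_text = u"""Complaint Type,Borough
Noise,BROOKLYN
Heating,BRONX
Noise,QUEENS
Noise,BROOKLYN
Heating,BROOKLYN
Street Condition,QUEENS
"""

# Read the CSV and count how often each complaint type appears.
complaints = pd.read_csv(io.StringIO(csv_text))
top_complaints = complaints["Complaint Type"].value_counts()
print(top_complaints.head(3))

# Made-up stand-ins for two MovieLens-style tables.
movies = pd.DataFrame({"movie_id": [1, 2], "title": ["Movie A", "Movie B"]})
ratings = pd.DataFrame({"movie_id": [1, 1, 2], "rating": [5, 3, 2]})

# Join ratings to movie titles, then average the ratings per title.
merged = pd.merge(ratings, movies, on="movie_id")
mean_rating = merged.groupby("title")["rating"].mean()
print(mean_rating)
```

In the notebook, with matplotlib installed, something like `top_complaints.plot(kind="bar")` should turn the counts into a quick bar chart.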
If you see something wrong, or there's something you'd like to learn that I haven't explained here, or there's something you know about that you would like to share, create an issue! Send me an email! Send a pull request!
- Add more about matplotlib
Some of this content is adapted from: Pandas-Cookbook
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License