- Assigned: Sep 9
- Due: 9/14 10AM
The goal of this assignment is for you to set up Microsoft Azure.
Many of the assignments in this class will use Microsoft's cloud computing infrastructure. Using a cloud service such as Microsoft Azure (or Amazon Web Services, etc.) makes it easy to share data sets and to quickly run any number of virtual machines that are identical for all students in the class. We have Azure credits, which we will use for this class (in this homework, we will use a free "micro" instance).
Caution: allocating and running services (e.g., virtual machines) on any cloud provider costs money or credits. It can be very easy to spend more than you anticipated by leaving your services running. Make it a habit to stop your services when not in use. If you run out of credits, there's not much we (the staff) can do.
Sign up
Sign up for a Microsoft Azure free trial account. You may be asked to submit your credit card information for screening purposes; however, you will not be charged, because the service stops your account once your trial credits are used up. Once class registration has settled down, we can provide you with information for using the class's Azure credits.
Launch an instance
- Go to https://manage.windowsazure.com/
- Click "NEW" in the bottom left of the screen
- Click "Compute", then "Virtual Machine", then "Quick Create"
- In DNS Name, provide a name for your new machine/website
- In Image, pick Ubuntu Server 14.04 LTS
- In Size, pick D1 (1 core, 3.5 GB Memory)
- Pick a username and password and remember them
- In Region, you can optionally pick East US or East US 2. This places your machine in a data center on the East Coast, so latency to the VM should be lower.
- Click "Create Virtual Machine" and wait a couple of minutes for it to launch.
- Click on "Virtual Machines" in the main left panel to see your VMs
- Once the VM has started, click on it, then click on the "Dashboard" tab
- You should see your machine's DNS Name as <vm name>.cloudapp.net; this is your public address
SSH to Your Instance
Using a terminal program (e.g., the macOS Terminal, an xterm on Athena, or a Cygwin terminal under Windows), type:
ssh <username>@<vm name>.cloudapp.net
It will ask for your password; once you enter it, you should see something like:
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.19.0-25-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Wed Aug 19 04:28:23 UTC 2015
System load: 0.27 Processes: 107
Usage of /: 4.1% of 28.80GB Users logged in: 0
Memory usage: 4% IP address for eth0: 10.0.0.4
Swap usage: 0%
Graph this data and manage this system at:
https://landscape.canonical.com/
Get cloud support with Ubuntu Advantage Cloud Guest:
https://www.ubuntu.com/business/services/cloud
Last login: Wed Aug 19 04:28:23 2015 from columbia.edu
eugenewu@ewutest:~$
Set up the OS
Use the Ubuntu package management tool apt-get to make sure the following packages are installed.
On a fresh VM, first refresh the package index:
sudo apt-get update
Then, to install packages, type:
sudo apt-get install <packagename1 packagename2 ...>
Make sure you have the following packages:
- python2.7
- python-pip
- postgresql-9.3
- postgresql-client-9.3
- postgresql-server-dev-9.3
- libpq-dev
- python-dev
- sqlite3
- git
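After installing, one way to double-check that the command-line tools are available is a small Python script like the one below. It is only a sketch, not part of the assignment: the mapping from package name to command (EXPECTED_COMMANDS) is an assumption, and packages that only install headers or libraries (libpq-dev, python-dev, postgresql-server-dev-9.3), as well as the server package postgresql-9.3, are not checked. It is written to run under both Python 2.7 and Python 3.

```python
# Sketch: report which command-line tools from the apt-get packages above
# cannot be found on PATH. The package -> command mapping is an assumption.
import os

# Assumed mapping from apt package to the command it provides.
EXPECTED_COMMANDS = {
    "python2.7": "python2.7",
    "python-pip": "pip",
    "postgresql-client-9.3": "psql",
    "sqlite3": "sqlite3",
    "git": "git",
}


def find_on_path(command):
    """Return the full path of an executable file named `command` on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_commands(commands):
    """Return the commands (in input order) that are not found on PATH."""
    return [cmd for cmd in commands if find_on_path(cmd) is None]


if __name__ == "__main__":
    missing = missing_commands(sorted(EXPECTED_COMMANDS.values()))
    if missing:
        print("Missing commands: " + ", ".join(missing))
    else:
        print("All expected commands found.")
```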
Set up Python
Python uses its own package manager, pip, to install, update, and remove packages. In general, the following command installs a Python package:
pip install <packagename>
Typically, the package manager requires sudo and installs packages into a global folder that affects everyone using your machine. This is bad hygiene: different Python applications may need different versions of the same package, and it's easy for them to step on each other's toes.
We will use virtualenv to create virtual environments that contain their own copies of Python and packages. When you work in a virtual environment, pip installs packages locally to the environment rather than globally. You can read a detailed tutorial.
Let's set up your environment:
- Install virtualenv and the convenience library virtualenvwrapper (this is the one time you should install globally):
sudo pip install virtualenv virtualenvwrapper
- Add the wrapper commands to your shell (you may add this line to ~/.bashrc so it runs whenever you start a bash shell):
source /usr/local/bin/virtualenvwrapper.sh
- Create a new environment (this creates a folder test/ in ~/.virtualenvs/):
mkvirtualenv test
- Switch to (activate) an environment using workon:
workon test
- Switch out of an environment:
deactivate
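If you are ever unsure whether an environment is active, Python itself can tell you. The sketch below uses a hypothetical helper, in_virtualenv, and relies on two conventions rather than an official API: older virtualenv releases set sys.real_prefix, while Python 3's venv module and newer virtualenv releases make sys.prefix differ from sys.base_prefix. Treat it as a quick diagnostic.

```python
# Sketch: tell whether the current interpreter runs inside a virtual environment.
import sys


def in_virtualenv():
    """Return True if this Python appears to run inside a virtualenv or venv."""
    # Older virtualenv releases record the original prefix in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv (and newer virtualenv) make sys.prefix differ from sys.base_prefix.
    # sys.base_prefix does not exist on Python 2, so fall back to sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return base_prefix != sys.prefix


if __name__ == "__main__":
    print("inside a virtual environment" if in_virtualenv() else "using the global Python")
    print("sys.prefix = " + sys.prefix)
```

For example, run it once after workon test and once after deactivate; the first line of output should change accordingly.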
Now let's install a set of useful packages into your environment:
- Activate your environment (see above)
- Install the following packages using pip (see above); inside an activated environment, do not use sudo:
  - flask
  - psycopg2
  - sqlalchemy
  - click
- Deactivate the environment, and you're done
Sign up for GitHub
GitHub is a website for hosting source control (git) repositories, and many companies use it to manage their projects. If you do not have an account yet, create one at https://github.com.
In fact, this class is organized as repositories under the organization w4111: https://www.github.com/w4111.
Each assignment is managed as a separate repository.
As the course progresses, more repositories will be made available.
Check out the homework 0 repository
To work on an assignment, it's a good idea to "fork" the repository into your own GitHub account and complete the assignment there. Let's try this with the homework 0 repository at https://github.com/w4111/hw0
- To start, fork the repository by clicking "Fork" on its GitHub page.
- Clone your fork to your Azure VM:
git clone https://github.com/<your GitHub username>/hw0.git
- Once you modify your files, commit the changes:
git add <filename>
git commit -m "<message>"
- If you want to share your work with a teammate, or just want to make sure GitHub has a copy of your files in case the VM crashes, push the changes to GitHub:
git push
- If your teammate pushed to the same repository on GitHub, pull the changes from GitHub by typing within the repository:
git pull
Let's make sure you have access to Python, sqlite3, and the git repository.
Python
Activate your environment (workon test), then type python and make sure you see something like the following (your Python version may differ slightly, e.g., 2.7.X):
Python 2.7.4 (default, Apr 19 2013, 18:28:01)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Then try importing some modules from the packages you installed:
>>> import flask
>>> import psycopg2
>>> import sqlalchemy
>>> import click
If that worked, press Ctrl+D to exit the prompt.
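The same import check can be scripted, which is handy to rerun later. This is a sketch with a hypothetical helper name, missing_modules; run it with your environment activated, since the packages were installed into the environment rather than globally.

```python
# Sketch: scripted version of the interactive import check.
# Run it with the virtual environment activated (e.g., after `workon test`).
import importlib

REQUIRED_MODULES = ["flask", "psycopg2", "sqlalchemy", "click"]


def missing_modules(names):
    """Return the module names (in input order) that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Could not import: " + ", ".join(missing))
    else:
        print("All modules imported successfully.")
```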
sqlite3
SQLite is an "embedded" SQL database: it doesn't depend on a dedicated server process; instead, the client reads and writes a stored database file directly.
To check that it is installed, type sqlite3 and verify that you see something like the following:
SQLite version 3.7.15.2 2013-01-09 11:53:05
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>
If you do, press Ctrl+D to exit the prompt.
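To see what "embedded" means in practice, the sketch below uses Python's built-in sqlite3 module to create a database file, insert rows, and query them, all without starting any server. The table name sales, its columns, and the file name example.db are made up for illustration.

```python
# Sketch: SQLite is embedded, so Python's sqlite3 module opens the database
# file directly; no server process needs to be running.
import os
import sqlite3
import tempfile


def demo(db_path):
    """Create a table in the file at db_path, insert two rows, and return the row count."""
    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS sales (item TEXT, bottles INTEGER)")
        conn.executemany(
            "INSERT INTO sales (item, bottles) VALUES (?, ?)",
            [("vodka", 12), ("single malt scotch", 3)],
        )
        conn.commit()
        (count,) = conn.execute("SELECT COUNT(*) FROM sales").fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.db")
    print(demo(path))            # prints 2: two rows inserted into a fresh file
    print(os.path.exists(path))  # prints True: the database is just this file on disk
```

Calling demo again on the same file adds two more rows, because the data persists in the file between connections.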
PostgreSQL
We will use the PostgreSQL DBMS in this class. Check that the client program works:
psql --version
If it prints something like the following, it works:
psql (PostgreSQL) 9.3.9
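If you want to check the version from a script, a minimal sketch like the following runs psql --version and parses its output. The regular expression assumes the output format shown above, and the helper names are hypothetical.

```python
# Sketch: run `psql --version` and parse the reported version number.
import re
import subprocess


def parse_psql_version(text):
    """Extract the version from output such as 'psql (PostgreSQL) 9.3.9'.

    Returns a tuple of ints, e.g. (9, 3, 9), or None if no version is found.
    """
    match = re.search(r"\(PostgreSQL\)\s+(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_psql_version():
    """Run `psql --version`; return the parsed version, or None if psql is unavailable."""
    try:
        output = subprocess.check_output(["psql", "--version"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_psql_version(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    version = installed_psql_version()
    if version is None:
        print("psql not found")
    elif version[:2] == (9, 3):
        print("psql 9.3 is installed")
    else:
        print("psql found, but the version is " + ".".join(str(v) for v in version))
```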
git repository
Type cat hw0/README.md
You should see the instructions for this hw fly by.
To complete this homework, go into your hw0/ directory. There should be a file called iowa-liquor-sample.csv. The state of Iowa released a dataset containing all sales transactions at alcoholic beverage stores during 2014, and we will use this dataset for many assignments in this course. Since the full dataset contains over 3 million records, this file is a small sample.
Disclaimer: this course does not condone drinking. We are using this dataset because it has the common format of a sales transaction log, in a slightly more accessible domain than typical bank transactions.
Write a Python script that reads the file and counts the number of records (in this file, each line is a record) that contain the exact phrase "single malt scotch", ignoring upper and lower case. For example, "Single Malt Scotch" and "SINGLE Malt Scotch" both match, whereas "Single's Malty Scootch" does not.
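One possible approach is sketched below. It assumes the file decodes as UTF-8 (undecodable bytes are replaced rather than raising an error) and treats each line as one record, as stated above. It uses a plain substring test on the lowercased line, so the phrase also counts when it appears inside longer text on that line. The function name count_matching_records and the script name count_scotch.py are hypothetical; the code runs under Python 2.7 and Python 3.

```python
# Sketch: count lines of a file that contain a phrase, ignoring case.
# Each line is treated as one record.
import io
import sys


def count_matching_records(path, phrase="single malt scotch"):
    """Return how many lines in `path` contain `phrase`, case-insensitively."""
    target = phrase.lower()
    count = 0
    # io.open behaves the same on Python 2.7 and 3; undecodable bytes are replaced.
    with io.open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if target in line.lower():
                count += 1
    return count


if __name__ == "__main__":
    # Usage: python count_scotch.py iowa-liquor-sample.csv
    if len(sys.argv) > 1:
        print(count_matching_records(sys.argv[1]))
```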
Submit your assignment at https://goo.gl/forms/YzQ1DwHzMy
Note that you must be logged in to Columbia's LionMail to submit, and we will only consider your first submission.
Whew, you're almost done! Go read the assigned readings.
You can always send us questions on Piazza!