{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "Introduction to numpy:\n", "


\n", "\n", "


\n", "Package for scientific computing with Python\n", "


\n", "\n", "Numerical Python, or \"Numpy\" for short, is a foundational package on which many of the most common data science packages are built. Numpy provides high-performance multi-dimensional arrays which we can use as vectors or matrices.\n", "\n", "The key features of numpy are:\n", "\n", "- ndarrays: n-dimensional arrays of a single data type which are fast and space-efficient. Built-in ndarray methods allow rapid processing of data without explicit loops (e.g., computing the mean).\n", "- Broadcasting: a set of rules that defines how arithmetic behaves between arrays of different shapes.\n", "- Vectorization: applies numeric operations to whole ndarrays at once, without explicit Python loops.\n", "- Input/Output: simplifies reading data from and writing data to files.\n", "\n", "Additional Recommended Resources:
\n", "Numpy Documentation
\n", "Python for Data Analysis by Wes McKinney
\n", "Python Data Science Handbook by Jake VanderPlas\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
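To make the vectorization and built-in-method points concrete, here is a minimal sketch that computes the same mean once with an explicit Python loop and once with an ndarray method. The variable names (values, loop_mean, vector_mean, doubled) are illustrative and not part of the original notebook.

```python
import numpy as np

values = np.array([2.0, 4.0, 6.0, 8.0])

# Explicit Python loop: handles one element at a time
total = 0.0
for v in values:
    total += v
loop_mean = total / len(values)

# Vectorized: the whole computation runs inside numpy
vector_mean = values.mean()

# Vectorized arithmetic applies to every element without a loop
doubled = values * 2

print(loop_mean, vector_mean)
print(doubled)
```

Both approaches give the same mean; the vectorized form is shorter and is typically much faster on large arrays.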


\n", "\n", "Getting started with ndarray

\n", "\n", "**ndarrays** are time and space-efficient multidimensional arrays at the core of numpy. Like the data structures in Week 2, let's get started by creating ndarrays using the numpy package." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "How to create Rank 1 numpy arrays:\n", "

" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'numpy.ndarray'>\n" ] } ], "source": [ "import numpy as np\n", "\n", "an_array = np.array([3, 33, 333]) # Create a rank 1 array\n", "\n", "print(type(an_array)) # The type of an ndarray is: \"<class 'numpy.ndarray'>\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test the shape of the array we just created, it should have just one dimension (Rank 1)\n", "print(an_array.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# because this is a rank 1 array, we need only one index to access each element\n", "print(an_array[0], an_array[1], an_array[2]) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "an_array[0] = 888 # ndarrays are mutable, here we change an element of the array\n", "\n", "print(an_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "How to create a Rank 2 numpy array:

\n", "\n", "A rank 2 **ndarray** is one with two dimensions. Notice the format below of [ [row] , [row] ]. 2 dimensional arrays are great for representing matrices which are often useful in data science." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "another = np.array([[11,12,13],[21,22,23]]) # Create a rank 2 array\n", "\n", "print(another) # print the array\n", "\n", "print(\"The shape is 2 rows, 3 columns: \", another.shape) # rows x columns \n", "\n", "print(\"Accessing elements [0,0], [0,1], and [1,0] of the ndarray: \", another[0, 0], \", \",another[0, 1],\", \", another[1, 0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "There are many ways to create numpy arrays:\n", "

\n", "\n", "Here we create arrays of different shapes with different pre-filled values. numpy has a number of built-in functions which help us quickly and easily create multidimensional arrays." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 0.]\n", " [ 0. 0.]]\n" ] } ], "source": [ "import numpy as np\n", "\n", "# create a 2x2 array of zeros\n", "ex1 = np.zeros((2,2)) \n", "print(ex1) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a 2x2 array filled with 9.0\n", "ex2 = np.full((2,2), 9.0) \n", "print(ex2) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a 2x2 matrix with the diagonal 1s and the others 0\n", "ex3 = np.eye(2,2)\n", "print(ex3) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a 1x2 array of ones\n", "ex4 = np.ones((1,2))\n", "print(ex4) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# notice that the above ndarray (ex4) is actually rank 2, it is a 1x2 array (1 row, 2 columns)\n", "print(ex4.shape)\n", "\n", "# which means we need to use two indexes to access an element\n", "print()\n", "print(ex4[0,1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create an array of random floats in the interval [0, 1)\n", "ex5 = np.random.random((2,2))\n", "print(ex5) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Array Indexing\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "Slice indexing:\n", "

\n", "\n", "Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[11 12 13 14]\n", " [21 22 23 24]\n", " [31 32 33 34]]\n" ] } ], "source": [ "import numpy as np\n", "\n", "# Rank 2 array of shape (3, 4)\n", "an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])\n", "print(an_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use array slicing to get a 2 x 2 subarray consisting of the first 2 rows and columns 1 and 2." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[12 13]\n", " [22 23]]\n" ] } ], "source": [ "a_slice = an_array[:2, 1:3]\n", "print(a_slice)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A slice is a view onto the original data, so when you modify a slice, you actually modify the underlying array." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Before: 12\n", "After: 1000\n" ] } ], "source": [ "print(\"Before:\", an_array[0, 1]) # inspect the element at 0, 1 \n", "a_slice[0, 0] = 1000 # a_slice[0, 0] is the same piece of data as an_array[0, 1]\n", "print(\"After:\", an_array[0, 1]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
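Because a slice shares its data with the array it was taken from, changes made through the slice show up in the original. When an independent array is needed, one option is to call copy() on the slice. A minimal sketch (the names base, view and independent are illustrative, not from the original notebook):

```python
import numpy as np

base = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

view = base[:2, 1:3]                # a view: shares memory with base
independent = base[:2, 1:3].copy()  # a copy: owns its own data

view[0, 0] = 1000        # also changes base[0, 1]
independent[0, 1] = -1   # does not touch base

print(base[0, 1])   # changed through the view
print(base[0, 2])   # unchanged by the copy
print(np.shares_memory(base, view), np.shares_memory(base, independent))
```

np.shares_memory is used here only to confirm which of the two arrays still points at the original data.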


\n", "\n", "Use both integer indexing & slice indexing\n", "

\n", "\n", "We can use combinations of integer indexing and slice indexing to create different shaped matrices." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a Rank 2 array of shape (3, 4)\n", "an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])\n", "print(an_array)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using both integer indexing & slicing generates an array of lower rank\n", "row_rank1 = an_array[1, :] # Rank 1 view \n", "\n", "print(row_rank1, row_rank1.shape) # notice only a single []" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Slicing alone: generates an array of the same rank as the an_array\n", "row_rank2 = an_array[1:2, :] # Rank 2 view \n", "\n", "print(row_rank2, row_rank2.shape) # Notice the [[ ]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#We can do the same thing for columns of an array:\n", "\n", "print()\n", "col_rank1 = an_array[:, 1]\n", "col_rank2 = an_array[:, 1:2]\n", "\n", "print(col_rank1, col_rank1.shape) # Rank 1\n", "print()\n", "print(col_rank2, col_rank2.shape) # Rank 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Array Indexing for changing elements:\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes it's useful to use an array of indexes to access or change elements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a new array\n", "an_array = np.array([[11,12,13], [21,22,23], [31,32,33], [41,42,43]])\n", "\n", "print('Original Array:')\n", "print(an_array)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create an array of indices\n", "col_indices = np.array([0, 1, 2, 0])\n", "print('\\nCol indices picked : ', col_indices)\n", "\n", "row_indices = np.arange(4)\n", "print('\\nRow indices picked : ', row_indices)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Examine the pairings of row_indices and col_indices. These are the elements we'll change next.\n", "for row,col in zip(row_indices,col_indices):\n", "    print(row, \", \",col)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Select one element from each row\n", "print('Values in the array at those indices: ',an_array[row_indices, col_indices])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Change one element from each row using the indices selected\n", "an_array[row_indices, col_indices] += 100000\n", "\n", "print('\\nChanged Array:')\n", "print(an_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "Boolean Indexing\n", "\n", "

\n", "


\n", "\n", "Boolean indexing for selecting and changing elements:\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a 3x2 array\n", "an_array = np.array([[11,12], [21, 22], [31, 32]])\n", "print(an_array)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a filter: a boolean array recording whether each element meets the condition\n", "filter = (an_array > 15)\n", "filter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the filter is an ndarray of the same shape as an_array, filled with True wherever the corresponding element of an_array is greater than 15 and False wherever it is less than or equal to 15." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# we can now select just those elements which meet that criterion\n", "print(an_array[filter])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# For short, we could have just used the approach below without the need for the separate filter array.\n", "\n", "an_array[(an_array % 2 == 0)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is particularly useful is that we can also change elements in the array by applying a similar logical filter. Let's add 100 to all the even values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "an_array[an_array % 2 == 0] += 100\n", "print(an_array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

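Boolean conditions can also be combined. The sketch below uses the element-wise operators & (and) and ~ (not); each comparison needs its own parentheses because these operators bind more tightly than comparisons do. The name mask is an illustrative choice that avoids shadowing the Python built-in filter.

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])

# elements that are greater than 15 AND even
mask = (an_array > 15) & (an_array % 2 == 0)
selected = an_array[mask]
print(selected)

# elements that are NOT greater than 15
small_values = an_array[~(an_array > 15)]
print(small_values)

# assign through a mask: set every odd element to 0
an_array[an_array % 2 == 1] = 0
print(an_array)
```

Selecting with a boolean mask always returns a rank 1 array, whatever the shape of the original.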

\n", "\n", "Datatypes and Array Operations\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Datatypes:\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ex1 = np.array([11, 12]) # numpy infers the data type\n", "print(ex1.dtype)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ex2 = np.array([11.0, 12.0]) # numpy infers the data type\n", "print(ex2.dtype)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ex3 = np.array([11, 21], dtype=np.int64) # You can also specify the data type explicitly\n", "print(ex3.dtype)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# you can use this to force floats into integers (the fractional part is truncated)\n", "ex4 = np.array([11.1,12.7], dtype=np.int64)\n", "print(ex4.dtype)\n", "print()\n", "print(ex4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# you can use this to force integers into floats if you anticipate\n", "# the values may change to floats later\n", "ex5 = np.array([11, 21], dtype=np.float64)\n", "print(ex5.dtype)\n", "print()\n", "print(ex5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

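Converting floats to an integer dtype drops the fractional part (truncation toward zero), which is not the same as np.floor once negative values are involved. A minimal sketch using astype on an existing array; the values are illustrative:

```python
import numpy as np

floats = np.array([11.1, 12.7, -3.7])

truncated = floats.astype(np.int64)   # drops the fractional part
floored = np.floor(floats)            # rounds toward negative infinity, stays float

print(truncated, truncated.dtype)
print(floored, floored.dtype)
```

Note that -3.7 becomes -3 under truncation but -4.0 under np.floor.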

\n", "\n", "Arithmetic Array Operations:\n", "\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([[111,112],[121,122]], dtype=np.int64)\n", "y = np.array([[211.1,212.1],[221.1,222.1]], dtype=np.float64)\n", "\n", "print(x)\n", "print()\n", "print(y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# add\n", "print(x + y) # The plus sign works\n", "print()\n", "print(np.add(x, y)) # so does the numpy function \"add\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# subtract\n", "print(x - y)\n", "print()\n", "print(np.subtract(x, y))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# multiply (element-wise)\n", "print(x * y)\n", "print()\n", "print(np.multiply(x, y))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# divide (element-wise)\n", "print(x / y)\n", "print()\n", "print(np.divide(x, y))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# square root\n", "print(np.sqrt(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# exponent (e ** x)\n", "print(np.exp(x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Statistical Methods, Sorting, and

Set Operations:\n", "

\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Basic Statistical Operations:\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set up a random 2 x 5 matrix\n", "arr = 10 * np.random.randn(2,5)\n", "print(arr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# compute the mean for all elements\n", "print(arr.mean())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# compute the means by row\n", "print(arr.mean(axis = 1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# compute the means by column\n", "print(arr.mean(axis = 0))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sum all the elements\n", "print(arr.sum())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# compute the median of each row\n", "print(np.median(arr, axis = 1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

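The axis argument names the dimension that gets collapsed: axis=0 runs down the rows and yields one value per column, while axis=1 runs across the columns and yields one value per row. A small deterministic sketch (the array small is illustrative, chosen so the results are easy to check by hand):

```python
import numpy as np

small = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

overall = small.mean()         # mean of all six elements
per_column = small.mean(axis=0)   # one mean per column, shape (3,)
per_row = small.mean(axis=1)      # one mean per row, shape (2,)

print(overall)
print(per_column, per_column.shape)
print(per_row, per_row.shape)
```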

\n", "\n", "Sorting:\n", "

\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a 10 element array of randoms\n", "unsorted = np.random.randn(10)\n", "\n", "print(unsorted)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a copy and sort it (named sorted_arr to avoid shadowing the built-in sorted)\n", "sorted_arr = np.array(unsorted)\n", "sorted_arr.sort()\n", "\n", "print(sorted_arr)\n", "print()\n", "print(unsorted)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# in-place sorting\n", "unsorted.sort() \n", "\n", "print(unsorted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
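As an alternative to copying an array and then sorting the copy in place, np.sort returns a sorted copy and leaves its input untouched, and np.argsort returns the indices that would sort the array. A minimal sketch with fixed values (the names sorted_copy and order are illustrative):

```python
import numpy as np

unsorted = np.array([3.0, -1.0, 2.0])

sorted_copy = np.sort(unsorted)   # new array, input unchanged
order = np.argsort(unsorted)      # indices that would sort the array

print(sorted_copy)
print(order)
print(unsorted[order])            # same values as sorted_copy
print(unsorted)                   # still in the original order
```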


\n", "\n", "Finding Unique elements:\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "array = np.array([1,2,1,4,2,1,4,2])\n", "\n", "print(np.unique(array))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Set Operations with np.array data type:\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s1 = np.array(['desk','chair','bulb'])\n", "s2 = np.array(['lamp','bulb','chair'])\n", "print(s1, s2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print( np.intersect1d(s1, s2) ) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print( np.union1d(s1, s2) )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print( np.setdiff1d(s1, s2) ) # elements in s1 that are not in s2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print( np.isin(s1, s2) ) # which elements of s1 are also in s2 (np.in1d is deprecated in NumPy 2.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Broadcasting:\n", "

\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Introduction to broadcasting.
\n", "For more details, please see:
\n", "https://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "start = np.zeros((4,3))\n", "print(start)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a rank 1 ndarray with 3 values\n", "add_rows = np.array([1, 0, 2])\n", "print(add_rows)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = start + add_rows # add to each row of 'start' using broadcasting\n", "print(y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create an ndarray which is 4 x 1 to broadcast across columns\n", "add_cols = np.array([[0,1,2,3]])\n", "add_cols = add_cols.T\n", "\n", "print(add_cols)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# add to each column of 'start' using broadcasting\n", "y = start + add_cols \n", "print(y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# a one-element array broadcasts across both dimensions\n", "add_scalar = np.array([1]) \n", "print(start+add_scalar)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Example from the slides:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create our 3x4 matrix\n", "arrA = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])\n", "print(arrA)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create our 4-element list (numpy treats it as a rank 1 array of shape (4,))\n", "arrB = [0,1,0,2]\n", "print(arrB)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# add the two together using broadcasting\n", "print(arrA + arrB)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
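Broadcasting compares shapes starting from the trailing dimension and working backwards; two dimensions are compatible when they are equal or when one of them is 1. The sketch below shows two compatible pairings and one incompatible pairing (the names grid, row and col are illustrative):

```python
import numpy as np

grid = np.zeros((3, 4))

row = np.array([0, 1, 0, 2])         # shape (4,): matches the trailing dimension
col = np.array([[10], [20], [30]])   # shape (3, 1): stretched across the columns

with_row = grid + row
with_col = grid + col
print(with_row.shape)
print(with_col)

# shape (3,) does not match the trailing dimension 4, so this raises ValueError
try:
    grid + np.array([1, 2, 3])
    failed = False
except ValueError:
    failed = True
print(failed)
```

To add a length-3 vector down the rows of a 3 x 4 array, it first has to be given shape (3, 1), as col is here.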


\n", "\n", "Speedtest: ndarrays vs lists\n", "

\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First set up the parameters for the speed test. We'll compare the time taken to sum the elements of an ndarray versus a list." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from numpy import arange\n", "from timeit import Timer\n", "\n", "size = 1000000\n", "timeits = 1000" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create the ndarray with values 0,1,2...,size-1\n", "nd_array = arange(size)\n", "print( type(nd_array) )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Timer expects the statement to time as a string, \n", "# here we pass nd_array.sum()\n", "timer_numpy = Timer(\"nd_array.sum()\", \"from __main__ import nd_array\")\n", "\n", "print(\"Time taken by numpy ndarray: %f seconds\" % \n", " (timer_numpy.timeit(timeits)/timeits))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create the list with values 0,1,2...,size-1\n", "a_list = list(range(size))\n", "print( type(a_list) )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Timer expects the statement to time as a string, here we pass sum(a_list)\n", "timer_list = Timer(\"sum(a_list)\", \"from __main__ import a_list\")\n", "\n", "print(\"Time taken by list: %f seconds\" % \n", " (timer_list.timeit(timeits)/timeits))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Read or Write to Disk:\n", "

\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Binary Format:

" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x = np.array([ 23.23, 24.24] )" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "np.save('an_array', x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.load('an_array.npy')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Text Format:

" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "np.savetxt('array.txt', X=x, delimiter=',')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!cat array.txt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "np.loadtxt('array.txt', delimiter=',')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

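When several arrays belong together, np.savez stores them in a single .npz file and np.load returns them by name. The sketch below writes into a temporary directory so that it leaves no files behind; the file name arrays.npz and the keys first and second are illustrative choices, not part of the original notebook.

```python
import os
import tempfile

import numpy as np

first = np.array([23.23, 24.24])
second = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'arrays.npz')

    # each keyword argument becomes a named array inside the file
    np.savez(path, first=first, second=second)

    # np.load on an .npz file gives dictionary-style access by name
    with np.load(path) as data:
        loaded_first = data['first']
        loaded_second = data['second']

print(loaded_first)
print(loaded_second)
```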

\n", "\n", "Additional Common ndarray Operations\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Dot Product on Matrices and Inner Product on Vectors:\n", "\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# determine the dot product of two matrices\n", "x2d = np.array([[1,1],[1,1]])\n", "y2d = np.array([[2,2],[2,2]])\n", "\n", "print(x2d.dot(y2d))\n", "print()\n", "print(np.dot(x2d, y2d))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# determine the inner product of two vectors\n", "a1d = np.array([9 , 9 ])\n", "b1d = np.array([10, 10])\n", "\n", "print(a1d.dot(b1d))\n", "print()\n", "print(np.dot(a1d, b1d))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# dot product of a matrix and a vector\n", "print(x2d.dot(a1d))\n", "print()\n", "print(np.dot(x2d, a1d))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Sum:\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sum elements in the array\n", "ex1 = np.array([[11,12],[21,22]])\n", "\n", "print(np.sum(ex1)) # add all members" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(np.sum(ex1, axis=0)) # columnwise sum" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(np.sum(ex1, axis=1)) # rowwise sum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Element-wise Functions:

\n", "\n", "For example, let's compare the values of two arrays element by element and keep the larger of each pair." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# random array\n", "x = np.random.randn(8)\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# another random array\n", "y = np.random.randn(8)\n", "y" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# returns the element-wise maximum of two arrays\n", "\n", "np.maximum(x, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Reshaping array:\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# grab values from 0 through 19 in an array\n", "arr = np.arange(20)\n", "print(arr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# reshape to be a 4 x 5 matrix\n", "arr.reshape(4,5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

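Note that reshape returns a new array object and leaves arr itself unchanged, and that the total number of elements has to stay the same. One dimension may be given as -1, in which case numpy infers it from the others. A minimal sketch (the names reshaped and inferred are illustrative):

```python
import numpy as np

arr = np.arange(20)

reshaped = arr.reshape(4, 5)
inferred = arr.reshape(-1, 10)   # -1 lets numpy work out the row count

print(arr.shape, reshaped.shape, inferred.shape)

# 20 elements cannot fill a 3 x 7 grid, so this raises ValueError
try:
    arr.reshape(3, 7)
    failed = False
except ValueError:
    failed = True
print(failed)
```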

\n", "\n", "Transpose:\n", "\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# transpose\n", "ex1 = np.array([[11,12],[21,22]])\n", "\n", "ex1.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Indexing using where():

" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x_1 = np.array([1,2,3,4,5])\n", "\n", "y_1 = np.array([11,22,33,44,55])\n", "\n", "filter = np.array([True, False, True, False, True])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "out = np.where(filter, x_1, y_1)\n", "print(out)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mat = np.random.rand(5,5)\n", "mat" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.where( mat > 0.5, 1000, -1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

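With three arguments, np.where picks from the second argument where the condition is True and from the third where it is False. Called with only the condition, it instead returns the indices of the True entries. A minimal sketch with fixed values (mask and true_positions are illustrative names; mask is used instead of filter to avoid shadowing the Python built-in):

```python
import numpy as np

x_1 = np.array([1, 2, 3, 4, 5])
y_1 = np.array([11, 22, 33, 44, 55])
mask = np.array([True, False, True, False, True])

# three arguments: choose element by element between x_1 and y_1
chosen = np.where(mask, x_1, y_1)
print(chosen)

# one argument: a tuple with one index array per dimension
(true_positions,) = np.where(mask)
print(true_positions)
```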

\n", "\n", "\"any\" or \"all\" conditionals:

" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "arr_bools = np.array([ True, False, True, True, False ])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr_bools.any()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "arr_bools.all()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "Random Number Generation:\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Y = np.random.normal(size = (1,5))[0]\n", "print(Y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Z = np.random.randint(low=2,high=50,size=4)\n", "print(Z)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.permutation(Z) #return a new ordering of elements in Z" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.uniform(size=4) #uniform distribution" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.normal(size=4) #normal distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

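Random values differ on every run unless the generator is seeded. For reproducible results, one option is a seeded generator created with np.random.default_rng, which is available in NumPy 1.17 and later (newer than the environment this notebook was written in). The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# two generators with the same seed produce the same stream of numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

draw_a = rng_a.normal(size=4)
draw_b = rng_b.normal(size=4)
print(np.array_equal(draw_a, draw_b))

# integers in the half-open range [2, 50): high is exclusive
ints = rng_a.integers(low=2, high=50, size=4)
print(ints)
```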

\n", "\n", "Merging data sets:\n", "

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K = np.random.randint(low=2,high=50,size=(2,2))\n", "print(K)\n", "\n", "print()\n", "M = np.random.randint(low=2,high=50,size=(2,2))\n", "print(M)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.vstack((K,M))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.hstack((K,M))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.concatenate([K, M], axis = 0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.concatenate([K, M.T], axis = 1)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }