{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab Overview\n", "## Important things to note about Python/Jupyter\n", "Welcome to Jupyter. This is a _notebook_ development environment for python. If you've used mathematica before, you'll be familiar with the notebook concept.\n", "\n", "Each cell (like this one) can either be for `code` or _markdown_ (a very basic language for formatting text). This cell is a markdown cell.\n", "\n", "Clicking in the `typing` area of a code cell will let you enter edit mode on that cell. Pressing ctrl+Enter will run the code in that cell. Keyboard shortcuts can be found by clicking on help -> keyboard shortcuts.\n", "\n", "Please note that any variable created in the code can be viewed by typing `print(variable)`.\n", "\n", "\n", "The point of this exercise is to introduce you to some of the basic concepts of machine learning and data analytics, and apply knowledge of what you've learned in CE3010/NE3002/NE6012 to a real-world problem. We use python for this exercise as it is ubiquitous in the world of data science and machine learning, and is good to get some exposure to. However, you will not be graded on your understanding of python or programming in general. You should be able to gain an intuition of what the code is doing by reading through, and all the code is annotated to help you understand what's going on.\n", "\n", "## How to load the Jupyter notebook\n", "1. Open **Chrome** and go to https://notebooks.azure.com\n", "2. Sign in with your UCC IT account. If you can't remember this, but have a microsoft/hotmail account, you can sign in with this. Otherwise, you'll need to create an account. **Don't forget the password if you're creating an account**.\n", "3. Click on \"Libraries\" in the top left\n", "4. Click on \"New Library\", then click on \"From GitHub\"\n", "5. The GitHub Repo is https://github.com/lkev/ce3010_lab\n", "6. 
Give it the name \"CE3010 Lab FirstName LastName\" (with your actual name)\n", "7. Give it the ID ce3010-firstname-lastname\n", "8. Click Import\n", "9. If the import seems to be taking too long, refresh the page.\n", "10. In the library folder, click on \"Housing - Partially Filled\". This is the notebook we'll be using for this lab.\n", "\n", "The lab will be structured as follows:\n", "\n", "* The lab instructor will go through this Jupyter notebook (Housing - Partially Filled), giving a background to machine learning and taking you through a machine learning problem with the `household_data` dataset\n", "* You will then be asked to perform a similar analysis on the `boole_data` dataset, and write up the results (details found in the Boole - Partially Filled notebook).\n", "\n", "You are allowed to copy and paste or otherwise edit some of the code here to apply it to your own datasets if you are not comfortable writing from scratch. As well as this, most errors or anything else you are having trouble with can be solved by a quick Google search. In particular, the [Cross Validated](http://stats.stackexchange.com/) and [Stack Overflow](http://stackoverflow.com/) Stack Exchange sites are great resources." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning Outcomes\n", "Data analytics allows implicit, previously unknown, and potentially useful information to be extracted from data. It is an interdisciplinary subfield of computer science that combines artificial intelligence, statistics, machine learning and database research. As the volume of data increases, the proportion of it that people understand decreases significantly. Lying hidden in this data is information that is potentially important but has not yet been discovered or articulated. As a result, data analytics allows the useful information within this data to be successfully accessed and analysed. 
Data analytics and machine learning have many applications in modern engineering sectors, including:\n", "* Building & Energy Analytics\n", "* Industrial Manufacturing\n", "* Engineering Design\n", "* Predictive Maintenance\n", "* Fault Detection & Performance Monitoring\n", "* Self-driving cars\n", "* Banking\n", "* Phone typing\n", "* Image recognition\n", "\n", "**The objectives of this assignment include the following:**\n", "* To gain an understanding of the concept of data analytics and its application in buildings for energy performance analysis.\n", "* To be introduced to basic machine learning principles and the importance of having separate training and test sets.\n", "* To understand the effect of daylight hours, occupancy, heating degree days (HDD) and building opening hours on electrical energy consumption in buildings.\n", "* To be introduced to data analysis using Python and the sklearn, numpy and pandas libraries.\n", "* To investigate and analyse the energy performance of a UCC building using both correlation and regression analysis.\n", "* To predict the future electrical energy performance of the building using the developed regression model.\n" ] }, { "attachments": { "data_matrix.png": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAApUAAALDCAMAAABzWAt/AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAMAUExURQAAABERESMjIzQ0NEdHR1dXV2lpaXp6en9/fzFllDZpljlrmDttmT5vmkBvmUBvmkBwmkBwm0FxnEJynUJynkJzn0N0oEN1oUR2o0V3pEZ4pUV4pkZ5p0d6p0Z6qEd6qUd7qk57o0h7qkh8qkh8q0h9rUl+rUl+rkp/r0qAsEqAsUqAskyCtE2DtU2Etk2Ft02FuE6GuU+Huk+Hu0+Iu0+IvFWAp1eBqFCIu1CIvFCJvVCKvVGKvlKMwVONwlOOw1OPxFOOxVSQx1WRyFWSyVaTylaUzFeVzViWzliWz1iX0FmY0VmY0lqZ01qa1Fua1Vub1luc1Vuc1lud11yc1lyc11yd2Fye2Fye2V2e2l2f24UAEoYAE4cAFIgAFYkBFooCF44AFokDGIsEGIwFGY0GGo4IG48KHJUAF5ADGZAKHZEMHpINH5MPIJMQIJQQIZYTI5YTJJsaKZwbKqIjMKIkMKMlMagqNqksN60yO640PbQ7QrQ8QrU9RLY+RbhBR7hCR7pESb5JTb5KTb9KTrdlcrprd7ttecBLT8BMT8FMUMFOUMJOUcNPUsNQUsNQU8RRU4mJiZiYmKenp6qqqrW1tZu0y5+3zJ+3zZ+4zaC4zaC4zqC5z6G60KG60aK80qK806O91Ka80Km/0qq/08aDjciIksiJk+G/xOO/xcPDw8vLy9DQ0N3d3c3Z5c3a5c7b5tDb5tTf6e7a3eDg4Ozs7Obs8unu8/Hh5Pfu7/ju7/L1+PL2+PP2+fT3+fr19vv3+Pj6+/z9/f///wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAI4M8WoAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAAZdEVYdFNvZnR3YXJlAHBhaW50Lm5ldCA0LjAuMTczbp9jAAAi5ElEQVR4Xu3diZtj2XmQ8Xu1VIWYzEz1JI7NYgIZMPGEgAlLDAYSSIxjG9uEzKTTdndsbA8JS8y+Q1jCYmcAq0TjYQthJ6IEYRFLbJZQ/xjf+c53Tt1PdUpSaTn3pvt9n3mqpKuSdOfUr85dVF1qromGFippeKGShhcqaXihkoYXKml4oXLfJk1qJtfO7fIk3kgHhcp9m7cG8Xwl156O9fL4abyRDgqVe/dUHabZcREun8XLdGCo3L+4DTeVs5uLdGio3L+5qhzp5ZVsz8dhU05HCJUHFPcsl+GizJstKI8VKg8oHniHQ3DZfrdXceFzmu5XS50r++/QoPKArnTwZRMevgtzW/ictjzTwYielvHnde+NByoPKZ4OWoSdyjBjPt/Fgz/bYoSRYa7sp0v9RpzJt4BzQjYYi3glqNx/64HKQ1rqN0Li8DvtTJpKuRRPTuwVKg9qqt8JDr9DcS877skcuKONyoOKpyyf88PvlI5FVClHOwdMlag8sM4py+c+HYuoUi4eck4ClQe1GqnKS7v6fKcH4XrgLZuQQ6ZKVB6WnaQ76FvwzHSjUn5W7ahnv1B5SOm3KtNZuue78AsqqlKmyv3PVYZQeUAy+m2cLTldKalKARV2aw7b00bl/oVfsJzHU5atLXqui79wqjoPfKULlXt3FV9njKcsn/NXwbV4Gv162TajA8/fonLfwovfYcMdT1lO48LnurjZWMiP6UGHOhIq92w1ttcZg06JU5aCKSRHgOd2fe9QuWc3v+Ybj3c4ZaknhEKHv/6Kyv0KFG07FXf
yOWWZfpft4O03KvcsnARJu5LxtxKO8L34eV888Dt4+43K/dIjnHT2w36djV9miycsDz3+DqHy/l3FOaGNu5L29wmE5XP/Ao+qPMYooPL+xdcwhKFes118ib+bIcffRznqQyUdLTnuO855W1TSsbo6/EUdC5V0pFbjo/1SPirp0Gb6e2vhxa5j7Vmjkg5tEg68n8pR39F+RQWVdGjxFwGO+XtTqKQDs1cR2iO+uIVKOrCr8PL35PKYL22hkoYXKml4oZKGFyppeKGShhcqaXihkoYXKml4oZKGFyppeKGShhcq6e7iP5SbVP8LDKiku2ua8WQySv9url6opLuLf3thUf1PzqGS7s7+IshZ7T8515fK9E/7j/xDeNnu/j90ilVYnsujjg/8m6KpE6yg/gG5SfpVyPDHkjb+q0RTGf9hTvjzDG3Tnulfnwt/qaGdyn2vROxs1LTn8XGuzuQZJrrGy2a6kltGcTTyHcoP4+tLpe6xSJt+gXl637/dswy/f2qXt3eCVQj/9jTshx3nD1KfYAXjHz61o5f4Z7s23d9uHcd7zMXebKr/kPEqXDwbyaVFMx7Letq+Z3jziIn8NIUBiLfIlfCHh27uUH4YX38qt4/m5H4jfj1vm9m9VB59FRYqSKag+634HZ1iBUWPAQqiw8y56f5661K+Lsxmq7YN89s8TJzT/M8Z5RFbuSw/jzJBLlv1u9Ar4Ra5u0CVe9/cofwwvmdJ5WR8JY9rV7Z3ilWIyU+HXTqo06icyEZTt54rmdK2qtQm+vWXNseO5O6Trkqd6s7Dvuelbep1k59uGYWnuLlD+WF8g1DZ2dFY6Y5Z2C+xv+Yzk2k+/q9Ow/9k2FmZxH8N37lXLDziviqPtAqxyQlUHmcFg0qZvPSP+clnubpF5Vh2SMJcKE2as1koKJs1I7ufPIR9HoUvCWslTxsWplv0B+fmDuWH8Q1BZWdHI/yl97BfIt/V+aQN+1Vz+b+LI67/d7qzMolbhnyvTnuqPOYqhLnS9twO6wQrGFTKHKl/AnYsn7aq1AeJc6v9zVRJHk4m3JH+A7IblfK5TY92S+XNHcoP49v9m3jcOqPR3dFYjMPiy/hOIfF/R/7v3IjbGHfv1Wk/lUddhTBTZAaHdIIVlJsmgYJ81TJMmWIjP0eheOtZPHqzp7LCH+UPf7Iz2dPP+dFuq8x3KD+Mb/dv4nEL24aQXOzuaFi6g3zXiMftSeFeoXupPM0qyFdlBQd1ghWUmyaiNDA7Dz87O6kUv+H+afOcWo7DwyR7T8MxVNpPlGPrm1tsFdMdyg/j609lTC52dzSs+D9yx4jrldK9QvdSGZOLR10F2Wx6pftm63fMFZSbJuEwR9CMwnZcHj0/YCG7daz7yef2HDl9UnlEnZd1QrVZVfYX5dAnrYatoqR3KD+Mz9a/ep3RcDsa16uneRi3jHjKbS3TzTt0olW4au84sLx3J1hBuUm+ULaa86d6zCM35ecoZLfOdUdUpkydeudyPz0AOre5MuzByif5KPsu4UvksTuroet0c4fyw/h2/yYeNzfi+WJYxVhYtGXEO/e6aV+VnQc7bBXiibujdIIVlJvkC2UTPj1TqvYod2W3ruIPmtytnYh0mQfjqwVh71UeMZ4rV1qzcBbd/g5WWg1di5s7lB/Gt/s38bi5Eb/5Ns6acfj/j8OZRzye9V0f8eI3f1+Vx1oF+Qbe+snftxOsoNwkKmUTLoX5b4vK1ibZs3guaTENLyeGY+blWTjjFG6UR1zJlVHcg71+KvNwO9XHXMbjMZmZ5erNHcoP4xuAyu6Ohg1yGnEdVBnDsF+/0k1WHvFbuyex/VQebRVWhV33vTvBCspNYZHQiSctt6jcofxkx+wED7lTndHo7mjEgziZ48OtM/sOj8IAXsl2oTvi3Xt1ymM0jTdv6ASr4FHesQqjMAHNwwZMLoYniR+vr//+1/x7/Zw6wQqaSrlz3NdEpas7Gp0djctw2ncieyrhVvkfHoffv7k
Mp0hku+BGvHsvazYJ1+VDGO/tr6+cYBVkt0rP5Uz0NbryKsjdZbHcK12MH6X1rz/BCspNQWXYhOuZdFS60h6LdrOjcT2X/ebpYhZvnY+aUZgXLmWP+GylC5dxNEOde8V0uxQKA50moLs7wSqIDSvcubwKqzbs3c91jvNz5frXn2AF5SBZhcqxjv4EHP6nzNPe41HrS+Wpi9/1XrvvKgxglYfSs6pydKT32Dig+67CAFZ5KD2jKudbt98n776rsPya/2qX6FmdK+nnc6ik4YVKGl6opOGFShpeqKThhUoaXqik4YVKGl49qPxf//FnSPrvNh65/2Y39Nx/jmvzP37639Xsp/9LfFqtB5X/4Re8jd72tq/9FzYeuX9gt/Tb1/7yuDb/+oc/XbMf/hvxabUeVP7M2x7Qgwdf9y9tPHJv2S399nW/Ka7Nv/n0k5p9+m/Gp9VQ2VeodKFyEKHShcpBhEoXKgcRKl1DUXnRV/Hp+1uB+Ox3q7Qvq1589nWVj09dfJqBqLz47u/9YC994F0vhed/4d0ftgWV+/C3vRCe/7bKnxrGejmVjz/3V37stP3VP/zJIan80O/7vl766DfH7/57HtqCyj389jtU/vNhrJdX+cYX/vZpe/OPDErl937fx3rp9ySVr9uCyr3+67eo7Hm91lT++In7IipDqCyHSgmVNh45VKZQWT1UulCpobIcKiVU2njkUJlCZfVQ6UKlhspyqJRQaeORQ2UKldVDpQuVGirLoVJCpY1HDpUpVFYPlS5Uaqgsh0oJlTYeOVSmUFk9VLpQqaGyHColVNp45FCZQmX1UOlCpYbKcs+Uys77s+0WKlHZaYvK2bhpxsV3ad3YhreXvGxLN6ESlZ02qxST4e1R7/1egneqXMqjofJ2qHRtVHnZjJbX16tZftPeXbtL5bxtZqgshErXRpXb30/7ju5SORlfXaOyECpdu6tcnYd9zHm4uGymq1mr70E9GzUj3e+8aqbhSnse3qQ1qbyatk17Ft6tPRbeWxOVhVDp2qhy1nn/8qUonIjL4HTRjMfNeNQ080m41ASWcZnsNo71ij7GPLzD/3TtTdBRWQiVro0qVwIvzXSLcZjoLvVd8gVdK9emcuQic6fsfdoymTuv2rAoqly1bbjzvJnIxxwqC62rHKW3bEZl6obNKhwyd7bAwZRsoQVdmP6WitIvuz6XLbmpvNRJNAyxfwS70A2VTqWMn+07oTLVZbMQl63ai03CD3FEl4HpD3ZatggzZ7wyac5mofyDr6GyEHOla5tKESY7jjrXrZ4mYWsqnVT9nFSmujuW9nU+VHqVOVSm1tjI3mXYnJybsPuotB94l32dD5Wo7LSDSvEoKmfNWA5mvED75JY9DQfhSWVn059DZSFUunZROQ5HLTbxbVKp2+mz8Ppk/IJzf/BtobIQKl2bVC5H83BK/FL3KydNmCrnm7bg4WyRfMpfIMfoehA+Pw8fU6gshErXJpWrNvxyhnwIG+LLcEZ80ow3qBzrKfXwqxz2BWK4Db/dkV9Hn03CVfngzqtLqERlp41b8GV4kXF0FgnN5fJ0MQvb6aWeS7++buOnSVgmEFdnbX75Md6ymAbYl2HG1cJ5d00Yu1CJyk677FfuVJo/9wuVqOyESg2V5VApodLGI4fK1F680r7mfqESlZ2OpvKwUInKTqjUUFkOlRIqbTxyqEyhsnqodKFSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHShcqNVSWQ6WEShuPHCpTqKweKl2o1FBZDpUSKm08cqhMobJ6qHShUkNlOVRKqLTxyKEyhcrqodKFSg2V5VApodLGI4fKFCqrh0oXKjVUlkOlhEobjxwqU6isHipdqNRQWQ6VEiptPHKoTKGyeqh0oVJDZTlUSqi08cihMoXK6qHShUoNleVQKaHSxiOHyhQqq4dKFyo1VJZDpYRKG48cKlOorB4qXajUUFkOlRIqbTxyqEyhsnqodKFSQ2U
5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sZjlN7XH5UpVFZvTeWiaWbxEipTqKwec6ULlRr7leVQKaHSxiOHyhQqq4dKFyo1VJZDpYRKG48cKlOorB4qXajUUFkOlRIqbTxyqEyhsnqodKFSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHShcqNVSWQ6WEShuPHCpTqKweKl2o1FBZDpUSKm08cqhMobJ6qHShUkNlOVRKFx/8vR/tpY8kla/Zgsq9tk1lz+u1pvKv/7XT9oVBqXzwDW/vqZfj879sV6v39fr0t1X+Q13e+3o5lU+efPbU/ZA+zVBUPrjoKXv6np//tsq34g19r9eaysenLj7NYFQ+392tst/WVVYKlYMIlS5UDiJUulA5iFDpGorKF/rKdusv7Gr19ARQQeWXdXnv67Wm8hOnLh7uDETlxXt/62/ppd/4TmX50rvebwsq9/5fpd/+2yp/chjr5VQ+/syfOXF/9g1lORSVH3n4ei99/JV4tvrVR7agco/et/kset/r5VW+8eaJ+9LnecVR4hXHcrziKKHSxiOHyhQqq4dKFyo1VJZDpYRKG48cKlOorB4qXajUUFkOlRIqbTxyqEyhsnqodKFSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHStdpVC6bkV3aMVSistMWlbNx04wv7cruLZqy7OV5eLyZXeuESlR22qxSDE0mTXNmV3fuDpVXbTOajEqPh0pUdtqo8rIZLa+vV7OpXd+5O1QuJgv5eNY04ZMLlajstFHlpClsbHfpri14rL39sKhEZafdVa50n3AeLi6b6WomW+OnsuM5aka633nVTMOV9nwlV5LKq2nbtGcy37oK2FGJyk4bVc6a9souXi/DPqG4DKAWzXjcjGUHcT4Jl5rAMi6TndCxXtHHmAvS2bTzILFW7+BCJSo7bVS5CkcmNtMtxmFn8LJpw+WmaeXatBGXuvdpy2TulAMaWRRVrto23HneTOTjTVdNs8YUlesqR2nXG5WprPJ6JXNfdqk1jWyhBV1wtVSUftn1uWzJTeWlzYmjxm3DJ2tKQ6h0KmX8bCcHlakblTI+4rJVe7FJ+CGO6AJH/aQ/2GnZIsyc8cqkOZuF8g++Jpv19R1NVDJXurapFGGy46iMVk+TsDWVTqp+TipTnS22bOJlQ78eKr3KHCpTkVdO9i7D5uTchN1Hpf3Ad4r7nbdCJSo77aBSPIrKWTMOc1xpC+6WPQ0H4UnlLYCrtvxCESpR2WkXleNw1GIT3yaVup0+C+ziF5zfOqxZje949RKVqOy0SeVyNA+nxC91v3Kiu4PzTVvwcLZIPuUvkGN0PQifn4eP0p0oUYnKbptUyua2mUzkQ9gQX4Yz4pNmvEHlWE+pB3f2BWK4Db/dkV5Hn4XH09aPwlGJyk4bt+D6i2ejs3gEPZfL08UsbKeXei79+rqNnyZhmUBcnbX55cd4y2IaYF+GGTcks67VOSjXUInKTrvsV+5Umj/3C5Wo7IRKDZXlUCmh0sYjh8rUXrzSvuZ+oRKVnY6m8rBQicpOqNRQWQ6VEiptPHKoTKGyeqh0oVJDZTlUSqi08cihMoXK6qHShUoNleVQKaHSxiOHyhQqq4dKFyo1VJZDpYRKG48cKlOorB4qXajUUFkOlRIqbTxyqEyhsnqodKFSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHShcqNVSWQ6WEShuPHCpTqKweKl2o1FBZDpUSKm08cqhMobJ6qHShUkNlOVRKqLTxyKEyhcrqodKFSg2V5VApodLGI4fKFCqrh0oXKjVUlkOlhEobjxwqU6isHipdqNRQWQ6VEiptPHKoTKGyeqh0oVJDZTlUSqi08cihMoXK6qHShUoNleVQKaHSxmPULOIFVKZQWb01lYummcVLqEyhsnrMlS5UauxXlkOlhEobjxwqU6isHipdqNRQWe5ZUZn30+8
VKlHZ6dgqb85p3CtUorITc6WGynLPisr9QiUqO6FSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHShcqNVSWQ6WEShuPHCpTqKweKl2o1FBZDpUSKm08cqhMobJ6qHShUkNlOVRKqLTxyKEyhcrqodKFSg2V5VApodLGI4fKFCqrh0oXKjVUlkOlhEobjxwqUz2q/NBr399LH00qH9qCyj38DVtU9rxeayq/cOLeHJbK3/m7P9BL3/Ou+N1/9wdtQeU++GvuUPlTw1gvr/Jzf/nE/dgfGpLKBxcv9dNFfPrent9W4LbKt3R57+vlVD558slTF59mKCqf8+5U2XPrKiuFykGEShcqBxEqXagcRKh0oXIQodI1FJUvvWjHfrWLT//goqfnf3HbMXjP67Wm0o6UT1d8moGovPiu7/nuXvpdv1RdvvgrP2ALKveBb30xPP9tlf9sGOvlVD7+3F/8C6ftLw3qfOXFh15/rZc+Zq/tvPqDtqByP7jltZ2+18urfOOLJ+7v8Dp4iNfBy/E6uIRKG48cKlOorB4qXajUUFkOlRIqbTxyqEyhsnqodKFSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHShcqNVSWQ6WEShuPHCpTqKweKl2o1IamMr/LOipTqKzemspF08ziJVSmUFk95koXKjX2K8uhUkKljUcOlSlUVg+VLlRqqCyHSgmVNh45VKYOUpmPHu8VKlHZ6dgqb8603StUorITc6WGynLPisr9QiUqO6FSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHShcqNVSWQ6WEShuPHCpTqKweKl2o1FBZDpUSKm08cqhMobJ6qHShUkNlOVRKqLTxyKEyhcrqodKFSg2V5VApodLGI4fKFCqrh0oXKjVUlkOlhEobjxwqU6isHipdqNRQWQ6VEiptPHKoTKGyeqh0oVJDZTlUSqi08cihMoXK6qHShUoNleVQKaHSxiOHyhQqq4dKFyo1VJZDpYRKG48cKlOorB4qXajUUFkOlRIqbTxyqEyhsnqodKFSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHShcqNVSWQ6WEShuPHCpTqKweKl2o1FBZDpUSKm088rusozKFyuqtqVw0zSxeQmUKldVjrnShUmO/shwqJVTaeORQmUJl9VDpQqWGynLPisq8n36vUInKTsdWeXNO416hEpWdmCs1VJZ7VlTuFypR2QmVGirLoVJCpY1HDpUpVFYPlS5Uaqgsh0oJlTYeOVSmUFk9VLpQqaGyHColVNp45FCZQmX1UOlCpYbKcqiUUGnjkUNlCpXVQ6ULlRoqy6FSQqWNRw6VKVRWD5UuVGqoLIdKCZU2HjlUplBZPVS6UKmhshwqJVTaeORQmUJl9VDpQqWGynKolFBp45FDZQqV1UOlC5UaKsuhUkKljUcOlSlUVg+VLlRqqCyHSuniIw9f76WPvxK/+68+sgWVe/S+zSr7Xq81lW+euC99fkgqH7z7W/rpV3/jRXj6i3e8xxZU7j2/RJ//tsp/Moz1ciqffOqPn7g/8dnHQ1L54gv99KIOvnz77Xr14vPfVvllXd77enmVTz5x6uLTDEXlc95tlW/ZLf22rrJSqBxEqHShchCh0oXKQYRK11BUvtRX8ekfXLxo1ytnR1t3qux7vdZUfvLUxacZispf9so399I3vV2H/+Ibe3r+V96pz39bpZ0Z6nu9vMpP/dET98c+M6QzQxcf/oHXeulj+Sy6Lajco+/YdhbdvrByab2cysdv/MQXT9vf/VFecZR4xbEcrzhKqLTxyKEyhcrqodKFSg2V5VApodLGI4fKFCqrh0oXKjVUlkOlhEobjxwqU6isHipdqNRQWQ6VEiptPHKoTKGyeqh0oVJDZTlUSqi08cihMoXK6qHShUoNleWeFZWjZmGX7hMqUdnp2CoXTTOzi/cJlajsxFy
pobLcs6Jyv1CJyk6o1FBZDpUSKm08cqhMobJ6qHShUkNlOVRKqLTxyKEyhcrqravM59dQmUJl9dZU3rwWgcoUKqvHXOlCpcZ+ZTlUSqi08cihMoXK6qHShUoNleVQKaHSxiOHyhQqq4dKFyo1VJZDpYRKG48cKlOorB4qXajUUFkOlRIqbTxyqEyhsnqodKFSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHShcqNVSWQ6WEShuPHCpTqKweKl2o1FBZDpUSKm08cqhMobJ6qHShUkNlOVRKqLTxyKEyhcrqodKFSg2V5VApodLGI4fKFCqrh0oXKjVUlkOlhEobjxwqU6isHipdqNRQWQ6VEiptPHKoTKGyeqh0oVJDZTlUSqi08cihMoXK6qHShUoNleVQKaHSxiOHyhQqq4dKFyo1VJZDpYRKG48cKlOorB4qXUdXmd+t9V6hEpWdjq3y5p2t7xUqUdmJuVJDZblnReV+oRKVnVCpobIcKiVU2njkUJlCZfVQ6UKlhspyqJRQaeORQ2UKldVDpQuV2tBU5rO+qEyhsnprKm9eIUNlCpXVY650oVJjv7IcKiVU2njkUJlCZfVQ6UKlhspyqJRQaeORQ2UKldVDpQuVGirLoVJCpY1HDpUpVFYPlS5Uaqgsh0oJlTYeOVSmUFk9VLpQqaGyHColVNp45FCZQmX1UOlCpYbKcqiUUGnjkUNlCpXVQ6ULlRoqy6FSQqWNRw6VKVRWD5UuVGqoLIdKCZU2HjlUplBZPVS6UKmhshwqJVTaeORQmepR5Yde+/5e+mhS+dAWVO7ht29R2fN6ran8wol7c1AqH7zjnT319fH5v8GuVu/t+vS3Vf5jXd77ejmVT37oD566T+nzDEXlRV/Fp+97BW6rfEuX975eXuWTx6cuPs1QVD7n3a2y39ZVVgqVgwiVLlQOIlS6UDmIUOkaisoX+soOdy7savX0BFBB5Zd1ee/rtabyE6cuHu4MReW3/rpf20vf9g5lefGL32sLKvfeX6HPf1vlPx3GenmVn/lTf/K0/enPKcuBqLz4yMPXe+njr8Sz1a8+sgWVe/S+zWfR+14vp/LxG2+euC99nlccJV5xLMcrjhIqbTxyqEyhsnqodKFSQ2U5VEqotPHIoTKFyuqh0oVKDZXlUCmh0sYjh8oUKquHStfRVeZ3a71XqERlp2OrvHln63uFSlR2Yq7UUFnuWVG5X6hEZSdUaqgsh0oJlTYeOVSmUFk9VLpQqaGyHColVNp45FCZQmX11lXm82uoTKGyemsqb16LQGUKldVjrnShUmO/shwqJVTaeORQmUJl9VDpQqWGynKolFBp45FDZQqV1UOlC5UaKsuhUkKljUcOlSlUVg+VLlRqqCyHSgmVNh45VKZQWT1UulCpobIcKiVU2njkUJlCZfVQ6UKlhspyqJRQaeORQ2UKldVDpQuVGirLoVJCpY1HDpUpVFYPlS5Uaqgsh0oJlTYeOVSmUFk9VLpQqaGyHColVNp45FCZQmX1UOlCpYbKcqiUUGnjkUNlCpXVQ6ULlRoqy6FSQqWNRw6VqSOpXDYju7Q9VKKy0y4qZ+OmGV/ald1bNEXey7ZpRld6sW2m+hmVqHTtoFJMTiZNc2ZXd+4OlbJYHlAvps+oRKVru8rLZrS8vl7N0rS2c5tUNjpZojKFStd2lZO93lpZultlmnlRmUKl654qV+dhH3MeLi6b6WrWNqOnsuM5aka633klO4pypT1fyZWk8mraNu2ZzLcxWSxf0oTrqEyh0rVd5axp47GJJEcqo4m4DE4XzXjcjMXXfBIuNYFlXCZT4Viv6APNBelsevMgQeU8TpaoTKHStV3lSuClmW4xDm8Dc9m04XLTtHJtKjuJMnfK3qctk7nzqg2LospV24Y7zzNAWXx+HSdLVKZQ6dqu8nolc192qTWNbKFFV5j+lorSL7s+D6d8ospLnUTD+xol2cFinCxRmUKlaweVAkl
ctmovNgnvnBXRBVr6Sd9NKy1bhJkzXpk0Z7NQfrctVSlf3q5QmUOlayeVQkl2HHWuWz1NwtZUOqn6OalM2Y5lVCmT5QyVOVS6dlQZ9i7DIc65CbuPSpsjU1FlmCxRmUOla1eV4lFUzpqxHMx4gSWVT8NBeFLZ2fSHTKVMlvIfKmOodO2schyOWmzi26RSt9Nn4VgmfsF5lmeZSpksw0uZugiVqHRtVbkczcMp8Uvdr5w0YaqUOe5uleFskXzKXyDH6HoQPj8PH6WkUh4FlSlUuraqXLViZyIfwob4MpwRn8gkd7fKsZ5SD+fI7QtEXxt+uyO9jp5Uht/6QKWFStf2LfgyvMg4OotH0HO5PF3MwnZ6qefSr6/b+GkSlom41VmbX36MtyymAfZlmHFDV+Esul4YpUuoRKVr5/3KnUrz571DJSo7oVJDZTlUSqi08cihMrW/yrSvee9QicpOx1W5d6hEZSdUaqgsh0oJlTYe+TesUJlCZfXWVMpRo/2zFFSmUFk95koXKjX2K8uhUkKljUcOlSlUVg+VLlRqqCyHSgmVNh45VKZQWT1UulCpobIcKiVU2njkUJlCZfVQ6UKlhspyqJRQaeORQ2UKldVDpQuVGirLoVJCpY1HDpUpVFYPlS5Uaqgsh0oJlTYeOVSmUFk9VLpQqaGyHColVNp45FCZQmX1UOlCpYbKcqiUUGnjkUNlCpXVQ6ULlRoqy6FSQqWNRw6VKVRWD5UuVGqoLIdKCZU2HjlUplBZPVS6UKmhshwqJVTaeORQmUJl9VDpGpjKD//Aa730sVfid//VR7agco++Y7PKvtdrTeVPfPG0/b0fHZTK9/+O395Lv+0XXYTnf+mbvtMWVO47v0X13Vb5k8NYL6/ys3/uxP35P/B4QCofvPhCT+k3X34s7Gr14vPfVvllXd77ejmVT5584tQpysGofM67rfItu6Xf1lVWCpWDCJUuVA4iVLpQOYhQ6ULlIEKlq2+V/+kXvkwvv3zxr2w8cv/Ibum3i98c1+bf/sjvr9mP/K34tFoPKv/fV75KX/3qV37OxiP3c3ZLv33lf8e1+b8/+z9r9rP/Jz6t1oNKoi2hkoYXKml4oZKG1vX1/wcukGwwEgLV/gAAAABJRU5ErkJggg==" } }, "cell_type": "markdown", "metadata": {}, "source": [ "# Background\n", "\n", "## Introduction\n", "\n", "Statistics, machine learning and data science can all be thought of as different sides of the same coin. The fields essentially boil down to different applications or uses for statistics, probability and, to some degree, information theory. As well as this, the different fields may have different terms for the same concepts. At the heart of all of these, however, is **data**. 
The datasets used in this study have been cleaned and prepared for you for easy manipulation.\n", "\n", "The columns of each dataset comprise a number of **features** (also known as **independent variables** or **predictors**), plus one column representing the **response** (also known as the **output** or **dependent variable**).\n", "Each row in the dataset then represents an individual entry, comprising the features and associated response for that sample. A diagram of this can be seen below:\n", "\n", "![data_matrix.png](attachment:data_matrix.png)\n", "\n", "In machine learning, the aim is to build a **model** from existing data, i.e. existing features and responses. The model identifies some relationship between the features and the responses, so that with any future data you collect, you can make a good **prediction** (also called **estimate** or **hypothesis**) for what the observed response should be.\n", "\n", "A good example is house heating: if we have data on a number of houses (the **samples**), we can look at the amount of oil each house used in a year (the **responses**). We then build a model for the relationship between the usage and things such as size, insulation rating, occupancy, etc. (the **features**).\n", "\n", "Then, when we want to guess what the usage will be for a house with no existing heating oil data, we can **predict** the amount it should use according to our model.\n", "\n", "There are a large number of different statistical models or algorithms that can be used in machine learning to make predictions. When the prediction must be of a quantitative nature (i.e. where the responses are numeric values), we use a technique known as regression. When the responses are qualitative (i.e. when the responses are certain categories or classes), we use classification. \n", "\n", "For this assignment, we will be trying to obtain a numeric prediction, so we use regression. 
Specifically, we will use Ordinary Least Squares Linear Regression, the most basic form of regression." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Exploration\n", "\n", "In this exercise we will be trying to predict the heating oil usage of homes on a particular day based on a number of factors, including their insulation rating (1-10, higher is better), the temperature on that day, and the age and size of the home.\n", "\n", "We have two datasets for houses from two different locations. Dataset 1 includes the heating oil usage for each day, whereas dataset 2 does not.\n", "\n", "We will build a regression model from dataset 1 by splitting the data into _training_ and _testing_ data. The training data is used to build the model, and the testing data is used to see if it generalizes well. We will also explore a feature extraction method to see if we can improve the performance of our model, and verify this using *K-Fold Cross Validation*.\n", "\n", "Once we determine the model is performing well, we use it to predict housing heating oil usage for dataset 2. We then compare the two datasets and see why they differ.\n", "\n", "Before diving straight into the modelling, it is always a good idea to explore the existing data and see how the features relate to one another. This is a useful sanity check for later on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set Up Libraries\n", "First, we must import the various python packages we need to use." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2018-10-02T12:05:34.928154Z", "start_time": "2018-10-02T12:05:25.940146Z" }, "scrolled": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\leahy\\Miniconda3\\lib\\site-packages\\sklearn\\utils\\__init__.py:4: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working\n", " from collections import Sequence\n" ] } ], "source": [ "# display plots & graphs in browser:\n", "%matplotlib inline\n", "\n", "# Various libraries that are required\n", "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "\n", "from sklearn.model_selection import train_test_split, KFold\n", "from sklearn.metrics import r2_score\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.preprocessing import PolynomialFeatures\n", "from sklearn.utils import shuffle\n", "\n", "# set the plot styles\n", "sns.set(style=\"ticks\", color_codes=True)\n", "\n", "# Helper: display every row of x, then restore the default display limits\n", "def print_full(x):\n", " pd.set_option('display.max_rows', len(x))\n", " pd.set_option('display.max_seq_items', len(x))\n", " display(x)\n", " pd.reset_option('display.max_rows')\n", " pd.reset_option('display.max_seq_items')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import Data\n", "Next, we must import the housing data from a CSV file as the variable `house_data`.\n", "In addition, we import the second dataset, for which we will later predict heating oil usage, as `house_data_2`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import house data\n", "house_data = pd.read_csv(\"Household Dataset1.csv\")\n", "house_data_2 = pd.read_csv(\"Household Dataset2.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Displaying the data\n", "Displaying data to get 
a good idea of what it looks like should always be the first step of any analysis.\n", "\n", "---\n", "\n", "The following shows the columns of `house_data`. Note the features, samples and targets.\n", "\n", "**Note:** The `print_full()` function just prints the data in a better visual format than the standard print function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-10-03T10:12:37.351383Z", "start_time": "2017-10-03T10:12:37.329324Z" }, "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "print('Total number of samples in data:', )\n", "\n", "#Display first ten rows of the house_data.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen, we have 4 features, `insulation`, `temp`, `age` and `home_size`, and one target, `oil_usage`, with a total of 665 samples." ] }, { "attachments": { "correlation_strength.png": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiUAAACHCAYAAAG+IRmxAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAABJBSURBVHhe7d0NkuQ2joDRPtCeYi6/x5oNjBceGA2ABEkxldXfi8BIJMGfVKlYdHXb8+t//+df/yb+Gf95KPgvHkogfCi/fv36/7v/0rqoTVU5Vb8R23dl/qxeRG28KYHfHkr0NH+y6PPypgR4KIHyoUSbUMdMv5U5dnNH/XlTAjyUAA8lwEMJ8FACUw/F/4TwV2u0s59i55H7rGzrZ/GmBHgogb8fCvHPuLMJfBEeSmD6oWS7eWeXl5yZvEzUd3b+zry8KYF/PJSdr+I3yj4vb0qAhxIoH8rsJpaRfjN9V8dfNZqPNyXAQwnwUAI8lAAPJTB8KDM/QWz7jZ8kOoeuzc/p2/V+Fm9KgIcS+M9Dkf8BMvJ+8KJg6NiLsvJDr7IzjvQ9tY5P6a7/6c977EWZ8e1fvD9Z+aL8hO9M5Dpf2/JFARQvCqbwomBK+qL4n1/RzzOtk6uGF9XNqMbSuWxZ2Xtv1KaxI+sf1XdyT9gZN31RAIsXBVN4UTCFFwVTtl8UOSA9dfj6aarD685z9P1Wx6lsvyjCfkh/FVW7vWb3/qr338au234Ovfftyrb5e71GYdt2HXlR8PP940UhiFHwohDD4McOSrwkGOIlwRAvCYbSl2T0z9on/lm849Z8K/NIn6fXd2L8aIyZcZd3klMPpRpH2+SqUYnaZ/pVRv2zOWfNjC3XzpjK94vGmBl7+SXB+42++LN4STCUviSn3kK8T/drG74kMz+n8L26X990JwEULwmGeEkwxEuCIV4SDF19SeyJ2p6wOydtr+q7M66wa7SycXfn25Wtd1f4kviJViau+uiHyXJm57NjRH1snW+P8j3JGY27Ihs38lRux9WdBN+JlwRDvCQY4iXBEC8JhnhJMMRLgiFeEgzxkmBo+SUZ/WZv57d/vt8Tv0X8hG/9HFsvif3Qtqz3WVnM1glbH7V/A13z
6DPYtignq7N9bJxwZCfxC7OhonubE937sG3fxq5fr/Zz2HJ0na3Tq23ftfyS4M/BS4IhXhIM/f2SEEQVvCQEQRyJvzcTAOhiMwFwBJsJgCNam0n3D49W/pDp1B9MzfCfxZe9T6zNrsneR6q2HXbc0RpW6Jh2bHsfqdpm+TluzBnReZ8Yf3bsE3O3NhMrmryzoCzX1uv9KNe3Z/mrqnVkc/k+WV7F94nG9HXC3p9UjRuto8P3i8aJ5lidryOaV2Xzd9aVjav19mrrhc/xsnpRjSNXXzeyvJlEZicVdrEqKkd5Suuz9lOqeaq1aWh5h+1vxzw1/q6T68jG+NRnreat1jq7zmp8UbWP5qjabZufQ66+buToZvJp9gEAf4q3vPdbmwnfuMB3efJ7trWZ6A5I/FlxSjQ28dk4qbWZWE8sBsCznvy+Xd5MAMBiMwFwBJsJgCPYTAAc8dhmkv2iZ/TLn6xP1S9q17pRv0jVR2VjZ/Uqaxv1idqrPmq178zYKht/dt7buZFqXF+f5YqZXK2byRVZ3WzuLY9tJhH9oNkHHrWv6M51cm4vm1OszPvUWqt1ep1c8fSaR56evzP+6TVXa5A6G0+4upmI6oNo26kPe3OuGdWcq+tY6VetQ4zarU6u6ObN5otO7mnd9XbzTo/7hOubCYCfic0EwBFsJgCOYDMBcASbCYAj2EwAHMFmAuAINhMAR7CZADjiY5uJ/E09/7f7bJ3Qso2oPqrTsHydLdurv9ey8G32Pqqz91rGs+xz9l8DpeVRaK5e9V75PH/vw7apqF7vozq9t9c3+OhmorIHI2Xfll1Fdq+idn8V2b2Qsu8T5Y+ueE70jKOvQ1Tnr77Os/U+N2oT9l5I2ffx+Vm7zfu0j20mI/YhvemBAYi9djMB8F3YTAAcwWYC4IhwMyEIgtgJNhOCILaDjYQgiO3gdyUAlrGRANjGRgJgGxsJgG1sJAC2sZEA2MZGAmDb9EYi/2auRmX13+CdHf8UP0817601CfsM9N7WZUbtq+y4p+ewn0vvbV1m1D7Dj1GNeWK+zFNjy7gaM3bX0dpIZuwuaLf/LJlH57L3b7CylqfWb8d9Yo5PfVYZQ8ex9zf5OZ9aw+64M/2XNxIty1VDy/7q6yJRW9RX67QsfM4MO4a/t1cRtWlo2V5PsXOMnJ5b2XH13l41lM+Z1RmnO3bEzufv7VVEbRpatlcR1UU6fbL2LF9Ju+1ryzNmcrc3Ek/rbfvMQlRnXFW1ZXyuHyMaqxpf2qr2VdW4tq3K2xGN769WVDcrmkP5Nt/eFY0fXa1qTmmz7VVuRvtkfUftlZU+aqbv9kYyqpdrlmuNcqr2atyMHy+7CrnX0LKV1e/ojvXEGoQdz88RzVW1ZbrjrMzh+TGyq5B7DS1bUX2Wa0XjaES0Pmuv7PYd9ZveSN5s5eEAP8Ub3v8fsZGImV0T+Ene9M7/mI0EwOcsbyT89Ae+x9Pfr+2NRI9TbCTA93j6+3Z6I7ELIf6cOCEal/h8nDS9kYgnFwLgOU9/77Y2EvXUYgA84+nv2aWNBAAsNhIA29hIAGxjIwGwjY0EwDY2EgDb2EgAbGMjAbCNjQTAtqWNZPQ35KL20d+sy9qrPiLrk40nRm0jK32rOStRn9mxVvpm7Z2xorrM7ridXPHpXKnTsKpcr5Ob6eTOOL6RaNuJhY7GOjnXrGzOJ9aw8/l2+kZmx+nMeztXylGdvSpfFidyI0/lWj6/23/k1RuJuDXX7FhVXncdT8610zcy26ea16tyfd2bc6u+UZv3VK6SXA3V6T9jeiMZLULr/LUiOVVeNpavj8aI+kR5qhpL+PZqLCU5VZ62+xwt++uMlb6SM8qbGUd057XXym6u3GtYUW7mVK6vOzVuJZvzlOmNxKoWsfpBI6Oxbs6lqrwT67Bm1xTZ6RuZHacz7ydyfd2pXO8NuZXd/t7SRgIAFhsJgG1sJAC2sZEA2MZGAmAbGwmAbWwkALaxkQDYxkYCYBsbCYBt
bCQAtrGRANjGRgJgGxsJgG1sJAC2sZEA2MZGAmAbGwmAbR/ZSE79Z95O/+fiKtlcWn9zLch1vh5Rzs2vYzZX5zO8xUc2EuEfkpTtA4zK9t6HtulV75XP8/c+bJve69Xf2zphy/6KZ0XPW+5tvc/RulF4tt7f63Wlzoats/dafouPbSTCPpzqKk7UKZ8T5Y5yRnnZFc+S56yh5egqRnW+PcoTp/Ky9tH1DV61kSgp+7Yo19ZF7crX+VzbHtWJrE/3imf55xw9f19XtQl7r6o+WV97L7J+3esbXN9I5MNHD8DW6/1sWfh7WxZZe3YvojYbWq/X6t7W4RnZM7b1o3tbVrbeynKyexG12dB6vUZt9v4trm8kAH4eNhIA29hIAGxjIwGwjY0EwDY2EgDb2EgAbPttIyEIglgNNhKCILaDjYQgCIIgiFcEhxKCIAiCIF4Rvx1KAAAAnubPHxIcSgAAwHX+/CHBoQQAAFznzx8SHEoAAMB1/vwhwaEEAABc588fEhxKAADAdf78IfHaQ4n+P4696f9tDL/j67OPZ/gz/QlfV97d7yVfO41P8ecPiSOHkuhD7XzQEw9pdgyfJ2Vbd2Itn+Y/k1W1zXjq+Tw1bke0htm6jjd81o5svd/yOWa/hruf5y3PQ9aRraVqm/HUZ3xiXD+m/+xPfZYTorXtrPfEZz0xhj9/SFw7lEhZw9KybbehojoR1ds6Wx+p2qNx7FXvhZZtndBy1Ca0Pmo7Rcf2c2T1Quo0PFvv27UtqterbdNyVpfV3+DnieaO1qJ5UW5Wr7KcN9G1+TVGa9bPkuVGbULro7ZdfsxonizH14uszZaznBt0Tj93Vi+kTsOz9b5d26J6vdo2LWd1Wu/LK6q+dnzNs1e9F1q2dULLUZvQ+qhtJBtPZWNr2bbpva0TUZ2I6m2dre/y5w+JY4eSapFVu71Xvs7209D6TNWWsWOrUVlUOattJ0VzdtYxmyv3PpS9z1T5Op6NG3QeO19Up6TOR8TW632W+zbR2oVff1Uetfk4Tce0Y0d1Qso+MrZN76v8G0Zr8uuryqM2H8reZ7J8HcvGrmicUVlUOattMyTfhxq1RXy97auh9ZmqbZY/f0g89psSq/vBfF3WvztuZDS/bx/li05/KWs8xY6t91GdqsqdNitq64xVjf0knTdaW7SmbJ2+PhpPZP3fJFqj1Pn6qtxpe4LOEc3r56/WU+Vm97dF66jWVpU7bVbUNjtWNW7HaA2+fZQvOv2lrNFV9Vlp8/WzeVbVNsufPyS2DyWyMI1KlDeqy+qtUf2I5mi+72PrshwRtfk6Xxa2ztafEo3t7327yOqFrfft2mbrozpl22yOvVc+55ZovmoN2TptfRQ+542q9WV1Pt/X+bKI6k6Lxs7mq9Zj26LwOTdF8/p73y6yemHrfbu22fqoTtk2m2PvhW9foX2zsWxdliOiNl/ny8LW2fqRmT5RTlQnbL1ti+rEqH6HP39IHPlNCdZEX9DdLzIA4F3Y62P+/CHBoQQAAFznzx8SHEoAAMB1/vwh8dihRH41ZQMAAJz1zT9r/flD4tihxD8YHwAA4Kzo563G2/nzh8TWoSR6CFkAAICzop+3UbyRP39IHP/jm+hhSAAAgLOin7cS38CfPyQe/4uu3/SAAAD4Jt/8M9afPyQeP5QAAAB4/vwhwaEEAABc588fEhxKAADAdf78IcGhBAAAXOfPHxIcSgAAwHX+/CHBoQQAAFznzx8SHEoAAMB1/vwhwaEEAABc588fEhxKAADAdf78IfHYoWTnvzLn+1XjdHIjb5yrWx+ZncvqjF/pzr0zb3cuITkrc3bm6uSO3Jy321/aZ+bojNvJFdI+yhGdcb8tV0nOKK877sy8ojuuWMnJ+uzWZ3niDbmZ0fgrY97izx8Sj/6mZPVh+H6jh2515+z21/buPKI7l9WdrzOXtsm1O0+kO/fOvJ25hLavzNedS63MZa3OK7pzr8y1klP1ecMavCp3ddxRXndc
bT85rrRpjHTH1eto7M64XpXbHVfbZ/NUld/JzXTX8yb+/CHx9YcSIe02Onz+qL+2d+cR3bnU03OtriuzM1537tm5dtakdsZYmU+tzrsy58pcKzlVnzeswRrlrY4rqtzZcbvzd/OtKrcz7lO51iivO662PzGuja6Z8d/Knz8kfsShRK3M15lrdV1qtX93HtGZa3VdmZ3xunOvztWdR6zOJVbmU915b84lVnKqPm9Yg5gZU3THtarc1XFHeavjiiq3M+5TudYo76k1dHKt2Txv1G913Bv8+UPi8b9TsvOgo/5R2dd16RjR2F6WOyvrn423Oo/ozJXlrsrGi8pRXkc2RjXm6nyzc9k837YiGysq++jK+kZjZbmRE7lR3yw3kuVGZR+VLC8qZ7mRLLfqe3LcLC+T5Uf9s9xIlpv1nRlTdMbNciNZblT2dSfp+E/OscOfPyQe/U0JAABAxJ8/JDiUAACA6/z5Q4JDCQAAuM6fPyQ4lAAAgOv8+UOCQwkAALjOnz8kOJQAAIDr/PlDgkMJAAC4zp8/JDiUAACA6/z5Q4JDCQAAuM6fPyQ4lAAAgOv8+UOCQwkAALjOnz8kOJQAAIDr/PlDgkMJAAC4zp8/JDiUAACA6/z5Q4JDCQAAuM6fPyQ4lAAAgOv8+UOCQwkAALjOnz8kfuSh5NevX/8IFdXd8ql5n3bimf7UZwN02O8lHyfMjHVyvjfRz7Xz2X7qs/kkf/6Q+LG/KdGXx79IN14qP+dP1/m8f9qzATqi74+nvmeeGvetOp/3T3s2n+LPHxI//lAi7AuW1Quft9ImfFnM5uh1lCtsvc+ZLWudvfp6WxZVuWoTvixGOVlZ63w78K2id9nXVeWoTUVttixsXdauV9sW5Srblt2LrKx19lrVq6pctQlfFrbOt2dlrfPt+Is/f0j8EYcSoS+Frbd1NmybqspVmxrV2Tat9+Fl9cK3jcoqqtc6H6oqV21qlNMtA99q5l32OVnZ1oksz6pyonofkazN14/KalTvQ1Xlqk3ZOt/eLeMv/vwh8cccStToRdFy1l/4flXZXm2OiuqzOs/n+ftOWUX1WZ2y7T63KttrliO6ZeBbzbzLPseWs3pRle3V5ohOXcTn6n1UX5VVp96WbbvPrcr2Gt2Lbhl/8ecPiR97KNmlL1EUAABgjz9/SHAoKXAgAQDgGf78IcGhBAAAXOfPHxIcSgAAwHX+/CHBoQQAAFznzx8SHEoAAMB1/vwhwaEEAABc588fEr8dSgiCIAiCID4RHEoIgiAIgnhB/Ovf/wctegIiwuKC1wAAAABJRU5ErkJggg==" } }, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Correlation Matrix\n", "Correlation analysis is a very quick but powerful technique that can be used to rapidly see how elements in a dataset interact and correlate with each other. This is particularly useful for the optimisation of building energy data as meaningful correlations between dataset attributes (e.g. HDD, footfall, daylight hours etc.) can be found quickly and effectively. 
In summary, correlation is a statistical measure of how strong the relationships are between the features and responses in a dataset.\n", "\n", "Correlation coefficients between 0 and 1 indicate a **positive correlation**, whereas coefficients between 0 and -1 indicate a **negative correlation**. A positive correlation coefficient between two variables means that as one variable rises, the other also rises. A negative correlation means that as one rises, the other falls. The closer a coefficient is to 1 or -1, the stronger the correlation, whereas values close to zero indicate little to no linear relationship between the two variables.\n", "\n", "![correlation_strength.png](attachment:correlation_strength.png)\n", "\n", "---\n", "\n", "We use the numpy (np) python package to get the correlation coefficients:\n", "\n", "```python\n", "c = np.corrcoef(X)\n", "```\n", "\n", "This returns a matrix, `c`, of correlation coefficients from an input matrix `X`. By default, `np.corrcoef` treats each row of `X` as a variable (feature) and each column as an observation (sample).\n", "\n", "In the code below, `corr` is a matrix of correlation coefficients between every pair of columns in `house_data`.\n", "\n", "**NOTE:** `house_data` stores samples in rows and features in columns, which is the wrong way around for the `np.corrcoef` function. We need to TRANSPOSE the array by using the `.T` attribute." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-10-02T20:37:13.690146Z", "start_time": "2017-10-02T20:37:13.319269Z" }, "collapsed": true }, "outputs": [], "source": [ "# Create an array of correlation data. 
Note the .T for transposing\n", "corr = np.corrcoef()\n", "\n", "# Create the axis labels for use in the plot\n", "labels = \n", "\n", "# set the figure size\n", "plt.figure(figsize=(10, 8))\n", "\n", "# Plot a heatmap to easily visualise the relationships\n", "sns.heatmap(corr, xticklabels=labels, yticklabels=labels, annot=True,\n", " cmap='RdBu')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen, the darker the colour, the stronger the correlation (with the `RdBu` colour map, blue for positive, red for negative). These are some of the observations we can make from this graphic:\n", "* `insulation` and `oil_usage` are highly negatively correlated (-0.73) – a better-insulated house uses proportionately less heating oil, and vice versa.\n", "* `age` and `oil_usage` are highly positively correlated (0.87) – older houses generally use more heating oil, and newer houses use less.\n", "* `home_size` and `oil_usage` are more weakly correlated – the size of the home has some effect on heating oil usage, but not as much as age, outdoor temperature or insulation.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "Below, we can see various features plotted against `oil_usage`.\n", "\n", "As can be seen, the correlation signs and magnitudes match what is seen in the correlation matrix (i.e. 
strongly/weakly and positively/negatively correlated)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-10-02T20:37:22.824961Z", "start_time": "2017-10-02T20:37:22.367708Z" }, "collapsed": true, "scrolled": false }, "outputs": [], "source": [ "# plot age, insulation, home_size against oil_usage\n", "sns.pairplot(data=, x_vars=[],\n", " y_vars=[], kind='scatter', size=5)" ] }, { "attachments": { "over_underfitting.png": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAsEAAADWCAMAAAAHI69MAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAARFQTFRFAAAAAAAAAAD/AAAAAAD/AAAAAABVAACqAAD/AAAAAAA/AAB/AAD/AAAAAAAzAABmAACZAAD/AAAAAABVAAB/AACqAAD/AAAAAAC2AADaAAD/AAAAAAAfAAA/AABfAACfAAC/AADfAAD/AAAAAAA4AABxAACqAADGAAD/AAAAAAAzAACZAADMAADlAAD/AAAAAACLAAD/AAAAAAAVAAB/AACUAAD/AAAAAABiAAD/AAAAAAA2AABIAAB/AAC2AADaAAD/AAAAAAARAAAzAABEAAB3AACIAAC7AADuAAD/AAAAAAAPAAAfAAAvAAA/AABPAABfAABvAAB/AACPAACfAACvAAC/AADPAADfAADvAAD/4u+jTQAAAEp0Uk5TAA8PHx8vLy8vPz8/P09PT09PX19fX19vb29vf39/f39/f3+Pj4+Pj4+fn5+fn5+vr6+/v7+/v8/Pz9/f39/f39/v7+/v7+/v7+/cW04AAAAU5klEQVR42u2dbYOjyHGAQQhHJCKOSCzFwznxCTteYkdE2BYZ4iTCTuxTzcvNzu7O3tT//yH5AEiAQALRDXRT/enm5m4kqh66q+tVUWiNaHkkAloiLwN0iZ9OVUnDsi8XXImfbrUiDUu+NIBI4n0qDEegwnETbAOALbGJBIb0KgxGbSmpEQBE8t5SAaS/qJpgjX0LBmlFoAEAyH7IehCMfQsGkNVYdAFA6otq8pYa4yU43oIBVtIqV/pN2AUAf+xbsKzuCDd+OKk3YRXGYCldU7Gk7ggtfTiZ9WvDCCylyqXDaUmoZS99NondEekpOtJNOMgQLJ8pZZweTt6bjiut+uqsFWSXdJe58PRs0gbmtBG8pFdsiMj1wHMBACLJDiI7+3rKGnX0j084Qp+wGgB4lqrYYCuq5csmAz13wEiaf2fKfIZet6CClRbvVbaiKIq2Cl3J3k8IbQ1As0MAkDJ3QM0YStKdodefXj+etukRK9M+5ULk6oqiACiKoruRlP4mJ3fMjDasYctoJFqeGf8DQHLeevLlfpgAAJ5lg22647QjJCb4uFKCZfRDRBCs1ESBquWD1MU2RLB8y3O0nAK1VaAqpiPoWeM7OhE8MoLPFWj4wprDOgCEDhE8coJjD6KQXhenhU+bCJbHDIyELVcIW1xFiWB5CBY2o19vk55EBMtDsAWCVj26bQLjRLA8BKuiRtCjNv5sIlgeguNMH0e4RzBbXUGJYIkIXomZS+q2qkMggiUiWBOyYENt50MhgiUiuJVXqrdltfNjE8EyEeyImO3utXMCEsEyEawLaEbEHhSTCCaCBTUjVi292ESwVAQ74rUXCFpGEolgqQg2hcvu0dqGYYhgqQgWL7vHaevDJoLlItgVzYwI27ZD
IILlIlg0M8Jo7T0hguUiWDQzwm3twSaCJSPYFar0Xm3/whHBkhFsChXUsNobPUSwZAS3S7btevnty0qIYNkIFik3QmsZUa5F8GxOBAtFsEi5ETaDhOYrBKtbxMOMCBaI4PYe1u5WyKCo5DLB5v8hIu6lJXixvptIR7AjTKWGweK4uEiwBV8QEdGSlOAPiHiYyEawJkzBp8vCZL9EsA7wCRHx3ZeUYEREvJON4Hy21+RuvRzoV1eZTJS9QLAaATy8I+ILyEzwWjqCsxm3kwMifhjmV18xmWV4gWAPAODh4+uTuINGrhC8R0RcSkdw1kd1h4iIw7SUAiY9hqoJjs3sIBC5u/IVgmf7we5PrRTonRLU1oiIOEhvks5mClM1wfG4Od0Ws4tGPV+EMhPdFVGqwEysdjlcZ5LLxmdSSbCVzLGy+s0UWa7XE44Ei79KFZiJLH9A3A9yC1YZhb8rCQ6TN8To1bv4b+38XWMlOLu9TQYakLIYZTJXEWwePR0A0BsIP0ZExG9lJXi5x82EhwJ1AVzCAaNesVUE+8e3uMekf/UPiIj4Z0kJXiAi7rgoMBx8K2GD1TTdCoL1k7M56G9urwU/ICI+m3ISvEFERC4KZONqZbAmm8N+yfEeV0lw5u/7/REcwNMPiK+3u0JGSzCbcBeDtUNEnFd9wRU3guN7on2CuZdUJ6313OthEzxHBvGyS1vQALKEq8Ke7A6JcgFk74l2bwSvjiNXJb3Jzbe7NR8FJjdxfbAEszPUywWQLf6we7sU+K1n04/Vmzacu9ymPCTI8AUrFUCu+MPoK6QRm0pRmzqUERNsD+MuN1nvtvOKzcnnJ4BcN/veCDYBACKnjREzYoI1GHDjCJZfrlQAQTYXQu9rxpMDAOC2MmJGTPBg7nKV3y3iJ4BCkn9fQbkAAMBqdQSMmWBmIQMO9mHE0DtQJoDCSJyeCI7NYK3VETBmggccl1uxLKcuE0BhLFnYz7tsJOi2eYFGTfBqsGX3TN+tEgFM3hA/ZTwdPQXl7CRNO2rhdxk1wUzPapbLYmrflAhgU8im8dv3Vbll+UnYsc0LVEbwbD4bB8FM70vMNetzFMChkDLl9PMqRwm5/pnjZbpeT28meIPiVxbVJHigDjWD7dcqEcAeEXFbOM47J1hL7d+zj59XJIvUIng23LoxdgRP5vPJcbMbnEPNZVszUSKAJSLionAj6LyxvZlK/6xQb1c/rfac4LjyccC94Gbbw3bWToGzQ9IqzByiQ00DtrXDZa/wYruZF3d9v/u9JQHXLH58K4LjPXg6WIAnh4Z1VSUK3B4P0XCAUzVcxuHuGt1XjV7OoqP5e/YC/Tci4qaNHTzgJidzLJyANyjw9Ipbw3OoqRHAx7fv1l0SrPYS0kgvcmdRbe3hDfFtcivBA/dFxAQvWynwtAfHDrVBRTVsgBdkeZuu0wG7D4K104cWPt4CeHyseSQI6A/et7ciZvtjy1x7aKON1AjgDRHx0CXBfQTlzJPpUlCC06AFi4AET7e7tje5oy8i3QgGFNVYQR8E9xGUy6SkFUpNgwbexFHH5Ljcm9quEAC+79yK6KNSzju5XAovUJPLCRHM2nfVcsVXy3/e7Tq9ySl9tE7LWC5eTgdGk1ApERxLbzAd3UP2ntk6BPfROq2y1NRu4uIkglnHcFlswUbnBPfQOi270+aPAK+JSUMEZ7svSbkF15sn1707Lbvt54NyUZO3mAge1CbMYwuuR3D37rSs5ZALyumN3iYieFCbcMjjm9Qi2O/cGZFNqdSy0K4ahbiJ4OPON4BNmM8XqUWw3XmCSM5WyBLsNfKLEMHctr6hbMH1CDYA4PE5F+qczrn278/ZCtk6o2YNUIjgAW3CnL5GvcngEXxGxJdjFyj4zHcIUN77kQlp6M3i/ETwgDZhTt+iHsHuMyIiPqQAxz/+bSeuiFPFnNJ4cHtdguf5uqV57TImMQgexCZscUq3r0ew8YqIiM8pwfGP/6vyU8tZEMM+vccr
1gQv85VHS5EKkeo7k/rdhONET78nASjKrxER8TEl+CX+kdtb7X//+nTaM05lTnrDjO2aBBcqA4uFguITbAE8fHz9VebqMrnreCi6zaviqSbByj0ifjrZwV8R8RO/SPP/IOKrkbOKfUVp3gusJsGYr1tCFvMtBkWw4sPX/DPdd/yI3Lbg2gQri3WuPPIfX5/5xemWiIj4V+mPR4dw40Srmt9wm2/TvOU8bnm52y46VqDxnK9wnXdd8OpwKzq1b41URBzjdMWC4tR2aJzsWpPgyQ5xO6n6kcvTLTpW4O/zIr3rmOB47+ESUriZYJ9j3umiMM46iJ3AcfF4k6qvQXrTDkzN7JoK/BtExPdfpD9OOZdsTze73KQ8j1/J6c0E80watj8j4q9PP8cp9mbUuN5guATvulbgv7/j+8uJoeUBD/wc+pM9It5njBjgV3F6M8Emx6RhG56eHzPfawXPX/HtqXnN1yAJ3iAi3nWtQI7neNUhOs8f2JxqnW4mmGfScLFVmv74jojvzftWDJLgyQYPhXvifM5fgU6HDXyWeYItnvWmdosZFdycEWelpf+VBlRCVWCCjcKyk/VPe8T7b2zb8/MrqhNAr63A2J/VTeeaaa7zkBoHVNRBEsynHcxZy+CfJgQHqtIfwWoBQNMurAKAp2FiV9YXRMTP5b8z2CnQ7jC2PN/j/Sz3wY+/nA6NYI41+Gfb++SAiF9Du/kfOjMHTwRaBQCdAoABdLQQEfGNO8FxbLn7ynstzUNYj4bgkjZX083uw49ueRXKN6JhrTdExC/8CY69kU7XBPsA8MSv6e3tBPPrIsGu1aAYBD+9I74/8Sc4GZLa8aha85TMOB8WwTZfgn35CA4KZoqb2i8/+/bn39i2bRUMbY21AmOPWrd9E+Jr3J+GuAfzC2lYHAletSCweFMr3uTMAoGdWJyNFGh3nygcO/FWm9odc7sjmF9Ig93ufk6wfgLQKxJY9HYpIqxGClQ7v8zp6bY/X8+VgRFsCEmwdKuZAk3ouGg34G56304wv3nLHrOkISK4/DJndPftuLs/bieYX1COnZ+OCD67zEU842PlV0e+n9aWYI0IForgLnbFbnf8FgQHnxDf/prDd4qYmU5EcJVlanTz3ToY49GC4N/xKidjZ560+TOz9UJKgnXoyo6IP4m356MFwTtERJSV4CUi3k8kJLibnfG025vKYAneICK+s3/DGGYe307wpOlULHEIzpC1vOPY2cXpxnPXguDZARFfLT4E+z0THBfzrqUkOD3dJ/c8X1KjI2ulBcHKv76+fs/hHfvm89vbs9czwXE1m5x7cGJH+HdMp2IVVpxQ38GNsQ3BJp9xe/eIiD8bgh3MJZA/nfevwNjL9WdeF5nTRzjKoAlu2Aq1rnHC8PxucyGc8pleu0Xcz/pWYBzX+MhxD463+S5qmloR7PG41P7dQAjms+4wX4fekwKtNLX+wMVlOP/72NTWh06wxSM1wv+KiIeppAQzd0HeqEAXAAD+k0sf89khKfvrJI2zFcHqzcb6Yj2rvMI+fH778g+KpAR/YH1y36hANeBnp+4xbpfuKoMnODbXb5DCBTcO08k7wyN4emDs4rhVgXrELdsdERFfGxeW90Lwqhq3S7eVf0FExB+Vb8Es+7H1Q/Ddblvtb5jcsU31vlmBsSnMI3UXERE/Rh3V47UjWKsSwmx/YXq5FreA/0PZ79g2x+iF4A+dtoW8XYFxzCxinl1ovSLi1wdTEYHgOEB5/heM7zDfuy9/i3hBRHx/sCs2dXZHGyeCJ8v1hSPm0GkL+BYKjF22Nx32083uw6TSPHl+/fhgK2IQbJd6/fQoNoUqTPl0MtK5FyMO5LDrxsaH4CvR2G5bwLdQYFw1d4vTdnKo9Akm9rWrCEJw6WALNYTEmC+1Z00AeHp+LNtrXcaRSD4EX4nGbjqNR7dR4CXapuvl5LIESi2l9K1QRSG4dLiQDfAJEd8fywN27qmBQuk1jmH1KB+C15d9uiW9
KYeqwDgvoAzh5UWnfCXBiZOuy2rotgSXDHjTAABeXl8fKu4JIQCAV3IHTN5fbegEL3lmxHSrQKsK4T3ihbv4FLO9Kc8B1hVxCC4xI2I4Da/CyErGcobnrmSHeZtZTje5DeJhLgXBiczPEb5izC/2uJtVAdxtX6u2BJ+bEWnH+aqoTzKd3j6LSCcJpcrwCVZm88FUb7RVoFuO8P6GRvMpwJZQAjg3I44d5+MEqDNTOJlOrxV/mdgQhggED2i1VmA5wrMD4m4iAMDtBVA0I4yTKWCWuszTlEyvUIPicgjUE8G378JNk0u1fgBmIICCGZEd+uGU+Ba0NC3eyufHmzyKUojgJgi3kn3imet+/nh7ATg569XI3caC87uZdbQ6ouwDa1yKUojgJgi3ceKafQHMQAB67vrp5dyBCZd6EXjnKLcgZ0OxDkUSwY0QDm92IqRNbS0hBRBmrCitAOK5bRAcH9TIWNAun6IUIrgZwtFtBKrp/26KKYBVxi92NvnYLdwS1Ay2J5fwilOaFBFc/8/Ey7nBktCTO1ykCyoA7XR+aGe2QOIjM3N2cpiRWqQcI0Ps32AiuPay0kh/YwxXiQkcaMIKwDtaACXD5/X89mpnfGgp+vF/wmMmBxHc3JnQ8O9p6dg8TxVXAGbqRjDKJGDnXGp+1vfmATx8/Mu3f+SWj0cEN7Fm0wF6QQOPkH0b90MTQJgwWj4A2s8+Yc43YcSz1L5yy8cjghstJ00arOsaNsLkf4gMsQUQX8QMs/xlzLp6jXxfyjCelPc9r4TSvgleH3AzEYdgxUh31MiuoQ/jOHe3LwuCmQCSodNRRWpoxqVm54N0Vjwp7yOvjOieCV4jtxlUfAhWVA+ODF+5mJlHfqNVn0JmIwAb4OELvr9W+LTdYx9Ov7BL//YdEfGRV0p/zwTvOkgkthlboGZ4LEBwq51D2ir9z57ecL8Qn2A1jAe0v5QHJRKXmp2YwRmbSf2PH/D9hdshxI7g2Xx6I8F7sQhW1MzY09AxS/HNDE3/ARFxKjzBipkMaK9waif+sthSziUFqyvP5ncLYEbw5qbSt0UHPYht9l4Azc1NMrWtk4I0w/bCzC/DX/beZ5mVAJx3RMQv1sW7XuR3PI+PFcELvG2nWWx3S0UUgifzeTnD6UjeoPjvQivpNHonAcHKbxARf1L5axcA4On15aHT7A9WBK+x0y4mvShwdkDcH19SzYmujpm2UkvpMJVBAMryUq8lRQ0AXhDx6wOo4hG8xL6tPf4K3BYdJ5Z3Ad/QSVwVk/VuM5NCAFeNqwjeL7VBGbYdvL1+VM62u/VEYAWW1Haqllu6E/u2LuErfHUljXzedBEJVuZ3Vzaa6aHLTj0cFHhf7rzWLDtjAEe+uzKUQa3uCFb07xARf6UISXBNU3kqrgJnh0sGrWYYhq6sd7vBjYnskOC4/lWRmuCZwAqcLBZXrKAPiIiLEROsTOYda7hDguesB2QMUIHdduUcIsGdry6jyot73E7kVmC3XTmJYIWyKxmvzRCnRBLBRHB9K3Bz2K8VIpgIJgWSAIhgIpgIJgWSAIhgUiAJgAgmBZIAiGAimAgmBZIAiGBSIAmACCYFkgCIYCKYCCYFkgCIYFIgCYAIJoKJYFIgCYAIJgWSAIhgUiAJgAgmgolgUiAJgAgmBZIAiGBSIAmACCaCiWBSIAmACCYFkgCIYFIgCYAIJoKJYFIgCYAIJgWSAIhgUiAJgAgmgolgUiAJgAgmBZIA6izLIIKJYJGXVv0IEhCsEcHyCyAAX5OW4EAngqUXgAUQrWQl+JqCZDhER0+wCgAQGJJaEVB5wsiiQNNVR06wEo+w9jQpb3IBQGTLrcAo0EdOsJVM/XU1CQm2AABCU2YFOtDiIeS4CBwnsLu6dASryTx5Q14F6lVWYD2CXUP85R+Hr4NvSUZwYiNVMSzFFhSWnqB1CZZuRZIRbJ7eTlNSglex4uxbbnSWL8MKMwB7ZpZgGZ4u83DhSpWRYC3dfGxV
Gec6Ehw5Wt4Olm4FqowXmeCoQFcbI8B61StsyLDMjIF0rl8bQokP0bEsR+ojaAWXzlgZLzKr0REcAQC4sppQyQkbWuUmpBTnjJcx9a3x2cImALSJ6ohwy5H8jnO0InxDGeFypT53HAAAR+59SR81v4oaybsBx9uT1M8X70Ej5ldRLFfmDcqAESQARxdTP6RfcjsQ3VCXXoMmAIw2liG/kT8CzXry20m0pL7IgENCoCXyRcYiGeTX/wMciGAh9KpqDAAAAABJRU5ErkJggg==" } }, "cell_type": "markdown", "metadata": {}, "source": [ "# Building a Machine Learning Model\n", "\n", "## Linear Regression\n", "Linear Regression is a simple but powerful technique that is used for numerical prediction. It is a statistical measure that attempts to identify the strength of correlations between one dependent variable (the response) and a series of other changing instances known as independent variables (the features).\n", "\n", "Linear regression models are simple models and often provide an adequate and interpretable description of how the inputs affect the outputs. For prediction purposes they can sometimes outperform more elaborate nonlinear models, especially in situations with small numbers of training samples, low signal-to-noise ratio or sparse data.\n", "\n", "The equation of a linear regression model is simply the sum of the features for a particular sample, except that weights are applied to each feature before summing them together:\n", "\n", "